Handbook For Professional Development In Assessment
Literacy: A Resource For States and School Districts
NCSA Conference
June 20, 2013
Project of the Title I Comprehensive
Assessment Systems CCSSO SCASS
Wayne Neuburger, PhD, Advisor
REVISION OF THE HANDBOOK FOR PROFESSIONAL DEVELOPMENT IN ASSESSMENT LITERACY
T1-CAS Mission
• The Title I Comprehensive Assessment Systems SCASS supports states in using assessment and accountability systems to improve education in schools that use ESEA funds.
• As a national consortium of assessment and Title I professionals, T1-CAS addresses issues in standards, assessment, and accountability systems, and the effects of these systems on the education of Title I students.
Presenters
• Moderator: Wayne Neuburger, Consultant
• Project Director: Jan Sheinker, Consultant
• Author: Doris Redfield, Consultant
• Reviewer: Beth Cipoletti, West Virginia Dept of Ed
• Discussant: Elizabeth Davis, Alaska Dept of Ed
Jan Sheinker, Ed.D., and Doris Redfield, Ph.D.
Developed under the direction of Phoebe C. Winter, Project Director,
and the Professional Development for Assessment Literacy Study Group,
Comprehensive Assessment Systems for IASA Title I State Collaborative on Assessment and Student Standards,
Council of Chief State School Officers
Purpose (2001 and 2013)
• The Handbook is intended to provide a resource to states and districts as they deploy state and district assessment systems aligned with standards for the purposes of improving student learning through accountability and school improvement.
• We hope that this document also serves as a resource for informing the many constituents of education about the purposes of assessment and the importance of an aligned assessment system to the overall educational system.
SCASS GROUPS PROVIDING INPUT
• Formative Assessment for Students and Teachers (FAST)
• Assessing Special Education Students (ASES)
• English Language Learners (ELL)
• Accountability Systems and Reporting (ASR)
• Technical Issues in Large-Scale Assessment (TILSA)
Need for Revision
• Significant changes have occurred in assessment systems since 2001.
• Accountability systems have changed since 2001.
• Assessment technical issues have progressed to address new assessment and accountability systems.
• Distribution systems are more sophisticated.
Organization – Actually FOUR Documents
Each document is described by its format, its users, and its uses:
• For states, districts, schools, and other interested audiences: to preserve the original document.
• Word and PPT (static & animated), for state and district PD presenters: for customization by individual states, districts, & schools as PD scripts and handouts for PD presenters.
• For individual states, districts, & schools: for customization by individual states, districts, & schools as district and school newsletter inserts to inform parents and community.
Table of Contents
Guide to Using the Handbook
Chapter One: Why Build an Assessment System?
Chapter Two: What Is Technical Quality?
Chapter Three: How Are the Purposes of Assessment Related to Technical Quality?
Chapter Four: How Are the Uses of Assessment Related to Technical Quality?
Chapter Five: How Do Schools/States Report Results in Proper Context?
Chapter Six: How Should Results Be Used to Make Decisions?
Appendices
Glossary
References
Guide to Using the Handbook
Using the Handbook
Audiences for the Handbook
State Department Personnel
Schoolwide Planning Participants
Local School Boards and Administrators
Legislative Committees
Regional Professional Development Participants
Members of Professional Associations
Pre-Service and Graduate Students in Education
Community Members
Customizing
Conclusion
Chapter One: Why Build an Assessment System?
• Standards Aligned Assessment Systems
– How have aligned assessment systems evolved?
– How have Common Core State Standards influenced the development of assessment systems?
• Relationships of Tests to Assessment Systems to Accountability Systems
– What are comprehensive assessment systems?
– What are the purposes and relationships among formative assessment strategies and classroom, school, district, and state tests?
– How do assessment systems relate to accountability systems?
– Why use an assessment system instead of individual tests to set goals and make decisions?
• Developing a Comprehensive System
– What is the role of formative assessment strategies in a comprehensive system?
– What is the role of classroom tests in a comprehensive system?
– What is the role of interim and benchmark tests in a comprehensive system?
– What is the role of district tests in a comprehensive system?
– What is the role of English language proficiency tests in a comprehensive system?
– What is the role of state tests in a comprehensive system?
• Assessment Systems and the Classroom
– How do assessment and accountability systems relate to the way we do business in classrooms?
– Who does what in standards-based schools?
Chapter Two: What Is Technical Quality?
• Aspects of Technical Quality
• Validity
– Why is validity important?
– How can validity be increased?
• Reliability
• Fairness, Bias and Accessibility
• Comparability
• Procedures for test administration, scoring, data analysis, and reporting
• Evaluation of the technical quality of accommodations
Chapter Three: How Are the Purposes of Assessment Related to Technical Quality?
• Alignment of Assessments with Standards
– What is alignment with the purpose of the assessment?
– Why are specific characteristics important to alignment?
– Why are vertical and horizontal alignment important?
• Accountability Purposes
– Why is accountability needed?
– Who is accountable for student learning?
– For what should schools be held accountable?
– What is accountability for growth?
• Assessment Purposes
– Why do assessment systems include different types or combinations of tests?
– What are norm-referenced tests?
– What are standards-based tests?
– What are augmented assessment systems?
– What are computer adaptive tests?
– What are APIP-enabled, technology-enhanced assessment systems?
Chapter Four: How Are the Uses of Assessment Results Related to Technical Quality?
• Using Results Appropriately to Avoid Misuses
– What are appropriate uses and potential misuses of standards-based tests?
– What are appropriate uses and potential misuses of norm-referenced tests?
– What are appropriate uses and potential misuses of interim/benchmark tests?
– What are appropriate uses and potential misuses of classroom tests?
– What are appropriate uses and potential misuses of formative assessment strategies?
• Usability of Results
– What should be considered in determining the usability of results?
– What factors affect the credibility of results?
– What factors affect the accuracy of score interpretation?
– What results are needed to adjust instruction?
– What factors affect the usefulness of results for teacher and leader evaluation?
Chapter Five: How Do Schools/States Report Results in Proper Context?
• Reporting Results
– How are indicators selected for reporting?
– What indicators provide direct measures of student achievement of standards?
– What other indicators provide direct measures of student knowledge?
– What student learning indicators provide indirect measures of student performance?
• Reporting Related Indicators
– What indicators provide measures of opportunity to learn?
– What context variables affect student learning?
• Cautions for Reporting Results
• Sampling and Sample Size
– How does sampling affect the technical quality of a test?
– How do population and sample size affect the technical quality of a test?
Chapter Six: How Should Results Be Used to Make Decisions?
• Using Results to Make Decisions
• Using Results to Make Decisions About School Improvement
– How can the results be used to profile student performance?
– How is the profile used to set school improvement goals?
– What is considered in developing a plan to achieve the goals?
– How are school improvement results monitored and documented?
• Using Results to Make Decisions About Policy Changes and Evaluating Program Effectiveness
– How can the system be monitored and evaluated?
– How can results be used to make policy decisions concerning resources?
– How can the delivery system be altered to improve results?
• Using Results to Make Decisions About Rewards and Sanctions
Technical Quality
(Chapter 2: Handbook for Professional Development in Assessment Literacy)
Doris Redfield, Ph.D.
304-344-3083
Contents
• Aspects of Technical Quality
• Validity
– Importance
– Ways to increase
• Reliability
• Fairness, Bias, & Accessibility
• Comparability
• Procedures: Test Administration, Scoring, Data Analysis, Reporting
• Evaluation of Accommodations Use
What Is Technical Quality (TQ)?
• The integrity of each assessment, instrument, process, & procedure for contributing to fair and defensible decisions.
Aspects of TQ
• Validity (accuracy – the test measures what it purports to measure)
• Reliability (consistency – whatever is measured is consistently measured across circumstances)
• Other
– Fairness & accessibility
– Comparability
– Procedures for test administration, scoring, data analysis & reporting
– Interpretation & use of results
TECHNICAL QUALITY IN ASSESSMENT SYSTEMS
• Rigorous standards and standards-aligned tests
• Valid, reliable, comparable, fair, and accessible tests that include ALL students
• Technically sound administration and scoring
• Accurate reporting and interpretation of results
Additions and/or Increased Emphasis
• Methodologies
– Comparability
– Generalizability
– Standard Setting
• Attention to
– Peer Review Guidance
– Joint Standards for Educational & Psychological Testing
– Evaluation of the TQ of Accommodations
VALIDITY
• Are we measuring what we say we are measuring?
• Can we make valid interpretations/inferences?
Validity
• Four broad categories (Standards for Educational & Psychological Testing):
1. Test content (content validity)
2. Relationship to other variables or criteria
3. Student response processes
4. Internal structure of the assessment
Validity: Peer Review Guidance Questions (Section 4.2)
Has the State
• Specified the purposes of the assessments . . . ?
• Ascertained that the assessments measure the knowledge and skills described in its academic content standards . . . ?
• Ascertained that its assessment items are tapping the intended cognitive processes & are at the appropriate grade level?
• Ascertained that scoring and reporting structures are consistent with the sub-domain structures of its academic content standards?
• Ascertained that test & item scores are related to outside variables as intended?
• Ascertained that the decisions based on assessment results are consistent with the purposes for which the assessments were designed?
• Ascertained whether the assessment produces intended and unintended consequences?
KEY: Provide clear & explicit documentation of evidence
INCREASING VALIDITY
• Multiple measures
• Face validity
• Content validity: breadth; depth; cognitive complexity; range of knowledge & skills, including thinking skills, represented by the content standards
• Construct validity
– Evidence of convergent criterion validity or the relationship of particular items to the other items on the same test/assessment (internal consistency reliability); see the sketch following this slide
– Evidence of divergent criterion validity
– Expert review
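The construct validity bullets above mention internal consistency and convergent criterion validity. As a minimal illustrative sketch, and not material from the Handbook, the Python fragment below computes Cronbach's alpha for a set of items and a simple correlation with an outside criterion; all item scores, criterion values, and names are hypothetical.

```python
# Hypothetical sketch: two kinds of evidence named on this slide (made-up data).
# Rows are students, columns are items on the same test.
import statistics

item_scores = [
    [1, 1, 0, 1],   # student A
    [0, 1, 0, 0],   # student B
    [1, 1, 1, 1],   # student C
    [0, 0, 0, 1],   # student D
]

def cronbach_alpha(matrix):
    """Internal consistency: do items on the same test hang together?"""
    k = len(matrix[0])                                   # number of items
    item_vars = [statistics.pvariance(col) for col in zip(*matrix)]
    total_var = statistics.pvariance([sum(row) for row in matrix])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

def correlation(x, y):
    """Convergent criterion validity: relationship to an outside measure."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (len(x) * statistics.pstdev(x) * statistics.pstdev(y))

totals = [sum(row) for row in item_scores]
criterion = [78, 55, 92, 60]      # hypothetical scores on a related outside measure

print(f"Cronbach's alpha: {cronbach_alpha(item_scores):.2f}")
print(f"Convergent validity coefficient: {correlation(totals, criterion):.2f}")
```

In practice such coefficients come from the test contractor's technical documentation; the point here is only how the two kinds of evidence differ.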
Reliability
• Test reliability: reliability coefficients; standard errors of measurement (SEMs); confidence bands (illustrated in the sketch following this slide)
• Rater reliability: inter- & intra-rater
• Standard setting and reliability of cut scores
– At least 3 levels required
– Descriptors for each level required
– Methods: Angoff (& modifications); Bookmark; Body of Work
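To make the first bullet concrete, here is a minimal sketch, assuming a hypothetical reliability coefficient and scale score statistics (none of these numbers come from the Handbook), showing how a standard error of measurement and a confidence band are typically derived.

```python
# Hypothetical sketch: classical relationship among reliability, SEM, and a
# confidence band around one student's observed score.
import math

reliability = 0.91        # reported reliability coefficient (hypothetical)
score_sd = 12.0           # standard deviation of the scale scores (hypothetical)
observed_score = 430      # one student's scale score (hypothetical)

sem = score_sd * math.sqrt(1 - reliability)              # classical SEM formula
low, high = observed_score - 1.96 * sem, observed_score + 1.96 * sem

print(f"SEM = {sem:.1f} scale score points")
print(f"Approximate 95% confidence band: {low:.0f} to {high:.0f}")
```

A reader of a score report can treat the band, rather than the single number, as the student's likely score range.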
STANDARD SETTING METHODS
• Angoff (worked sketch after this list)
– Panel of content experts
– Sum of item averages
– Raw cut score
• Bookmark
– Statistical item difficulty
– Ordered item booklets reviewed by panel of raters
– Panel “bookmarks” cut points
– IRT-based cut scores
• Body of Work
– Student response booklets reviewed by panel of content experts
– Identify knowledge and skills
– Assign achievement level
– Final cut score recommendations
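As a hypothetical worked example, not drawn from the Handbook, the sketch below shows the arithmetic the Angoff row summarizes: each panelist estimates, item by item, the probability that a minimally proficient student answers correctly; the estimates are averaged per item, and the item averages are summed to a recommended raw cut score. Panelist names and ratings are invented.

```python
# Hypothetical Angoff-style arithmetic ("sum of item averages -> raw cut score").
from statistics import mean

panelist_ratings = {                      # item-level probability estimates
    "panelist_1": [0.60, 0.45, 0.80, 0.70, 0.55],
    "panelist_2": [0.65, 0.50, 0.75, 0.60, 0.50],
    "panelist_3": [0.55, 0.40, 0.85, 0.65, 0.60],
}

# Average the estimates for each item across panelists...
item_averages = [mean(col) for col in zip(*panelist_ratings.values())]

# ...then sum the item averages to get the recommended raw cut score.
raw_cut_score = sum(item_averages)

print(f"Item averages: {[round(v, 2) for v in item_averages]}")
print(f"Recommended raw cut score: {raw_cut_score:.1f} of {len(item_averages)} points")
```

Operational standard settings add panelist training, multiple rounds, and impact data; this shows only the core computation.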
Reliability Cont'd . . .
• Generalizability: extent to which reliability findings can be replicated across situations
• Purpose specific
• Peer Review Guidance Questions (Section 4.2) – Has the State
– Determined the reliability of the scores it reports . . . ?
– Quantified & documented the conditional SEM and student classification consistency at each cut score in the State's academic achievement standards?
– Reported generalizability evidence for all relevant sources . . . ?
KEY: Provide clear & explicit documentation of evidence
Fairness, Bias, & Accessibility
Peer Review Guidance Questions (Section 4.3) – Has the State
• Ensured that the assessments provide an appropriate variety of accommodations . . . ?
• Ensured that the assessments provide an appropriate variety of linguistic accommodations . . . ?
• Taken steps to ensure fairness in the development of the assessments?
• Used accommodations &/or alternate assessments to yield meaningful scores?
Most Likely Sources of Unfairness (Standards for Educational & Psychological Testing)
• Items or tasks do not provide an equal opportunity for all students to demonstrate their knowledge & skills fully.
• The assessments are not administered in ways that ensure fairness.
• The results are not reported in ways that ensure fairness.
• The results are not interpreted or used in ways that lead to equitable treatment of those affected by the results.
Key Sources of Challenge to Fairness, Bias, & Accessibility
• Language loading
• Background knowledge requirements
• Cultural bias
• Other factors specific to certain students
FAIRNESS AND BIAS
• Fairness of assessment: for purpose; for group; for use
• Bias of items/results: for male/female; for English language learners; for racial or ethnic minorities; for students with disabilities
Fairness and Access
• Access through accommodations
– For students with disabilities
– For English language learners
– That yield meaningful scores
• Fairness in
– Item/task design
– Test administration
– Reporting results
– Interpretation of results
Fairness, Access & Bias in Items & Tasks
• Language loading: use of vocabulary not required by the content that impedes access to assessment items and tasks for students who lack proficiency in language, whether because they have another primary language, a learning disability, or delayed language development.
• Background knowledge requirements: design or content of assessment items or tasks that impedes access based on assumptions about what knowledge students bring to the assessment situation apart from instruction.
• Other: incorporation of other factors in assessment item and task design that increase the challenge of performing correctly in ways unrelated to the content being assessed.
• Cultural bias: inclusion of situations and contexts in assessment items and tasks that may be misinterpreted by students from different cultural backgrounds in ways that interfere with performance.
Comparability
• An aspect of reliability that allows for comparisons from year to year, student to student, school to school, form to form, etc.
• Methodologies (a minimal numeric sketch follows this slide)
– Linking: the relationship between test scores on two different tests that are not necessarily built to have the same content or level of difficulty
– Equating: a type of linking that provides the strongest possible linking relationship, rendering test scores across different forms of the same test interchangeable
– Scaling: a process for transforming raw scores to scores on another scale (e.g., SAT scores); Item Response Theory (IRT) is one example of a type of scaling
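As a minimal numeric sketch of the linking idea above, and not a procedure prescribed by the Handbook, the fragment below applies a simple mean-sigma linear transformation to place a Form X score on the Form Y scale. Operational equating and IRT scaling are considerably more involved, and all statistics here are hypothetical.

```python
# Hypothetical mean-sigma linking: express a Form X score on the Form Y scale.
form_x_mean, form_x_sd = 30.0, 6.0    # summary statistics for Form X (hypothetical)
form_y_mean, form_y_sd = 28.0, 5.0    # summary statistics for Form Y (hypothetical)

def to_form_y_scale(x_score: float) -> float:
    """Map a Form X raw score to its Form Y equivalent via a linear transformation."""
    return form_y_mean + (form_y_sd / form_x_sd) * (x_score - form_x_mean)

print(f"A Form X score of 36 corresponds to about {to_form_y_scale(36):.1f} on Form Y")
```

Whether such a transformation counts as full equating depends on how similar the forms are in content and difficulty and on the data collection design.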
Comparability: Peer Review Guidance Questions (Section 4.4)
• Has the State taken steps to ensure consistency of test forms over time?
• If the State administers both an online and a paper-and-pencil test, has the State documented the comparability of the electronic and paper forms?
Procedures: Test Administration, Scoring, Data Analysis, & Reporting
• Peer Review Guidance Questions (Section 4.5)
– Has the State established clear criteria for the administration, scoring, analysis, and reporting components of the assessment system, including all alternate assessments?
– Does the State have a system for monitoring and improving the on-going quality of its assessment system?
CONSISTENT TEST PROCEDURES
• Administer: uniform presentation; structured accommodations
• Score: consistently applied scoring rules
• Analyze: uniform data analysis
• Report: consistently applied reporting rules and templates
Administration, Scoring, Analysis & Reporting
• Procedures specified in policies and guidelines
• Implementation monitored and findings documented
• Security prescribed and enforced
• Improvements planned and implemented
Evaluating the TQ of Accommodations
• States must provide for the use of appropriate accommodations AND must have conducted studies confirming that scores from accommodated assessments can be meaningfully combined with the scores from non-accommodated administrations of the assessment.
TEST ACCOMMODATIONS are
• Changes in setting, schedule, timing, presentation, response, etc.
• Intended to equalize opportunity to perform
• Consistent with instructional experiences
• Consistently and securely administered
• Valid and reliable
Peer Review Guidance Questions (Section 4.6)
How has the State
• Ensured that appropriate accommodations are available to students with disabilities & students covered by Section 504 AND that these accommodations are used in a manner consistent with the student's IEP or 504 plan?
• Determined that scores for students with disabilities based on accommodated test administrations will allow for valid inferences about the student's knowledge & skills AND can be combined meaningfully with scores from non-accommodated administrations?
• Ensured that appropriate accommodations are available to limited English proficient (LEP) students . . . ?
• Determined that scores for LEP students based on accommodated administrations will allow for valid inferences . . . ?
Using the Handbook to Guide Your Assessment
Beth Cipoletti, Ed.D.
Assistant Director
Office of Assessment and Accountability
Anticipated
• Easy trip
– Have directions
– Have GPS
– Experience with route
– Know best places to stop
– Be in front of the storm
• Reading, PA by 6:30 p.m.
Reality
• Office meeting ran longer than expected
– Departure from Charleston later than planned
• Storm started earlier than forecast
– Snow in the mountains
– Rain to the east
• Limited visibility
• Poor road conditions
Changing Landscape
• New world of assessment and accountability systems
• Growth to standards adds a new dimension
• Valid systems are more complex
• Different tools to measure school and teacher effectiveness
Anticipated Work
• Stakeholder information
• Systems monitoring
• Validation of technical adequacy of assessments
• Peer Review
• Changes in federal requirements
Reality
• Work does not start on time
• Changes happen faster than anticipated
• Inaccurate vision of future
• More time is needed to finish than expected
Handbook (GPS)
• Resource on assessment and accountability
• Straightforward and plainspoken language
• Overview of relevant topics
• References and links to documents for more in-depth study
Using the Handbook
• Determine course of action
• Respond to specific questions
• Include sections and parts in responses to inquiries or newsletters
– May qualify, modify, or make state specific
• Incorporate PowerPoint slides into presentations
Next Steps
• Finalize revisions with T1-CAS
• CCSSO review and publication
• Handbook made available to T1-CAS members
• CCSSO makes the Handbook an official publication for distribution
• Possible webinar to acquaint states with the Handbook
For information on joining T1-CAS, please contact
• Joe Crawford
Program Associate, CCSSO
202-312-6436 (Office)
330-687-1185 (Mobile)
For information on activities of T1-CAS, please contact
• Wayne Neuburger
T1-CAS Program Director
503-390-8045
503-580-5779 (Mobile)