Can I Do This In My Pajamas?
Validation Studies Going Virtual
Hillary Michaels
William Lorié
Michaela Gelin
Susan Davis-Becker
Chad Buckendahl
2011 National Conference on Student Assessment
June 19-22, 2011, Orlando, Florida (www.ccsso.org)
Next Generation Learners: Who are they and what are their needs?
Setting Standards Remotely:
Conditions for Success
William Lorié, Ph.D.
Metrica Research Associates, LLC
Five Questions
• Where We Are Now
▫ Why conduct a standard setting?
▫ Why do people meet in person for standard settings?
• Going Virtual
▫ Is remote suitable for standard setting?
▫ When are panelists ready to make standard setting judgments?
▫ When should test sponsors allow standards to be set remotely?
The Logic of Standard Setting
1. No technical analysis of the test will yield a readily recognizable or acceptable cut score.
2. Each stakeholder representative (panelist) has or can form an opinion regarding minimally acceptable performance.
3. That opinion can be shared, discussed, and modified.
4. A suitably designed process will transform all panelist opinions into a defensible cut score on the test.
Standard Setting Essentials
Panelists understand the rules of the process
Panelist opinions are formed and shared
Panelists record their opinions according to the rules of the process
Why meet face-to-face?
• Reasons unrelated to standard setting goals
▫ Professional development
▫ Networking with colleagues
• Goal-related
▫ Taking the test
▫ Training
▫ Reviewing (secure) materials
▫ Engaging in large and/or small group discussions
Three Types of Goal-Related Standard-Setting Tasks
• One-to-many
▫ Orientation, training, providing feedback
• Individual
▫ Taking the test, reviewing materials, making judgments, making ratings, completing in-process and final evaluations
• Many-to-many
▫ Writing or revising descriptors, discussing the borderline candidate, discussing ratings, discussing feedback
Intra-Panel Elements in Context
[Diagram: example sequences of intra-panel elements, read as three flows: Train → Review → Discuss → Debrief; Train → Assess → Judge → Inform → Discuss; Train → Judge → Inform → Assess → Debrief]
Five Questions, Revisited
• Where We Are Now
▫ Why conduct a standard setting?
▫ Why do people meet in person for standard settings?
• Going Virtual
▫ Is remote suitable for standard setting?
▫ When are panelists ready to make standard setting judgments?
▫ When should test sponsors allow standards to be set remotely?
The Most Serious Objections
• Test and data security at risk
▫ Relative exposure: Test Development < Remote Standard Setting < Testing
▫ Relationship between panelists and sponsoring agency is key
• Discussions not “rich enough” if not in person
▫ Descriptor writing ≠ standard setting
▫ Remote discussions may be more focused
• Cannot discern if participants are on task
▫ Problem is mode-independent
▫ Reducing session lengths helps
Managing Remote Many-to-Many Processes
• Reliable meeting system
▫ E.g., Conference or video call among several people, with a facilitator or group leader
▫ Low chance of interruptions
• Rules for turn-taking and strict schedules
• (Possibly) System for drawing attention to materials
When are panelists ready to make standard setting judgments?
• Understand the objective referent on which to base their judgments
• Understand the procedure for making judgments and have practiced it
• Understand the legitimate reasons for modifying their judgments and the likely effect of specific modifications
Is remote standard setting right for your program? (✔ = Yes)
• Does my program plan to release a practice or sample test form? ✔
• Can I entrust my panelists with secure materials? ✔
• Can I proceed to standard setting voting with very little or no descriptor refinement? ✔
• Can my panelists work with remote meeting technologies? ✔
• Does my organization have the necessary technologies and technical support? ✔
Virtual and Face-to-face Standard Setting: A Blended Model
Michaela Gelin, Ph.D.
CGA Canada
Hillary Michaels, Ph.D.
Consultant
Overview
• Why use a blended model for standard setting?
• Overview of examination
• Modified Angoff standard setting
▫ Virtual training
▫ Virtual Round 1 ratings
▫ Face-to-face discussion & Round 2 ratings
• Feasibility for other standard settings
• Security
• Future directions
Why Use A Blended Model?
• Economic and logistical benefits
• Convenient for panelists; panelists complete the work anywhere, anytime
▫ Advantage: panelists more likely to participate
• Digital media have made it possible
▫ Easily collaborate across distances
▫ Eliminates the need to email files; improves security for sensitive documents
Examination & Standard Setting
• Certification exam (4 hours)
▫ 20 MCQs (multiple-choice questions)
▫ 3 CRQs (constructed-response questions)
• Exam assesses technical competencies & professional qualities and skills (e.g., Problem Solving, Communication, Integrative Approach)
• Modified Angoff standard setting process is used to determine a single Pass/Fail cut score for the exam
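For context, the arithmetic behind a Modified Angoff cut score is simple to sketch. Below is a minimal illustration in Python with invented ratings and panel size; it is not CGA Canada's actual computation, only the standard aggregation (mean rating per item, summed across items):

```python
# Illustrative Modified Angoff aggregation (hypothetical data).
# Each panelist rates each MCQ with the percentage of minimally
# competent candidates expected to answer it correctly.

# rows = panelists, columns = MCQ items (percentages, 0-100)
mcq_ratings = [
    [70, 55, 80, 60, 65],   # Panelist 1
    [75, 50, 85, 55, 70],   # Panelist 2
    [65, 60, 75, 65, 60],   # Panelist 3
]

n_items = len(mcq_ratings[0])

# Mean rating per item, converted from a percentage to an expected item score
item_means = [
    sum(panelist[i] for panelist in mcq_ratings) / len(mcq_ratings) / 100.0
    for i in range(n_items)
]

# Raw cut score = expected total score of the borderline candidate
mcq_cut = sum(item_means)
print(f"MCQ cut score: {mcq_cut:.2f} of {n_items} points")
```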
Virtual Training
• Web-based meeting software
• Share and view documents (PowerPoint, Excel spreadsheets) with panelists in real time
▫ Panelists connect on their computers through a web browser
▫ Everyone sees the same thing (e.g., PowerPoint presentation) at the same time
• Phone conferencing
▫ The facilitator talks while sharing documents
▫ Q&A encouraged during the presentation
Training (cont.)
• Training includes
▫ The purpose and process of standard setting
▫ Discussion and consensus about minimally-competent performance (i.e., borderline candidates)
▫ Factors for making ratings
▫ Practice ratings
• One-on-one follow-up training is provided for panelists who need additional support
Round 1 Virtual Ratings
• Exam and keys couriered, with signature required on delivery
• Panelists take the exam at home under exam-like conditions and self-assess their performance
• Panelists use a secure portal site to complete Round 1 ratings in Excel
• Panelists given ample time (1 week) to complete Round 1 ratings
• Round 1 ratings compiled for sharing at the face-to-face meeting
Round 1 MCQ Excel Ratings
Standard Setting Round 1 Rater Data Sheet. Enter your initials below.
For each question: What percentage of minimally-competent candidates will answer this question correctly? (Enter a percentage from 0-100.)

Question #   Competency Area   Competency   Rating
1            MA                MA:03
2            IT                IT:01
3            AS                AS:03
4            BE                BE:06
5            ET                ET:03
Round 1 CRQ Excel Ratings
Standard Setting Round 1 Rater Data Sheet. You need to provide percentage estimates that total to 100%.

Core and Core-Related competencies
For each question: What percentage of minimally-competent candidates will demonstrate the competency at each of the levels indicated?

Question #   Competency Area   Competency   Level 0   Level 1   Level 2   Level 3   Level 4   Sum   Mean
20           MA                MA:01                                                           0     0
21           MA                MA:02                                                           0     0
22           FAR               FAR:08                                                          0     0
23           FAR               FAR:02                                                          0     0
(Sum and Mean are computed automatically; the blank template shows 0.)

Professional Qualities and Skills
For each question: What percentage of minimally-competent candidates will demonstrate competent performance on this professional quality and skill? (Enter a percentage from 0-100.)

Question #   Competency Area        Competency
28           Communication          CM:01
29           Integrative Approach   IA:02
30           Problem Solving        PS:01
31           Leadership             LD:01
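The Sum and Mean cells above reflect the quality-control features built into the Excel templates (see Conditions for Success, below). A minimal sketch of the equivalent checks; the function names and tolerance are invented for illustration:

```python
# Illustrative quality-control checks mirroring the rating sheets above.
# Names and tolerances are assumptions, not the actual Excel formulas.

def check_mcq_rating(pct: float) -> bool:
    """MCQ ratings must be a percentage from 0 to 100."""
    return 0 <= pct <= 100

def check_crq_levels(level_pcts: list[float]) -> bool:
    """CRQ level estimates (levels 0-4) must total 100%."""
    return abs(sum(level_pcts) - 100.0) < 1e-9

assert check_mcq_rating(65)
assert not check_mcq_rating(120)                   # out of range: flag it
assert check_crq_levels([10, 20, 40, 20, 10])
assert not check_crq_levels([10, 20, 40, 20, 5])   # totals 95, not 100
```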
Round 2 Face-to-face Ratings
• Review standard setting process, assessment program, performance level descriptors
• Discuss exam-taking experience and factors that impact performance
• Repeated process: For each rating, review Round 1, discuss similarities and differences, review candidate data, and make final rating individually
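One common form of "candidate data" reviewed at this step is impact data: the share of candidates who would pass at a proposed cut. A minimal sketch with fabricated scores:

```python
# Illustrative impact data: pass rate at a proposed cut score.
# Scores and cut are fabricated for the example.
candidate_scores = [12.5, 14.0, 9.0, 16.5, 11.0, 15.0, 13.5, 10.5]
proposed_cut = 12.0

pass_rate = sum(s >= proposed_cut for s in candidate_scores) / len(candidate_scores)
print(f"{pass_rate:.0%} of candidates would pass at a cut of {proposed_cut}")
```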
Round 2 Ratings: Google Docs
• Google Docs provides real-time data collection, collaboration, and immediate feedback (e.g., average panel rating; a sketch follows this list)
• Ratings are made one at a time, with no reference to individual questions, content, or identifying information
• Facilitator transfers the content (ratings) of the Google Docs spreadsheet to a secure hard drive
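The immediate-feedback step referenced above amounts to a running aggregation as ratings arrive. A minimal illustration; the data structures are invented, and the Google Docs mechanics themselves are not shown:

```python
# Illustrative real-time feedback: as each Round 2 rating arrives,
# recompute and show the running panel average for the current item.
# This stands in for the shared-spreadsheet behavior; it does not
# use any Google Docs API.

current_item_ratings: list[float] = []

def record_rating(pct: float) -> float:
    """Append a panelist's rating and return the updated panel average."""
    current_item_ratings.append(pct)
    return sum(current_item_ratings) / len(current_item_ratings)

for rating in (70, 65, 80):   # three panelists rate in turn
    print(f"Panel average so far: {record_rating(rating):.1f}%")
```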
Conditions for Success
• Virtual web-based training for the group, including practice ratings
• A helpline for panelists needing additional support
• Thoughtful Round 1 ratings made individually
• Round 2 ratings scheduled shortly after completion of Round 1
• Quality control features built into Excel templates for ratings
• Final ratings and discussion made in person
Security
• Pending signed confidentiality agreements, exam and keys couriered with signature required
• Standard setting begins the day after the exam window ends
• All hard copies collected at the face-to-face meeting
• Use of a secure, password-protected portal for Round 1 collection
• Spreadsheet for capturing Round 2 ratings has no identifiable information in terms of content (e.g., exam, question)
Applicability to other standard settings
• Especially useful if the exam is lengthy and requires thoughtful analysis
• Minimizes the length of face-to-face meetings, thereby saving time and money
• Provides highly flexible schedule for completion of Round 1 ratings (submitted via portal)
• Can conduct multiple training sessions if needed with larger panels
• In this case, Excel spreadsheets were used for both dichotomous and partial-credit items
Future Directions
• Enhance test security: administer the exam via computer rather than sending hard copies out
• Training sessions: record and store on the web for panelist reference
• Improve rating spreadsheet to allow panelists to record their rationales for their ratings and aid recall for Round 2 ratings
Virtual Item Writing and Review
Susan Davis-Becker, Ph.D.
Alpine Testing Solutions
Traditional Item Writing and Review
• Multiple in-person meetings
• Item Writing
▫ Substantial training
▫ Practice exercises
▫ Focused item writing over days
▫ Ongoing mentoring
• Item Review
▫ Peer feedback
▫ Group discussion
Virtual Item Writing Training
• Advance materials
• Deliver training through web-based software
• Focus on core elements
• Link training to materials
• Include example items that can be reviewed by the group
• Demonstrate item writing tool
Virtual Item Writing Process
• Initial assignment
▫ Write 2-3 items to an assigned objective
▫ Facilitator reviews and provides targeted feedback
▫ Item writer completes edits
• Ongoing item writing
▫ Larger item writing assignments
▫ Facilitator monitors progress and provides feedback
Virtual Item Content Review
• Web-based software
• Facilitator role
▫ display item content
▫ ask specific review questions
▫ make changes to items in real time so the group can approve the final item
Virtual Post-Pilot Item Review
• Web-based software
• Display item content and results of pilot analysis
• Facilitator role
▫ display item content
▫ aid in interpretation of analysis results
▫ facilitate discussion on the appropriateness of the item; if not appropriate, guide SMEs in revising the item based on analysis results and make changes in real time so the item can be re-piloted
Virtual Post-Pilot Item Review: Example
Objective: Solve word problems involving subtraction of small whole numbers.
Two sparrows and three chipmunks are resting in a sequoia. One chipmunk runs down the sequoia and hides in a shrub.
How many mammals are left in the tree?
a) 1
b) 2
c) 4
d) 5
P-value = .15; Item-score correlation = -.05

Option analysis:
Response   P-value   0-25%   25-49%   50-74%   75-100%
A          .20       .02     .10      .03      .05
B*         .15       .07     .06      .03      0
C          .50       .03     .07      .15      .25
D          .15       .01     .07      .04      .03
(* keyed response)
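For reference, a minimal sketch of how the two summary statistics above are commonly computed from scored responses. The data are fabricated, and "item-score correlation" is implemented as a Pearson item-total correlation, one common choice:

```python
# Illustrative p-value (proportion correct) and item-total correlation.
# Response data are fabricated; a flawed item like the example tends to
# show a low p-value and a negative item-total correlation.
from statistics import mean, pstdev

item_scores  = [0, 1, 0, 1, 1, 0, 0, 0]          # 1 = correct on this item
total_scores = [30, 12, 25, 28, 10, 27, 22, 26]  # total test scores

p_value = mean(item_scores)

def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (pstdev(x) * pstdev(y))

print(f"p-value: {p_value:.2f}")
print(f"item-score correlation: {pearson(item_scores, total_scores):.2f}")
```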
Virtual Process - Advantages
• Recruitment
▫ No travel
▫ Item writing: smaller fixed time commitment; most work can be done on one's own schedule
▫ Item review: work is conducted over several shorter meetings; some experts may be able to contribute to some but not all meetings
• Less pressure to create all the items in a fixed time frame, reducing fatigue
• Allows time for iterative feedback to hone item-writing skills during the process
Virtual Process - Disadvantages
• Greater security concerns
• Potential for less focus on the process
• Potential for less collaboration among group
• Longer-term commitment
Applicability to other item writing settings
• Technology is available to support virtual item writing and review
• Subject Matter Experts who are
▫ Physically spread out
▫ Comfortable working independently
▫ Comfortable working with technology
• Test development plan allows for longer item writing process
Future Directions
• Improve software for virtual item writing
• Research comparing quality of items by development mode
Additional Blended Development and Validation Activities
Chad W. Buckendahl, Ph.D.
Alpine Testing Solutions
Overview
• “Can” versus “Should”
• Other validation activities can include blended methods of evidence collection
▫ Content specification (e.g., practice analysis)
▫ Alignment studies (e.g., assessments to content standards)
▫ Evaluating consequences (e.g., curricular impact, teacher motivation, student achievement)
Content Specification
• Defining content domain, cognitive processes, and performance expectations
• Stakeholder group involvement
▫ Focus groups
▫ Working committees
▫ Surveys of practitioners
Alignment studies
• Evaluating representation of content, cognitive processes, and performance expectations of assessments relative to content expectations
• Independent reviews by subject matter experts
▫ Training activities
▫ Independent, group discussion
▫ Exploratory, confirmatory
Evaluating consequences
• Evidence of intended and unintended consequences (e.g., curricular and instructional change, public perceptions)
• Methods for collecting evidence
▫ Artifact gathering
▫ Focus groups
▫ Working committees
▫ Surveys of stakeholder groups
Virtual Elements - Advantages
• Stakeholder participation
▫ Broader representation of intended population
▫ Time on task
▫ Reduced travel
• Cost containment, with caveats
• Greater flexibility
• Real-time data collection (e.g., survey questionnaires)
Virtual Elements - Disadvantages
• Equivalent interaction with the technologies
• Engagement of participants
• Redistributed costs (e.g., travel versus technology, time commitments)
• Security risks for sensitive material/content
• Quality of information for intended purpose
Summary
• Validation activities do not need to be Either/Or
▫ Some are more conducive to virtual work than others
• Prioritizing in-person versus virtual activities
• Considerations:
▫ Quality of information
▫ Security risk
▫ Resources, time, cost
▫ Political
Then… and Now