
2015 ISTEP+ Review

    Findings and Recommendations on Testing Time

    2/25/2015

    Edward Roeber

    William Auty


    EXECUTIVE SUMMARY

    The Indiana Department of Administration, on behalf of the Governor of Indiana, contracted with us to

    investigate the issue of testing time for the 2015 ISTEP+ assessments in Indiana. Although limited time was

    available for this review, we were able to make several short-term recommendations regarding how testing

    time for the 2015 ISTEP+ could be reduced. The purpose of this review was not to determine the causes for

    the proposed testing time, nor to affix blame. Time was too short and we were not sufficiently versed in the

    history of the events to engage in such discussions. This review of short-term issues was conducted in two

    days so that our recommendations would be timely given that testing was about to begin and any changes

    would have to be communicated to school corporations quickly.

    RECOMMENDATIONS FOR IMMEDIATE IMPLEMENTATION

    Recommendation 1: The Department should not release the open-ended (OE) items used in the 2015

ISTEP+ and the 2016 ISTEP+ programs. Instead, we recommend the release of example items that are highly

    similar to the OE items. We recommend the OE item release policy be restored once the state has a sufficient

    pool of items for use in the assessment in the future. It is our hope that this will be for the 2016 ISTEP+

    program, but that decision should await analysis of the results of the 2015 administration to determine if the

    item pool is large enough to build assessments for 2016, 2017 and beyond.

    Recommendation 2: IDOE should administer some parts of the 2015 ISTEP+ to only a sample of students

    being tested this year.

    Recommendation 3: The Social Studies portion of the test should be suspended for one year.

    Recommendation 4: IDOE should identify now which 2015 ISTEP+ mathematics and ELA test items best

    align to the Indiana standards.

    Recommendation 5: IDOE should identify the assessment design and anticipated testing time for the 2016

    ISTEP+ program and release this information publicly this spring to demonstrate that the testing time issues

    this year are a one-time event.

    Recommendation 6: We recommend that vertical scaling items be removed from the 2015 online

    assessments.

    In addition to our recommendations for immediate implementation, we also made longer-term

    recommendations regarding the operation of the ISTEP+ program by the Indiana Department of Education

    with support and oversight from the Indiana State Board of Education. The focus of these recommendations is

    to improve the ISTEP+ program in 2016 and beyond. These recommendations are based on best practices in


    measurement and large-scale assessment. For each recommendation, we also provided the rationale and a

    summary of the additional work needed to implement these recommendations.

    We made recommendations in five areas:

    Determining Test Length for 2016 and Beyond

    Technical Assistance (Both TAC and Operational Support)

    Test Blueprint as a Planning and Communication Document

    Transition Planning

    Improving Agency Communication

    RECOMMENDATIONS FOR LONG-TERM ISTEP+ QUALITY AND EFFICIENCY

    Recommendation 7: Based on the results of 2015 tests, IDOE should investigate the feasibility of shortening

    the ISTEP+ tests in 2016 and beyond.

    Recommendation 8: We recommend that Indiana establish a technical advisory committee that includes

    individuals who have specific expertise to provide technical advice to the SBOE and IDOE. We also

    recommend that IDOE establish a standing Indiana assessment advisory committee.

    Recommendation 9: We recommend that IDOE develop test specifications and blueprint documents for the

    2015 and 2016 versions of ISTEP+ as soon as possible.

    Recommendation 10: We recommend that Indiana (the SBOE and IDOE) seek external assistance to guide

    the transition of the ISTEP+ and other assessment components, should the state select new vendors for any of

    its assessment components.

Recommendation 11: We recommend that the SBOE and IDOE review inter-agency communication, both at the state level and with local school corporations, and that both agencies commit to making the improvements needed to ensure the best possible assessment system for students, educators, parents and citizens of Indiana.

    SUMMARY

    We commend the state for tackling this thorny issue and working together to resolve it. We believe that if

    these recommendations are followed, testing time can be reduced to more manageable levels. We also believe

    that implementing our long-term recommendations will improve the design and implementation of the

    ISTEP+ program in the future. We remain willing to assist in and perhaps monitor efforts to implement these

    recommendations.

    Edward Roeber

    William Auty


    The Indiana Department of Administration employed us to conduct a review of the ISTEP+ program,

    particularly the testing time issue. Our report is divided into four sections:

    Recommendations for Immediate Implementation

    Recommendations for Long-Term ISTEP+ Quality and Efficiency

    Additional Observations

    Appendices

    RECOMMENDATIONS FOR IMMEDIATE IMPLEMENTATION

    The Indiana Department of Administration, on behalf of the Governor of Indiana, contracted with us to

    investigate the issue of testing time for the 2015 ISTEP+ assessments in Indiana. Although limited time was

    available for this review, we were able to make several short-term recommendations regarding how testing

    time for the 2015 ISTEP+ could be reduced. The purpose of this review was not to determine the causes for

    the proposed testing time, nor to affix blame. Time was too short and we were not sufficiently versed in the

    history of the events to engage in such discussions. This review of short-term issues was conducted in two

    days so that our recommendations would be timely given that testing was about to begin and any changes

    would have to be communicated to school corporations quickly.

    The Indiana Department of Education (IDOE) and its contractor, CTB-McGraw-Hill (CTB), have been

    forthcoming and helpful in this review process. This has included providing considerable information and

    virtual and in-person meetings.

We based our review on the four principles listed below. We came to several conclusions, made several recommendations, and identified the additional work that will be needed to implement them. This

    section provides a summary of the short-term review.

    REVIEW PRINCIPLES

    Four principles guided our review of the testing time issue in Indiana. These are:

    1. The results of 2015 tests should be sufficiently reliable and valid to enable the intended purposes of

    the assessment program to be achieved. The assessment design and implementation should also

    meet state and Federal standards for assessment and accountability. Professional judgment is

    required to determine whether a test is sufficiently reliable or valid.

2. Changes made in the 2015 program should not unduly impact the 2016 program, since it is essential that this year's issues not continue next year and beyond.

3. Our recommendations should not be overly prescriptive. IDOE has the responsibility for and

    understanding of the details of ISTEP+ design and implementation. What we are proposing are


    parameters for how the testing time could be reduced. We expect that IDOE and its contractors will

    use these guidelines to effect the suggested changes.

    4. We are willing to continue to assist the Department as it implements these recommendations.

    FINDINGS

    Although our time to reach conclusions and make recommendations was short, we were able to determine

    several things. These are:

    Testing times in excess of 12 hours were scheduled for the mathematics, English language arts,

    science and social studies tests in ISTEP+.

We believe that it is unnecessary to require young children (indeed, any students) to take an assessment of 12 hours in length.

The mathematics test contributes about 4 hours of this time and the ELA test contributes over 8 hours of testing time. Thus, we found the ELA test is the real issue, although we think steps should be taken to reduce the length of both the mathematics and ELA tests. Major contributors to the extra testing time in the English language arts assessment are the policies on the release of open-ended or constructed-response items.

States and the state assessment consortia across the country are adding significant testing time to

    their programs, due to more comprehensive standards and the increased use of performance

    assessments to better gauge student achievement. Even reduced ISTEP+ tests may be longer than

    those used in the past.

The lack of previously pilot-tested or field-tested items requires the use of more items than normal this year, to make sure that there is a set of items for producing this year's test information.

We believe that the item-overage levels being used are not excessive (in the 50% range, that is, roughly half again as many items are being tried out as will be needed on the final forms), given the structure of the tests and the nature of the intended score reports.

    We believe that there are ways the Department can reduce testing time and accomplish its

    assessment purposes, as explained below.

    RECOMMENDATIONS FOR IMMEDIATE IMPLEMENTATION

    Based on the principles cited above and recognizing the findings listed, we made the following

recommendations for actions to reduce testing time this year. These were presented to the State Board of

    Education during their emergency meeting on February 13, 2015.

    Recommendation 1: The Department should not release the open-ended (OE) items used in the 2015

ISTEP+ and the 2016 ISTEP+ programs. Instead, we recommend the release of example items that are highly

    similar to the OE items. We recommend the OE item release policy be restored once the state has a sufficient

    pool of items for use in the assessment in the future. It is our hope that this will be for the 2016 ISTEP+


    program, but that decision should await analysis of the results of the 2015 administration to determine if the

    item pool is large enough to build assessments for 2016, 2017 and beyond.

    Rationale: Our preliminary review indicates that the greatest contributor to the increased testing time is the

    need to operationally administer and release enough Part 1 OE items in 2015 as well as pilot items for

    comparable 2016 tests. We recognize that educators in Indiana rely on released items to guide instruction.

    We would therefore include the recommendation that high-quality example items be produced this spring

    and released publicly when the results are reported. This policy should be used for 2015 and for the 2016

    assessment, unless the item pool is robust enough to permit the release of OE items in 2016. This needs to be

    determined after the 2015 OE items have been analyzed, making sure that there are enough items for use in

    2016 as well as 2017.

    Work to be done: IDOE and the State Board of Education (SBOE) should determine what changes would be

required to implement this recommendation. How many items are needed to build this year's final form and

    the comparable forms next year, if items are reused that worked this year?

    Recommendation 2: IDOE should administer some parts of the 2015 ISTEP+ to only a sample of students

    being tested this year.

    Rationale: Another significant contributor to the test length is the requirement that all students take all

items (which is the major downside of an operational field test), including items that will be used in future testing. This is because none of the items in this year's assessment have been used previously, so the state

    needs a pool of items from which to construct the final test for 2015 and to build the tests for 2016 and

    beyond.

    We recommend that items in test sessions be identified as core or sample items. All students would take

    the core items and half would take each set of sample items. This is a standard testing method called matrix

    sampling that has been used in Indiana in the past. Field testing of items usually can be done with many

    fewer student responses.

    The easiest place to implement this recommendation is the Part 1 OE assessments, where currently there are

two parallel forms in the testing for all students. By giving each student only one of these sets, an estimated 3 hours and 5 minutes of testing (two days/four test sessions) can be eliminated, reducing testing time for both

    the mathematics and ELA assessments.

    One way to do this most easily would be for IDOE to designate two comparable groups of school corporations

    or schools and then direct each group to take the appropriate sessions. The Department could use a list of all

schools across the state, designating the odd-numbered schools on the list to take some sessions while the even-numbered schools take the other sessions. This procedure will require detailed communications to the

    school corporations as soon as possible.
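To make the assignment concrete, here is a minimal sketch (ours, not IDOE's; the school names and session-set labels are invented for illustration) of the odd/even split described above:

```python
# Minimal sketch of the odd/even matrix-sampling assignment described above.
# School names and session-set labels are hypothetical illustrations.

def assign_sample_sessions(schools):
    """Alternate schools between two Part 1 OE session sets.

    `schools` should be a single statewide list (e.g., alphabetized) so that
    the odd- and even-positioned schools form two roughly comparable groups.
    """
    assignments = {}
    for position, school in enumerate(schools, start=1):
        # Odd-positioned schools take session set A; even-positioned take set B.
        assignments[school] = "OE session set A" if position % 2 else "OE session set B"
    return assignments

schools = sorted(["Adams Elementary", "Brown Elementary", "Carver Middle", "Dunbar Middle"])
for school, sessions in assign_sample_sessions(schools).items():
    print(f"{school}: {sessions}")
```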

    Work to be done: IDOE should designate the Part 1 OE assessment sessions for sampling, with half of the

    schools administered half of the Part 1 sessions. The Indiana legislature and appropriate agencies should

change state policies/regulations on the release of all OE items for two years, unless the OE item pool is

    large enough to release the 2016 OE items. IDOE should communicate to schools any needed changes in test

    administration procedures as soon as possible.

    Recommendation 3: The Social Studies portion of the test should be suspended for one year.

    Rationale: Since these tests are not required by NCLB, nor apparently used in school accountability, this

change will reduce testing time by 75 minutes (one day and two sessions) for students in grades 5 and 7. An

    option that the state might want to explore is to permit schools to use the social studies test on a voluntary

    basis, rather than suspending this assessment in its entirety. This would permit those who are very interested

in the assessment to still use it. This is feasible because these items are contained in printed tests already

    in the schools in Indiana.

    Work to be done: IDOE and SBOE should determine what changes in policy or regulation are required to

    implement this recommendation as soon as possible. If required, they should also seek legislative authority to

    implement this recommendation.

    Recommendation 4: IDOE should identify now which 2015 ISTEP+ mathematics and ELA test items best

    align to the Indiana standards.

    Rationale: The Department and its contractor CTB-McGraw-Hill should now identify the core mathematics

    and English language arts assessment within the 2015 ISTEP+ tests. We recommend that the core set of items

be identified from those used in the 2015 program so that, if they work, the state can be assured that the set of items works together and is aligned to Indiana's standards, thus meeting one of the key Federal peer review

    criteria. If any of these items do not work, then IDOE can replace these items from the overage that did work

    well.

Work to be done: IDOE should be identifying the intended assessment now, both to guide the analysis of the field-tested items later this spring and to inform the likely test length for 2016 and

    beyond. IDOE staff and its contractor should select the set of operational field test items for use as the actual

    assessment, assuming that the items work. This can be used to assure that this assessment is aligned to the

    Indiana standards.


    Recommendation 5: IDOE should identify the assessment design and anticipated testing time for the 2016

    ISTEP+ program and release the information publicly this spring to demonstrate that the testing time issues

    this year are a one-time event.

Rationale: Because we hope that the testing time issue for the 2015 assessment is a one-time phenomenon, we

    suggest IDOE verify this by producing an assessment design for the 2016 assessment program. In this design,

    the use of core and matrix sampling by Part and Test Session should be illustrated, along with the number of

    items of each type and the testing time by Part and Session. We think that IDOE should announce the

    parameters for the 2016 program soon so as to assure local educators that 2015 is a one-time only event.

    Work to be done: IDOE should describe the 2016 assessments so as to show the number of Parts, Sessions,

    assessment items and testing times. IDOE should also indicate how matrix sampling will be implemented in

    both the paper/pencil tests and the online assessments for 2016 and beyond. This information should be

    released by this spring. This will serve to illustrate to educators, parents and other members of the public that

the testing time issue is a one-time event, limited to the 2015 assessment.

    Recommendation 6: We recommend that vertical scaling items be removed from the 2015 online

    assessments.

    Rationale: There are other options for calculating growth in 2015 and the vertical scale could be constructed

    in 2016. However, each student takes only 5 items for vertical scaling and the testing time would be reduced

    by only a few minutes, which is not a significant reduction.

    Work to be done: IDOE should determine what changes would be required to implement this

    recommendation.


    RECOMMENDATIONS FOR LONG-TERM ISTEP+ QUALITY AND EFFICIENCY

    In addition to our recommendations for immediate implementation, we also make longer-term

    recommendations regarding the operation of the ISTEP+ program by the Indiana Department of Education

    with support and oversight from the Indiana State Board of Education. The focus of these recommendations is

    to improve the ISTEP+ program in 2016 and beyond. These recommendations are based on best practices in

    measurement and large-scale assessment. For each recommendation, we provide a rationale and a summary

    of the additional work needed to implement these recommendations.

    We identified recommendations in five areas:

    Determining Test Length for 2016 and Beyond

    External Assistance (TAC, Operational Support, Statewide Advice and Feedback)

    Test Blueprint as a Planning and Communication Document

    Transition Planning

    Improving Agency Communication

    DETERMINING TEST LENGTH FOR 2016 AND BEYOND

    Recommendation 7: Based on the results of 2015 tests, IDOE should investigate the feasibility of shortening

    the ISTEP+ tests in 2016 and beyond.

    Rationale: Test length is a complicated issue that will inevitably be resolved as a compromise between

    competing demands. Test reliability is directly related to test length: the longer a test, the more reliable it will

    be. However, student learning is directly related to instructional time on task: the more instruction a student

    receives, the more learning occurs. Testing and instruction come out of the same time in school. Therefore,

    test designers must balance these and other interests.

    ISTEP+ is newly revised for 2015. The performance of students on its new item types is not known at this

    time. When those items have been calibrated and we know how long students spent responding to them this

    year, it will be possible to predict the reliability of shorter versions of the test. If a shorter test can produce

    sufficiently reliable results, testing time can be reduced. Many states administer shorter tests to students in

    grades 3 and 4 than to students in upper grades. This approach may also work for Indiana.
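The report does not name a method for this prediction; one standard option is the Spearman-Brown prophecy formula, sketched below with invented numbers:

```python
def spearman_brown(current_reliability, length_ratio):
    """Predict reliability after changing test length by `length_ratio`.

    length_ratio = proposed number of items (or points) / current number.
    """
    k, r = length_ratio, current_reliability
    return (k * r) / (1 + (k - 1) * r)

# Hypothetical numbers: a test with reliability 0.92 shortened to 70% of its length.
print(round(spearman_brown(0.92, 0.7), 3))  # 0.889 -- still quite reliable
```

Because the formula requires an observed reliability, this kind of analysis can only happen after the 2015 items have been calibrated, as noted above.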

    When creating this assessment design, IDOE should consider strategies to trim assessment times even

further, looking especially at the ELA assessment, since it is currently the longest assessment

    component. The number of reading passages, items and writing prompts should be carefully determined and

    a strong rationale for those numbers created. Since a separate reading score is not reported in Indiana, it may

    be the case that the number of reading passages and items used to report by ELA standard can be reduced.


    Attention should be paid to the mathematics test, since it is long as well. The goal should be to produce a solid

    assessment of ELA and mathematics with the fewest items possible and with a number of embedded field test

    items, each administered to small samples of students.

    Another possibility is to develop computer adaptive tests (CAT). These are tests in which a computer is

programmed to customize the test by selecting items for each student based on their answers to previous questions. By selecting items that are optimally informative of the student's ability, the test can be shorter

    than a test designed for all students in the state. However, CAT assessments require a larger item pool and

    thus may not work for Indiana in the short term.
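As a minimal illustration of the CAT idea (ours; not a description of any planned Indiana system), under a one-parameter Rasch model the most informative next item is the one whose difficulty is closest to the student's current ability estimate:

```python
import math

def rasch_information(theta, difficulty):
    """Fisher information of a Rasch item at ability estimate theta."""
    p = 1.0 / (1.0 + math.exp(-(theta - difficulty)))
    return p * (1.0 - p)

def pick_next_item(theta, remaining_items):
    """Choose the unused item with maximum information at the current ability."""
    return max(remaining_items, key=lambda item: rasch_information(theta, item["b"]))

# Hypothetical item pool with difficulty parameters b.
pool = [{"id": i, "b": b} for i, b in enumerate([-1.5, -0.5, 0.0, 0.8, 1.6])]
theta = 0.5  # current ability estimate after earlier responses
print(pick_next_item(theta, pool))  # picks the b = 0.8 item, closest to theta
```

Real CAT engines layer content-coverage constraints and item-exposure controls on top of this selection rule, which is part of why they need the larger item pool noted above.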

    Work to be done: IDOE must decide who can do the analysis. The current contractor, the new contractor

    and an independent consultant are all possibilities. Whoever does the analyses, the results must be ready

    quickly so that the Department has time to evaluate the advantages and disadvantages of a shortened test.

    The review and decision-making process that begins after the psychometric analysis is complete should not

    be short-changed. Test length is inevitably a compromise of conflicting interests, so we cannot expect the

    technical analysis to answer the question fully. We suggest that a decision-making group be identified and

    their work scheduled in advance such that they have time to deliberate and make decisions before 2016

    forms are built.

    EXTERNAL ASSISTANCE

    Recommendation 8: We recommend that Indiana establish a technical advisory committee that includes

    individuals who have specific expertise to provide technical advice to the SBOE and IDOE. We also

    recommend that IDOE establish a standing Indiana assessment advisory committee.

    Rationale: The field of large-scale assessment is advancing rapidly in response to increased demands on

    assessment systems to support school and educator accountability as well as instruction of more rigorous

    and comprehensive academic standards. Most states find it unrealistic to fund staff positions to obtain all the

    requisite expertise. Most states obtain technical expertise through a Technical Advisory Committee (TAC).

    Generally such committees are composed of members with a variety of backgrounds and experiences who are

    chosen to support the specific assessments the state is administering or developing. An advantage of TACs

    lies in their independence. They do not have the financial interest of a testing contractor and they should have

no political allegiance to any state agency. Therefore, their advice for resolving a testing issue, or their opinion on a proposal, would be more valued by both supporters and critics of the state's assessment system.

    Another option is for the agency responsible for developing and administering the assessment to contract for

    psychometric services on an as-needed basis. This makes sense during intense periods of development or

    transition when day-to-day interactions with agency staff and the test contractor are required. The


    consultant(s) can represent the state's interests in ensuring the assessment is designed or administered as

    correctly and efficiently as possible.

    IDOE should also establish and maintain an assessment advisory committee comprised of representatives of

various educational and other organizations with a strong interest in ISTEP+. This includes representatives of teachers, parents, administrators, policymakers, business leaders and others. The committee should serve to facilitate two-way communication between IDOE and the groups its members represent. IDOE should use the

    group not only as a means of communicating with the parties with an interest in assessment, but also as a

    sounding board for receiving feedback on new ideas and new designs for assessment. This group should

    review any major proposed changes and provide its input to both IDOE and the State Board of Education as

    the state considers proposed changes. This should be an official part of the charge to the group from IDOE,

    with support of the SBOE.

    Work to be done: We suggest that establishing a TAC be a priority. The first step in doing that is to

    determine where responsibility for hiring and convening the TAC should lie. In Indiana, this would most likely

    be the SBOE or IDOE. (A less common option is used in Kentucky, where the legislature convenes the TAC.) In

most states, the agency responsible for administering the assessment convenes the TAC. Since the SBOE has an oversight role with the Department, it might also work for the SBOE to convene the TAC.

    Once that decision is made, the process of identifying desired areas of expertise and finding qualified TAC

    members can begin. Since there are key decisions to be made regarding the design of the 2016 assessment

    and the possible transition to a new contractor or contractors, we suggest an aggressive schedule that will

    allow the first meeting of the new TAC to occur by late spring or early summer.

The TAC should be comprised of individuals with psychometric, statistical and practical assessment backgrounds, and include one or more individuals with a background of working with students with disabilities and English language learners. There are a number of persons who have focused on assisting state assessment programs in successfully carrying out the technical work that underlies programs such as ISTEP+. These are individuals who

    are or have been employed at the university level or who are or have been working in assessment-related

    organizations.

    The size of most state assessment TACs is 5 or 6 individuals. They typically meet 3 or 4 times per year, usually

    for 1 to 2 days. Such a group might meet more often during times of new assessment design work and less

    often when the program work is being conducted successfully.

TACs usually review assessment plans, the procedures used by states to implement these plans and the results of the work of each contractor. The goal is to provide an independent technical overview of the

    work of the contractor(s) and the Department.


The assessment advisory committee should be comprised of individuals nominated by various education and education-related organizations and individuals with a strong interest in assessment in Indiana. These may

    include teachers, administrators (building and school corporation levels), school boards, subject-matter

    organizations (mathematics, ELA, science and social studies), universities, parents, students and business

    groups.

    TEST BLUEPRINT AS A PLANNING AND COMMUNICATION DOCUMENT

    Recommendation 9: We recommend that IDOE develop test specifications and blueprint documents for the

    2015 and 2016 versions of ISTEP+ as soon as possible.

    Rationale: A test blueprint is an essential part of test design. It can also be an effective tool for

    communicating the intent, qualities and interpretations of a test to educators and to more general audiences.

Conversely, the lack of a test blueprint can lead contractors to rely on oral communication or scattered

    documents to guide the design. Also, stakeholders who are not directly involved in the development can

    become confused or suspicious about the test that is produced.

    We've attached a comprehensive test specification and test blueprint document published by Oregon as an

    example of what such a document could look like (Appendix A). Some information in that document is specific

to Oregon's adaptive online assessment, so it would not apply to Indiana. However, other sections would be

    helpful in avoiding the issues that arose around this year's testing. Note that the Introduction and Background

    sections provide an overview of the assessment system. The Score Reporting Category section provides the

    link between the content standards and the scores produced by the test. The largest section is the Content

    Standards Map. Here, the specific content standards and strands that are assessed in each reporting category

    are described in detail. Note the Boundaries of Assessable Content and Sample Items. This information

    provides clear guidance to item writers and also communicates to teachers precisely how the content will be

    assessed. The Test Blueprint section includes the weighting chart that is Indiana's current blueprint and there

    is additional information about item specifications, content coverage as well as the Achievement Level

    Descriptors, which, in Indiana, will describe how performance on the test relates to college and career

    readiness.

    Oregon's example is not the only model for a test blueprint, nor is it reasonable to create a document as

    comprehensive as this one right away. It is provided as an example of how the test blueprint can serve as a

    communication tool for a variety of audiences.

    Work to be done: As a first step, the design of the 2015 operational test should be documented in as much

    detail as possible. Using information provided by CTB, we have put together a table showing the number and

    types of items by grade that will be used to report results this spring (Appendix B). The points generated by


    each item type and the total points are also included. This is important because the number of items alone

    does not describe a test. Items vary in the number of points they generate for scoring. Simple multiple-choice

    items that are scored as correct or incorrect generate one point. Complex items like the writing prompt

    generate 10 points. It is the points that determine the weight of items in a test score. Therefore, the percent of

    points in each reporting category is used to verify that the test covers the content as intended. (These

    percentages do match the existing blueprint weighting charts.)
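A minimal sketch of that verification (item data invented for illustration, not taken from Appendix B): sum the points in each reporting category and compare each category's share of total points to the blueprint weights.

```python
# Verify blueprint coverage: percent of total points by reporting category.
# The items below are invented for illustration only.
items = [
    {"category": "Reading: Literature", "points": 1},   # simple multiple-choice item
    {"category": "Reading: Literature", "points": 2},   # constructed-response item
    {"category": "Writing", "points": 10},              # complex writing prompt
]

totals = {}
for item in items:
    totals[item["category"]] = totals.get(item["category"], 0) + item["points"]

grand_total = sum(totals.values())
for category, points in totals.items():
    print(f"{category}: {points} pts ({100 * points / grand_total:.0f}% of total)")
```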

    We suggest that IDOE develop a similar chart for the 2016 test as soon as possible. This chart will be

    important to the new contractor to use to construct next year's tests. It can also be helpful to communicate to

    educators what to expect next year, particularly what will be different next year from this year's experience.

    Testing time is not included in the charts, but such information should be added to the tables or listed in

    accompanying tables.

    When developing this chart, we also recommend that IDOE and its contractor carefully examine the number

    of reading passages and items, as well as OE writing prompts used. This could serve to shorten the ELA

    assessment and yet yield reliable score information on the ELA content standards. The Department should

    also revisit the reporting categories and their weighting. Currently, there is a separate reporting category for

    Reading Vocabulary that is weighted at 3% - 13%. If the goals of the assessment can be met by including the

    vocabulary content in the other reporting categories, a highly reliable and significantly shorter test could be

    produced.

    We suggest that IDOE direct the new contractor to develop a more comprehensive test specifications and

    blueprint document. The goal would be to have a preliminary version ready for distribution by late fall this

    year so that it would supplement other Department communications about the 2016 ISTEP+.

    TRANSITION PLANNING

    Recommendation 10: We recommend that Indiana (the SBOE and IDOE) seek external assistance to guide

    the transition of the ISTEP+ and other assessment components, should the state select new vendors for any of

    its assessment components.

    Rationale: The transition from one vendor for an assessment program component to another one is a

    significant event in the operation of the assessment program. There are a myriad of details that the

    current/outgoing contractor has been successfully handling, all of which need to be transferred to the

new/incoming vendor. In addition, there are scoring routines, statistical analyses and various analysis and

    reporting programs that the incoming vendor needs to replicate in order to provide seamless reporting of

    current and prior assessment results at the student, classroom, school and school corporation levels. In


addition, it is not uncommon for the outgoing vendor to slowly lose interest in the successful transition, since it will not be implementing the assessment program in the future.

Many state agency staff involved in these transition activities are not experienced in how to successfully transition their assessment program from the outgoing to the incoming contractors. Making matters worse, these staff are occupied more than full time with making sure that the current year's assessment activities proceed flawlessly, and do not have the time to fully attend to the transition. As a result, necessary transition activities may not occur, may not occur when needed, or may not be carried out error-free. The net result is that the new contractor's initial year of work may not be carried out on a timely basis: tests may not be available when needed, testing may be delayed, analyses may not be carried out accurately and test results may contain errors or not be produced when needed.

The solution for states whose small assessment staffs may not have the time or experience to transition assessment programs is for the agency to hire individuals or organizations experienced in successful assessment program transitions. These persons or organizations can be tasked with securing the needed information from the outgoing contractor, providing these resources to the incoming contractor and assuring that the new contractor successfully incorporates them into its operational assessment systems. The

    transition specialists can also serve as shuttle diplomats between the two vendors to assure that

    information is provided as needed by the new contractor(s).

Work to be done: Once Indiana has determined which contractor(s) it will use going forward (and resolved any disputes or challenges to these awards), the state should seriously consider whom it will use as the transition specialist(s). While these persons or organizations will cost the state money, the avoidance of the typical transition issues can be priceless.

    IMPROVING AGENCY COMMUNICATION

Recommendation 11: We recommend that the SBOE and IDOE review inter-agency communication, both at the state level and with local school corporations, and that both agencies commit to making the improvements needed to ensure the best possible assessment system for students, educators, parents and citizens of Indiana.

    Rationale: While we did not investigate the causes of the widespread concerns about test length, it is likely

    that better communication would have reduced the problems. Effective communication is difficult in all

    complex organizations. Effective leaders and project managers constantly strive to improve communication.

Therefore, any actions to avoid future problems with the state's assessment system should include efforts to

    better coordinate agency planning, decision-making and implementation of those decisions.


    Work to be done: It is always difficult to find time to review the communication effectiveness of public

agencies. However, the SBOE and IDOE should establish a process of review and implement

    any improvements identified during that process. One way in which this could be accomplished is to share

    accountability for advisory groups. For example, the IDOE could convene the TAC and both the IDOE and the

    SBOE could identify issues or questions to be discussed. The in-state assessment advisory committee could be

    managed by IDOE and the committee could be required to send members to report to the SBOE on a

    scheduled or as-needed basis.

    One way to enhance communication with the field, including school corporation boards, administrators and

    other educators, is for the IDOE to communicate frequently with those affected by the assessment program.

    The Michigan assessment staff, for example, publishes a twice-monthly electronic newsletter and sends it to

    anyone who signs up for its listservs (assessment coordinators, principals, curriculum specialists, etc.).

    Program changes, new procedures and updates on each assessment component are handled in this manner.

This is in addition to ongoing superintendents' letters, which serve as the official communication method

    between MDE and its local school districts.

    Teachers can be a challenging group to communicate with directly, if the state does not maintain the names

    and physical or email addresses of teachers in the state. Thus, states need to rely on administrators to provide

information to their teachers. One way to facilitate communication in this instance is for the state to provide

    information (e.g., a teacher newsletter) to administrators and for them to provide it to the teachers in their

    districts.

    Similarly, parents are an important and challenging audience to communicate with directly. As with the

    suggestion for teachers above, IDOE might develop communication materials (e.g., flyers or newsletters) that

administrators and teachers can use to communicate with parents. IDOE's and the SBOE's communication

    offices could consider developing a communication plan on assessment and then prepare the press releases

    and the online communication pieces to be used as the communication plan is implemented.

    It may be most helpful if the IDOE and SBOE staff work to create a comprehensive and coordinated

    assessment communication plan. This plan could include a careful consideration of target audiences,

    information needed by each audience, the mechanisms to be used to communicate with each group, the

    resources (print, video, online, etc.) needed for each group and when the information needs to be provided.

Such a plan should address not only what each member of the target audience (e.g., a building-level administrator) needs to know and understand about assessment, but also the resources that he or she needs

    to communicate to others (e.g., teachers, parents and others in the school community). The latter resources

    are important because they help the state use these individuals as secondary communicators, and make it

    easier for them to pass along information without being an expert on assessment.


    Note that the new ISTEP+ is being driven by the new Indiana Academic Standards. IDOE should be

    coordinating explanations of the new assessments with ongoing support for teachers who are implementing

    the Academic Standards.


    ADDITIONAL OBSERVATIONS

    In the course of our review of the ISTEP+ program testing time issue, other ideas, thoughts and concerns

    occurred to us. We raise these with the intent that they allow the state to identify and avoid other issues that

    may affect the program in the future.

1. The re-use of previously developed and used items, as well as previously developed but unused items, apparently was not examined.

When an assessment program transitions to measuring a new or revised set of standards, it is customary to consider whether some or all of the items that already exist (previously used or not) might be used to

    measure at least parts of the new tests. Sometimes these existing items can continue to be used, while in

    other instances, the change in standards is such that the items are not aligned to the new standards and are of

no use for the new assessment. It is advantageous to use such items because not only has the state already paid for their development, but the items are also of proven quality and should require less extensive field testing for

    use in future versions of the ISTEP+ instruments. We were puzzled by the apparent decision not to at least

    examine their suitability to measure the new Indiana standards.

    Assessment staff could convene content specialists (university content specialists, curriculum specialists from

    school corporations and classroom teachers) under secure conditions to review the item pools for possible

    alignment to the new Indiana standards, either at the grade that the items were originally written to, or at a

    higher or lower grade. Reviewers could identify items that match a standard strongly, as well as recommend

how the connection of an item to a standard could be strengthened. This might have permitted existing items

    that Indiana already owns to be used in the new assessments, potentially reducing the amount of operational

    field testing necessary.

    We were not given evidence that such a review was done, but we believe that it should have been carried out.

And if it hasn't been, it may still be a useful exercise as new test forms are created annually.

2. Pilot testing is one way to make sure that the items field-tested on operational tests are likely to work. However, some local educators don't like the disruption of state testing twice during the school year.

We heard the issue of pilot testing raised several times. Pilot testing (the trial of items with a small number of students) is useful to assure that the items to be field tested are likely to work. This can reduce either the

    total number of items to be field tested, the number of field test items given to any one student, or both. It is

    better to determine that an item doesn't work in a small scale pilot than in a large statewide field test.


    Informal pilot testing might occur in a couple of ways. When new items are first created and especially when

    new item formats are created, it can be helpful to administer the items to 20-30 students to see if they work.

    It is possible to learn a lot from this sort of informal use of the items. After items have been written and

    edited, they might next be tried out by approximately 100 students who are generally chosen to include high-

    and low-achieving students because of the perspectives that they bring to testing.

    However, the use of students to pilot test in the fall is of concern to the school corporations. One way to lessen

    the impact is to package items into smaller units that would be about the length of an ISTEP+ session. This

would limit the testing time per classroom or student to 30-40 minutes. The downside of this approach is that more schools will need to participate, but the participation of each school would be minimized.
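As a rough sketch of the packaging idea (item identifiers and timings are invented; real packaging would also need to balance content coverage):

```python
# Hypothetical sketch: package field-test items into session-length blocks.

def package_items(items, session_minutes=35):
    """Greedily group (item_id, minutes) pairs so each block fits one session."""
    blocks, current, current_time = [], [], 0
    for item_id, minutes in items:
        if current and current_time + minutes > session_minutes:
            blocks.append(current)
            current, current_time = [], 0
        current.append(item_id)
        current_time += minutes
    if current:
        blocks.append(current)
    return blocks

items = [(f"item{i:02d}", m) for i, m in enumerate([5, 8, 12, 4, 10, 15, 6, 9, 7, 11])]
print(package_items(items))  # each block fits within a 35-minute session
```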

    3. The types of measures used in the ELA assessments can affect testing time.

    We observed that the ELA assessment contains a number of open-ended items, as well as some clusters of

    items that may affect testing time. In general, we support the use of authentically long reading texts with a

cluster of assessment items (for efficiency's sake), as well as the more involved processes of writing

    assessment (e.g., read a passage, answer multiple-choice items, then answer open-ended items). These are

    more complex item types, but they also mirror the types of language tasks that adults are asked to do. Thus,

    we feel that they should remain, even though they contribute to a longer overall test. The results of the 2015

    assessment should be carefully reviewed to determine the time it actually takes for students to complete

    these new items and to evaluate if the information they provide is worth the time they require.

    The greater the number of reading passages with attendant items, the longer the assessment. An essential

    part of the assessment blueprint to be developed and disseminated to the field is a description of the number

of passages and items required to adequately cover the state's content standards. The rationale for these

    numbers should be provided as well. This is particularly critical since Indiana does not intend to report a

    reading score separate from ELA scores. Thus, the ELA assessment could be assembled using fewer reading

    passages and yet yield reliable and valid ELA scores. This possibility should be investigated via the blueprint

    that we are recommending. We have not seen such a blueprint or a rationale statement for the number of

    passages and items anticipated in the final operational test. If it does not exist, it should be created soon.

    Finally, these types of ELA measures can affect the overall level of performance. It is reasonable to speculate

that a more involved ELA assessment using open-ended items would result in somewhat lower performance at first. The actual difference will not be known until after standard setting, but teachers, parents and other

    interested parties should be prepared for that possibility. Federal assessment regulations do not specify

    details such as type of assessment prompts, their number or their length. Currently, it is the match between


    the wording of the standards and the format and content of the items measuring the standards that is

    important.

    4. Have adequate numbers of items been operationally field tested in 2015 to construct the

    operational 2015 and the operational 2016 tests?

    The reason for this concern on our part is that the test to be used in 2016 will be constructed from items used

    on the 2015 operational test that can be re-used, combined with 2015 operationally field-tested items that

appeared to work. However, this is an untested assumption, since the state won't know the survival rate of the operationally field-tested items until students' responses are analyzed. Thus, there is a risk that some

operational field testing may still need to occur in 2016. This risk is most acute with the Part 1 OE items, since

    these are some of the most likely not to work. It is not improbable, given that only two versions of the Part 1

OE assessments are being field-tested, that one or more item types may not work in either version. That adds risk not only to the 2015 test administration, but to 2016 testing as well. This is why we recommended that

    the OE items not be released for two years. While our preference is to resume the release of the Part 1 OE

    items in 2016, we strongly believe that this should occur only if there are sufficient OE items to build the

    2017 assessments, including items from 2015 as well as items field tested in 2016. We also recommended

    that IDOE consider a shorter version of the test. It would be prudent to begin now to consider what would be

    the minimum number of raw score points that would be required given the purposes of the test.

    5. Meeting Federal testing requirements including alignment, peer review, etc.

    The Federal government does not dictate testing time minimums or maximums, how many test items are to

    be used, or which item types are to be used or not used. Federal assessment requirements do not restrict

    state decisions about test design at this level of detail.

USED does require that states' tests be technically sound and that the states provide evidence that the tests support their intended uses (are valid) and that they are sound measures (produce reliable information about students' achievement). States must demonstrate this through information submitted for peer review.

    Currently, it is the match between the number of standards, the content and processes included in the

    standards, the wording of the standards and the design of the test that is important.

    This is why we included a longer-term recommendation for sound assessment blueprints for the 2015 and

    the 2016 ISTEP+ assessments. We believe that such blueprints are the starting point for the demonstration of

    the validity of the assessment instruments and the documents we have found are not sufficient in scope and


    detail to be of use in defining the operational ISTEP+ assessments or in seeking USED approval of the ISTEP+

    program.

    If a core set of items has been identified, this core test could be used to determine its alignment to the Indiana

    standards. The Webb alignment tool is often used to help the state determine this key aspect of peer review

    requirements.

    Ultimately, the U.S. Department of Education (USED) will agree or disagree as to the reliability and validity of

    the assessments, as well as to the adequacy of the information provided. If USED believes that there are

    deficiencies, it will permit the state time to correct those issues.

    6. Setting college- and career-ready standards

    This is an issue that a number of states have faced, since assuring readiness for college and careers is a

common goal nationally. One challenge in judging readiness for college and careers is how broadly readiness can be defined. For example, is college readiness defined as success in any post-secondary

    institution, success in a community college, or just success in a highly selective four-year university? Is it just

    grade-point average (GPA), or graduation in a reasonable number of years? Is career success the ability to

    work well in entry-level jobs (many of which require relatively low skill levels) or the ability to take part in

    and succeed in entry-level job training programs with advancement opportunities?

    There are a variety of methods of organizing standard setting and obtaining useful cut scores. One way is to

    use experts to judge the relationship between ISTEP+ tests and the skills needed to succeed in college and on

    the job. The cut points should be set by panels of citizens who are familiar with K-12 education, as well as the

    requirements for success in college and in a career. The other strategy is to seek external data sets and set

    standards on the ISTEP+ tests at levels comparable to the external data. It is likely that the professional

    judgment methods will be more useful for younger grade levels. External data such as NAEP or ACT results

    are more relevant at the secondary level. The state should rely on the new testing vendor to propose a

    suitable method and on outside technical advice to review and approve the method.

    7. Transitioning the assessment programs to new vendor(s)

    The transition of contract work from one vendor to another is a significant activity for an assessment

    program. Transitions involve considerable risks, including late delivery of testing materials, errors in testing

    materials, errors in scoring and reporting and late delivery of and inaccurate assessment results. For the

    incoming contractor(s) to succeed, it will be necessary for the outgoing contractor to assist with the

    transition. This is especially true in the case of ISTEP+ because the incumbent contractor has held the


    contract for so many years. The state must rely on the vendor to document scoring and reporting decisions,

for example, if IDOE does not have that documentation in hand already. There are a myriad of decisions that the new vendor will need to understand in order to succeed in implementing the ISTEP+ assessments.

The first step would be for the assessment staff to request that the outgoing contractor carefully and thoroughly document all decisions in its IT systems, test development procedures, statistical analyses and reporting metrics. The challenge here is that the outgoing contractor, having lost the contract, may devote fewer resources to helping the state to successfully transition the work to new vendors.

    A key activity in states that transition from one vendor to another is to assure that open-ended items are

    scored in a comparable manner. The outgoing contractor should provide the complete scoring guides

    (containing the scoring rubric(s) and examples of student work used in training, certification and validity

    checks) to the incoming contractor. The incoming contractor should use these materials to train and qualify

    raters for scoring the 2016 items. Before the new contractor scores 2016 student responses, it may be

    advantageous to have the new vendor re-score a sample of 2015 student papers to verify that the new vendor

can score students' responses in a comparable manner.
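The report does not prescribe a comparability statistic; exact and adjacent agreement between the outgoing vendor's scores of record and the incoming vendor's rescores is one common check, sketched here with hypothetical scores:

```python
def agreement_rates(old_scores, new_scores):
    """Exact and adjacent (within one point) agreement between two scorings."""
    pairs = list(zip(old_scores, new_scores))
    exact = sum(a == b for a, b in pairs) / len(pairs)
    adjacent = sum(abs(a - b) <= 1 for a, b in pairs) / len(pairs)
    return exact, adjacent

# Hypothetical rescoring of five 2015 responses on a 0-10 writing rubric.
exact, adjacent = agreement_rates([7, 4, 9, 5, 6], [7, 5, 9, 5, 4])
print(f"exact: {exact:.0%}, adjacent: {adjacent:.0%}")  # exact: 60%, adjacent: 80%
```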

Some states hire a consultant or an organization to assist in this effort by serving as a facilitator of the conversations and work of the incoming and outgoing contractors. This can help the assessment staff understand the information needed for the success of the new vendor and make requests to acquire the

    information from the outgoing contractor. This transition consultant can also serve as a shuttle diplomat

    between the outgoing and incoming vendors.

    If the state chooses to award the work currently being done by one contractor to multiple different incoming

    vendors, the use of a facilitator is even more important, since each new vendor will need assistance in

    transition activities and the work of the different incoming contractors will need to be coordinated to provide

a coherent experience for local educators in Indiana. Both of us have first-hand experience with this type of transition work, so we know the level of work required to make the transition run smoothly.

    Many states also contract with independent psychometric consultants or organizations to replicate the

    statistical analyses of the test data, item calibrations, reporting and linking of forms within and across years.

    While errors in these calculations are rare, they are very disruptive and undermine public confidence in the

    system.

    8. Level of staffing

    We are concerned about the level of staffing of the assessment unit of IDOE. The number of assessment

    components and their complexity drive this concern. We believe that the sponsoring agency (i.e., IDOE)

    should be in charge of making the key decisions about the assessment program and the procedures used. This

    requires the staff to do more than simply turn over the responsibility to the contractor. And, as suggested


above, if the assessment contractors employed by the state change, significant transition activities will be added to IDOE's work.

To assure the successful transition discussed earlier, we believe that two or more additional assessment staff FTE are needed to handle the transition. The number of staff members needed reflects the number of vendors that will be used in the future and whether the current vendors are among them; new vendors require substantially more staff work than would be the case if the state continued to use its existing contractors. For example, the Michigan Department of Education (MDE) has about 10 individuals who work in the assessment administration unit alone: one person is assigned full-time to each of the state's five assessment programs, plus a program manager and support staff. Thus, two new staff members for the assessment unit in Indiana is a conservative estimate of the staffing level necessary.

These assessment administration specialists would focus on documenting the assessment procedures of the current contractor, transferring this information to the new contractor(s), and working with the new contractor(s) to assure that past procedures are carried forward in their work. We believe in the motto: trust, but verify. To do this, the assessment program needs to be staffed adequately, and its staff must take responsibility for making key assessment decisions.

    SUMMARY

    We commend the state for tackling this thorny issue and working together to resolve it. We believe that if

    these recommendations are followed, testing time can be reduced to more manageable levels. We also believe

    that implementing our long-term recommendations will improve the design and implementation of the

    ISTEP+ program in the future. We remain willing to assist in and perhaps monitor efforts to implement these

    recommendations.

APPENDIX A

The sample Test Blueprint from Oregon is a large document. It was sent as a separate file. The link to Oregon's webpage is:

    http://www.ode.state.or.us/wma/teachlearn/testing/dev/testspecs/asmtmatestspecsg5_2011-12.pdf

APPENDIX B

2015 Operational ELA Item Number and Type with Points Awarded by Reporting Category

Each cell shows the number of items, with the points awarded in parentheses. Part 1 comprises the CR (2-point constructed-response), ER (8-point extended-response) and WP (10-point writing prompt) items; Part 2 comprises the MC/TE (1-point multiple-choice/technology-enhanced) and TE (2-point technology-enhanced) items.

Grade 3 | CR (2) | ER (8) | WP (10) | MC/TE (1) | TE (2) | Items | Points | % of Points
All categories | 3 (6) | 1 (8) | 1 (10) | 40 (40) | 5 (10) | 50 | 74 | –
Reading: Literature | – | – | – | 19 (19) | 1 (2) | 20 | 21 | 28%
Reading: Nonfiction and Media Literacy | 3 (6) | – | – | 12 (12) | 1 (2) | 16 | 20 | 27%
Reading: Vocabulary | – | – | – | 5 (5) | 1 (2) | 6 | 8 | 11%
Writing: Genres, Writing Process, Research Process | – | 1 (4) | 1 (6) | 2 (2) | 1 (2) | 5 | 14 | 19%
Writing: Conventions of Standard English | – | (4) | (4) | 2 (2) | 1 (2) | 5* | 11 | 15%

Grade 4 | CR (2) | ER (8) | WP (10) | MC/TE (1) | TE (2) | Items | Points | % of Points
All categories | 3 (6) | 1 (8) | 1 (10) | 40 (40) | 5 (10) | 50 | 74 | –
Reading: Literature | 3 (6) | – | – | 12 (12) | 1 (2) | 16 | 20 | 27%
Reading: Nonfiction and Media Literacy | – | – | – | 19 (19) | 1 (2) | 20 | 21 | 28%
Reading: Vocabulary | – | – | – | 5 (5) | 1 (2) | 6 | 8 | 11%
Writing: Genres, Writing Process, Research Process | – | 1 (4) | 1 (6) | 2 (2) | 1 (2) | 5 | 14 | 19%
Writing: Conventions of Standard English | – | (4) | (4) | 2 (2) | 1 (2) | 5* | 11 | 15%

Grade 5 | CR (2) | ER (8) | WP (10) | MC/TE (1) | TE (2) | Items | Points | % of Points
All categories | 3 (6) | 1 (8) | 1 (10) | 40 (40) | 5 (10) | 50 | 74 | –
Reading: Literature | – | – | – | 19 (19) | 1 (2) | 20 | 21 | 28%
Reading: Nonfiction and Media Literacy | 3 (6) | – | – | 12 (12) | 1 (2) | 16 | 20 | 27%
Reading: Vocabulary | – | – | – | 5 (5) | 1 (2) | 6 | 8 | 11%
Writing: Genres, Writing Process, Research Process | – | 1 (4) | 1 (6) | 2 (2) | 1 (2) | 5 | 14 | 19%
Writing: Conventions of Standard English | – | (4) | (4) | 2 (2) | 1 (2) | 5* | 11 | 15%

Grade 6 | CR (2) | ER (8) | WP (10) | MC/TE (1) | TE (2) | Items | Points | % of Points
All categories | 3 (6) | 1 (8) | 1 (10) | 40 (40) | 5 (10) | 50 | 74 | –
Reading: Literature | 3 (6) | – | – | 12 (12) | 1 (2) | 16 | 20 | 27%
Reading: Nonfiction and Media Literacy | – | – | – | 19 (19) | 1 (2) | 20 | 21 | 28%
Reading: Vocabulary | – | – | – | 5 (5) | 1 (2) | 6 | 8 | 11%
Writing: Genres, Writing Process, Research Process | – | 1 (4) | 1 (6) | 2 (2) | 1 (2) | 5 | 14 | 19%
Writing: Conventions of Standard English | – | (4) | (4) | 2 (2) | 1 (2) | 5* | 11 | 15%

Grade 7 | CR (2) | ER (8) | WP (10) | MC/TE (1) | TE (2) | Items | Points | % of Points
All categories | 3 (6) | 1 (8) | 1 (10) | 40 (40) | 5 (10) | 50 | 74 | –
Reading: Literature | – | – | – | 18 (18) | 1 (2) | 19 | 20 | 27%
Reading: Nonfiction and Media Literacy | 3 (6) | – | – | 13 (13) | 1 (2) | 17 | 21 | 28%
Reading: Vocabulary | – | – | – | 5 (5) | 1 (2) | 6 | 8 | 11%
Writing: Genres, Writing Process, Research Process | – | 1 (4) | 1 (6) | 2 (2) | 1 (2) | 5 | 14 | 19%
Writing: Conventions of Standard English | – | (4) | (4) | 2 (2) | 1 (2) | 5* | 11 | 15%

Grade 8 | CR (2) | ER (8) | WP (10) | MC/TE (1) | TE (2) | Items | Points | % of Points
All categories | 3 (6) | 1 (8) | 1 (10) | 40 (40) | 5 (10) | 50 | 74 | –
Reading: Literature | 3 (6) | – | – | 12 (12) | 1 (2) | 16 | 20 | 27%
Reading: Nonfiction and Media Literacy | – | – | – | 19 (19) | 1 (2) | 20 | 21 | 28%
Reading: Vocabulary | – | – | – | 5 (5) | 1 (2) | 6 | 8 | 11%
Writing: Genres, Writing Process, Research Process | – | 1 (4) | 1 (6) | 2 (2) | 1 (2) | 5 | 14 | 19%
Writing: Conventions of Standard English | – | (4) | (4) | 2 (2) | 1 (2) | 5* | 11 | 15%

The number of TE items and the points per TE item may vary depending on field-test results. * Includes double-scored items.

2015 Operational Math Item Number and Type with Points Awarded by Reporting Category

Each cell shows the number of items, with the points awarded in parentheses. Part 1 comprises the CR (4-point constructed-response) and ER (6-point extended-response) items; Part 2 comprises the MC/TE (1-point multiple-choice/technology-enhanced) and TE (2-point technology-enhanced) items.

Grade 3 | CR (4) | ER (6) | MC/TE (1) | TE (2) | Items | Points | % of Points
All categories | 4 (16) | 1 (6) | 37 (37) | 9 (18) | 51 | 77 | –
Number Sense | – | – | 9 (9) | 1 (2) | 10 | 11 | 14%
Computation | – | – | 9 (9) | 1 (2) | 10 | 11 | 14%
Algebraic Thinking/Data Analysis | 3 (6) | – | 7 (7) | 4 (8) | 11 | 21 | 27%
Geometry/Measurement | 1 (2) | 1 (3) | 12 (12) | 3 (6) | 15 | 23 | 30%
Mathematical Process | (8) | (3) | – | – | 5* | 11 | 14%

Grade 4 | CR (4) | ER (6) | MC/TE (1) | TE (2) | Items | Points | % of Points
All categories | 4 (16) | 1 (6) | 37 (37) | 9 (18) | 51 | 77 | –
Number Sense | 1 (2) | – | 9 (9) | 1 (2) | 11 | 13 | 17%
Computation | 1 (2) | – | 10 (10) | 2 (4) | 13 | 16 | 21%
Algebraic Thinking/Data Analysis | 2 (4) | – | 8 (8) | 3 (6) | 13 | 18 | 23%
Geometry/Measurement | – | 1 (3) | 10 (10) | 3 (6) | 14 | 19 | 25%
Mathematical Process | (8) | (3) | – | – | 5* | 11 | 14%

Grade 5 | CR (4) | ER (6) | MC/TE (1) | TE (2) | Items | Points | % of Points
All categories | 4 (16) | 1 (6) | 46 (46) | 9 (18) | 51 | 77 | –
Number Sense | – | – | 6 (6) | 1 (2) | 6 | 8 | 10%
Computation | 1 (2) | – | 14 (14) | 3 (6) | 15 | 22 | 29%
Algebraic Thinking/Data Analysis | 2 (4) | – | 14 (14) | 3 (6) | 16 | 24 | 31%
Geometry/Measurement | 1 (2) | 1 (3) | 12 (12) | 2 (4) | 14 | 21 | 27%
Mathematical Process | (8) | (3) | – | – | 5* | 11 | 14%

Grade 6 | CR (4) | ER (6) | MC/TE (1) | TE (2) | Items | Points | % of Points
All categories | 4 (16) | 1 (6) | 46 (46) | 9 (18) | 51 | 77 | –
Number Sense/Computation | 1 (2) | – | 21 (21) | 4 (8) | 22 | 27 | 35%
Algebra/Functions | 1 (2) | 1 (3) | 12 (12) | 3 (6) | 14 | 20 | 26%
Geometry/Measurement | 2 (4) | – | 7 (7) | 1 (2) | 9 | 12 | 16%
Data Analysis/Statistics | – | – | 6 (6) | 1 (2) | 6 | 7 | 9%
Mathematical Process | (8) | (3) | – | – | 5* | 11 | 14%

Grade 7 | CR (4) | ER (6) | MC/TE (1) | TE (2) | Items | Points | % of Points
All categories | 4 (16) | 1 (6) | 46 (46) | 9 (18) | 51 | 77 | –
Number Sense/Computation | 1 (2) | – | 16 (16) | 4 (8) | 17 | 22 | 29%
Algebra/Functions | 1 (2) | 1 (3) | 13 (13) | 3 (6) | 15 | 21 | 27%
Geometry/Measurement | 2 (4) | – | 9 (9) | 1 (2) | 11 | 14 | 18%
Data Analysis/Statistics/Probability | – | – | 8 (8) | 1 (2) | 8 | 9 | 12%
Mathematical Process | (8) | (3) | – | – | 5* | 11 | 14%

Grade 8 | CR (4) | ER (6) | MC/TE (1) | TE (2) | Items | Points | % of Points
All categories | 4 (16) | 1 (6) | 46 (46) | 9 (18) | 51 | 77 | –
Number Sense/Computation | 1 (2) | – | 9 (9) | 2 (4) | 10 | 13 | 17%
Algebra/Functions | 1 (2) | 1 (3) | 14 (14) | 2 (4) | 16 | 21 | 27%
Geometry/Measurement | 1 (2) | – | 14 (14) | 3 (6) | 15 | 19 | 25%
Data Analysis/Statistics/Probability | 1 (2) | – | 9 (9) | 2 (4) | 10 | 13 | 17%
Mathematical Process | (8) | (3) | – | – | 5* | 11 | 14%

The number of TE items and the points per TE item may vary depending on field-test results. * Includes double-scored items.
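Because blueprint tables such as those above are easy to mis-transcribe as they pass between contractors, one simple safeguard is to encode each table in machine-readable form and verify its totals automatically. The Python sketch below illustrates the idea using the grade 3 ELA rows; the data structure and names are our illustration, not an IDOE artifact.

    # Minimal sketch of a machine check on a blueprint table. Each entry is
    # an (items, points) pair summed across item types for one category,
    # taken from the grade 3 ELA rows above.
    BLUEPRINT_G3_ELA = {
        "Reading: Literature":                    (20, 21),
        "Reading: Nonfiction and Media Literacy": (16, 20),
        "Reading: Vocabulary":                    (6, 8),
        "Writing: Genres/Process/Research":       (5, 14),
        "Writing: Conventions":                   (5, 11),  # includes double-scored items
    }
    STATED_TOTAL_POINTS = 74

    def check_blueprint(blueprint, stated_points):
        """Verify that the category points sum to the stated test total."""
        total_points = sum(points for _, points in blueprint.values())
        if total_points != stated_points:
            raise ValueError(
                f"category points sum to {total_points}, table states {stated_points}"
            )
        for name, (items, points) in blueprint.items():
            print(f"{name}: {items} items, {points} points")
        print(f"Total points check passed: {total_points}")

    check_blueprint(BLUEPRINT_G3_ELA, STATED_TOTAL_POINTS)

Note that item counts are deliberately not totaled in this check, since double-scored items appear in more than one category row.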
