
Changes to the FCAT 2.0 — 2011 and 2012 Comparisons


Update: Performance on FCAT 2.0

Since the release of this year’s third-grade FCAT 2.0 Reading and Mathematics scores yesterday, there has been much confusion about what the scores mean with respect to achievement in previous years. Considering that over the past two years Florida has (at different times) changed both the test itself (FCAT to FCAT 2.0) and the scale and passing benchmarks the test is measured on, confusion and frustration are understandable when the stakes involved are so high.

What Happened

Much of the confusion following yesterday’s release stems from the difference between the proficiency levels reported last year and the proficiency levels retroactively applied to last year’s data using the new scale, which the DOE is now using to compare year-to-year performance in this year’s release.

As a quick recap, up until 2011 students took the FCAT, which had its own measurement scale, proficiency levels, and passing scores appropriate to that test. In 2011, the test itself changed to the new FCAT 2.0, a more rigorous exam aligned with the new Next Generation Sunshine State Standards (implemented in anticipation of getting students ready to meet the expectations of the Common Core Standards coming in 2014). Still with me?

However, when the new FCAT 2.0 was administered last year, it did not yet have its own measurement scale. So, to maintain year-to-year reporting consistency, the measurement scale from the old test (FCAT) was applied to the results of the new test (FCAT 2.0). As a result, reported performance on the new test last year looked very similar to performance on the old test in previous years. Earlier this year, the state finished establishing a new measurement scale tailored to the demands of the FCAT 2.0, to be used moving forward. That is the scale being used now, and the one the state was able to go back and apply retroactively to last year’s data for comparison’s sake. With the new scale and standards reapplied to last year’s data, the results look as if there has been little to no change in performance levels.

While this is true and more accurate, it is no less confusing to those who remember being told that 72% of students were proficient statewide last year and are now being told that only 57% actually were. In Table 1 below, we see this year’s performance levels compared with what was reported last year (in yellow) and with last year’s actual performance as reported this year (in blue).


Table 1: Percentage of students scoring Level 3 and above, 2012 compared with 2011 (changes are in percentage points)

| Subject | Group | 2012 | 2011 (previously reported) | Change, 2011 (previously reported) to 2012 | 2011 (retroactively rescaled) | Change, 2011 (retroactively rescaled) to 2012 |
|---|---|---|---|---|---|---|
| Reading | Duval County | 51% | 69% | -18 | 55% | -4 |
| Reading | State | 56% | 72% | -16 | 57% | -1 |
| Math | Duval County | 56% | 79% | -23 | 56% | 0 |
| Math | State | 58% | 78% | -20 | 56% | +2 |

To be clear, the confusion this year is largely the result of what the DOE did and reported last year: applying the prior FCAT scale to the new FCAT 2.0 test results to try to maintain some kind of year-to-year consistency. Last year’s results were artificially scaled to match student performance levels from the previous year rather than measured against an independent scale, and thus were not a true reflection of performance on the FCAT 2.0 at the time. While there is much anger about the scale being changed this year, changing it was the correct and necessary thing to do; it just should have been done last year.

How the scale changed

Aside from being more appropriately aligned to the content of the new test, the new measurement scale improves on the old scale in a few key ways. In Figure 1 below, we see the achievement level cutoff scores for grades 3-10 on the old scale for the previous FCAT.

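Figure 1: Previous FCAT Developmental Scale Score Cut Scores

[Figure not reproduced: line chart of achievement-level cut scores by grade (3 to 10); the vertical axis runs from 86 to 2586 developmental scale points.]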


The lowest line represents the cut scores between Level 1 and Level 2 across grade levels; above that is the cut score line between Level 2 and Level 3, and so on up. The diamond line represents the cutoff for Level 3 and above. At face value, the cutoffs are not very consistent across grade levels and move closer together in the higher grades, making it more difficult to score in Levels 2, 3 or 4 at higher grade levels.

Compare that with the achievement level cutoff scores on the new FCAT 2.0 measurement scale in Figure 2 below. We see much more consistency in expectations for each level between the lower and upper grade levels in the new scale.

Figure 2: New FCAT 2.0 Scale Score Cut Scores

[Figure not reproduced: line chart of achievement-level cut scores by grade (3 to 10); the vertical axis runs from 150 to 290 scale score points.]

This simple comparison is useful for a basic orientation to how the previous (FCAT) and new (FCAT 2.0) achievement level standards operate relative to one another, but it is insufficient for drawing final conclusions about how the two sets of standards compare, for a number of reasons, including that they reflect growth on two different scales.

To account for these scale differences, another way to compare the two sets of standards is to look at the percentage of total possible points at each grade level that students must earn to meet each cutoff score under the two systems. For example, the previous FCAT developmental scale ranged from 86 to 3008 points. A 3rd grader could score between 86 and 2514 on that scale (a 2,428-point range) and needed to score at least 1198 to reach the Level 3 cutoff. Given that range, a score of 1198 means earning 46% of the total possible points a 3rd grader could earn, since (1198 - 86) / 2,428 is about 0.46 (or, in effect, demonstrating mastery of 46% of the tested skills for 3rd grade Reading).

In comparison, the new FCAT 2.0 scale ranges from 140 to 302 points. A 3rd grader can score between 140 and 260 on that scale (a 120-point range) and needs to score at least 198 to reach the Level 3 cutoff. Given that range, a score of 198 means demonstrating mastery of 48% of the tested skills for 3rd grade Reading, since (198 - 140) / 120 is about 0.48.

Figure 3: Previous FCAT Developmental Scale Cut Scores as a % of Total Possible Points

   

As we see in Figure 3, the percentage of total possible points that students needed to score for each level fluctuated significantly from grade to grade under the previous FCAT standards. For example, to be considered a Level 3 reader under the previous system, a student would need to demonstrate mastery of 46% of the material in 3rd grade, then 50% in 4th grade, back to 46% in 5th grade, and then steadily increasing amounts each year up to almost 60% in 10th grade.

Figure 4: Proposed New FCAT 2.0 Scale Cut Scores as a % of Total Possible Points

 

[Figures 3 and 4 not reproduced: line charts of cut scores as a percentage of total possible points (vertical axis 20% to 80%) by grade (3 to 10).]


 

If we compare that with the new FCAT 2.0 scale in Figure 4, we see that the new scale is much more consistent in what is expected of students across grade levels:

• A Level 1 reader is a student who demonstrates mastery of less than about 35% of grade-level material, every year.

• A Level 2 reader is one who demonstrates mastery of roughly 35% to 50% of grade-level material, every year.

• A Level 3 reader is one who demonstrates mastery of roughly 50% to 60% of grade-level material, every year.

• A Level 4 reader is one who demonstrates mastery of roughly 60% to 73% of grade-level material, every year.

• A Level 5 reader is one who demonstrates mastery of more than about 73% of grade-level material, every year.
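For readers who want to reproduce the arithmetic, here is a minimal sketch in Python of the percent-of-possible-points comparison and the approximate level bands listed above. The function names and the `APPROX_UPPER_BOUNDS` table are illustrative assumptions, not anything published by the DOE, and the treatment of scores that fall exactly on a band boundary is also an assumption. Because the bands are rounded, they will not reproduce actual cut scores exactly: the 3rd grade Level 3 cutoff works out to about 48%, not 50%.

```python
def percent_of_possible_points(score, grade_min, grade_max):
    """Fraction of a grade's scorable range that `score` represents."""
    return (score - grade_min) / (grade_max - grade_min)


# Approximate FCAT 2.0 Reading bands taken from the bulleted list above.
# Each pair is (upper bound as a fraction, level); anything at or above
# the last bound is treated as Level 5. Boundary handling is an assumption.
APPROX_UPPER_BOUNDS = [(0.35, 1), (0.50, 2), (0.60, 3), (0.73, 4)]


def approximate_level(fraction):
    """Map a fraction of possible points to an approximate achievement level."""
    for upper, level in APPROX_UPPER_BOUNDS:
        if fraction < upper:
            return level
    return 5


# 3rd grade Reading, Level 3 cutoff, using the figures quoted in the text.
old = percent_of_possible_points(1198, 86, 2514)   # previous FCAT scale
new = percent_of_possible_points(198, 140, 260)    # FCAT 2.0 scale

print(f"Previous FCAT, grade 3 Level 3 cutoff: {old:.0%}")
print(f"FCAT 2.0, grade 3 Level 3 cutoff: {new:.0%}")
print("A student at 55% of possible points is roughly Level", approximate_level(0.55))
```

The same two functions can be applied to any grade once that grade's minimum score, maximum score, and cut scores are known; only the 3rd grade values are given in the text.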

This consistency is important for a number of reasons. First, it establishes a system that holds students, teachers and faculty to the same standards of mastery across all grade levels. Second, and perhaps just as importantly, it makes those standards meaningful by creating a common language across all grade levels about what it means to perform at each achievement level, so that teachers and administrators at each successive grade level have a concrete understanding of what an incoming student’s previous performance indicates about their needs.

This common language is also important from a research and reporting perspective because it allows more direct comparisons across grade levels, such as 3rd grade reading performance vs. 10th grade reading performance across the district.