Module 1 Introduction (Module 1_Introduction.pptx, coedu.usf.edu; created 1/8/2010)

This module covers an introduction to contemporary educational testing and measurement.

In this module, we will discuss how tests are only tools and why test scores are fallible. We will also distinguish between testing and assessment, and explain why testing and assessment skills are vital to today’s classroom teacher. By exploring the recent history of educational measurement, we will be better able to identify what current trends in educational measurement imply for today’s classroom teacher.

There are three schools of thought regarding the use of testing in education.

The first holds that testing provides little benefit for measuring learning and may have a detrimental effect on how students feel about education.

The second holds that tests are essential to the education system.

The third holds that tests are important tools for evaluating students, curricula, and instruction, but questions how much educational power is placed on tests and test scores.

The authors of your textbook belong to the third group.

Tests are only tools, and their usefulness can vary. The usefulness of a test depends on five factors: 1) how the test is used, 2) the design of the test, 3) the users of the test, 4) the purpose or population the test is designed for, and 5) the limited information a test provides for a decision.

As a tool, a test can be used appropriately, misused unintentionally, or abused intentionally.

Like other tools, a test can be well designed or poorly designed. Even a well-designed test can be dangerous in the hands of ill-trained or inexperienced users. A test’s usefulness is also limited when it is used for a purpose or population it was not designed for. Finally, even a test that meets these four criteria can provide only some of the information needed to make the best educational decision about a student.

The five concerns mentioned above help us recognize that the usefulness of tests depends on a variety of factors. Let’s explore these factors further.

A crucial factor that affects a test’s usefulness is its technical adequacy. The technical adequacy of a test includes evidence of test validity and test score reliability.

Validity evidence helps us determine whether the test measures what it purports to measure.

Score reliability indicates the extent to which test scores are consistent and stable.

Validity and reliability are not fixed characteristics of a test, because they can be affected by many factors, such as the competency of the test user, whether the test is being used as intended, and whether the test takers match the population for which the test was written.

The evidence of a test’s usefulness can vary depending on the competency of the people administering, scoring, and interpreting the test.

Competent test users can make better use of a test.

The usefulness of a test depends on whether it is being used for its intended purpose.

Tests have been designed for many specific educational purposes, such as assessing intellectual functioning, personality functioning, vocational aptitudes, and so on.

Using a test for a different purpose may limit its usefulness for that unintended purpose. For instance, suppose a test designed to measure the ability to recognize typos in a manuscript is used to predict the ability to write a book. In this case, the test is far less useful.

In addition to specific purposes, educational tests can also be designed for more general purposes, including summative and formative assessment.

Tests for summative assessment are intended to measure students’ learning after the completion of a unit of instruction. Summative tests can be used to assign grades, evaluate curriculum effectiveness, and measure annual gains at the student, school, and district levels. Summative tests are designed to measure larger, broader changes in achievement rather than small daily gains.

In contrast, tests for formative assessment are more useful for tracking daily gains and the effectiveness of instruction. Formative tests tend to be short, both to minimize interference with instructional time and to allow repeated administration in the classroom. Curriculum-based measurement, one type of formative assessment, can be used as part of the instructional process to monitor students’ progress.

To enhance test usefulness, we have to consider whether the test is a good match for diverse test takers.

Because our school systems serve students from a large array of cultural, linguistic, and educational backgrounds, we cannot expect the technical adequacy of these tests to be the same when they are used with populations from diverse backgrounds, such as Middle Eastern learners, learners with limited English proficiency, and learners from lower socioeconomic backgrounds.

As educators, we need to strive to choose the tests that are most useful for our student population. However, educational tests may not be as useful for students whose backgrounds differ from those of the intended student population. If we cannot find a good match between the test and the students taking it, we need to be especially careful when interpreting the test results.

Test usefulness also depends on how test results are used and weighed.

What should we do with test results?

Ideally, important decisions should never be made on the basis of a single test administration. This guideline applies even when technical adequacy, user competency, and intended purpose have all been satisfied for the assessment.

In reality, however, single test administrations are often used to make very important educational decisions, such as promotion and graduation. To bridge the gap between theory and reality, efforts (such as translation) have been made to adapt assessments to align more closely with the needs of diverse populations. However, the technical adequacy and fairness of these adaptations have been hard to determine and need to be studied in more depth.

Instead of relying on a single test, it is best to collect more assessment information about student achievement before making important educational decisions.

A single test is like a limited snapshot, or photograph, of student performance. Its results should be considered one part of the whole assessment process, which is more like a video. To make appropriate educational decisions, watching the whole video is always recommended.

At the beginning of the measurement course, it is important to clarify some technical, test-related terminology.

The terms “tests” and “assessments” can be regarded as synonyms.

The public prefers the term “assessment” to “testing” because “assessment” sounds less evaluative, threatening, or negative. Furthermore, a clear distinction can be made between tests (or assessments) and the assessment process.

A test (or assessment) can be thought of as a single measure at a single point in time, while the assessment process spans a period of time and uses multiple measures to gain a broader view of student achievement or characteristics.

A test is either formative or summative, while the assessment process can contain both formative and summative measures administered at different points in time.

Let’s look at some of the different types of assessments.

Understanding the differences between these types of assessments will help you as you progress through the remainder of the textbook.

The differences relate to the types of answers or responses the test takers produce, the type of information assessed, and how the responses are scored. Here, the intention is only to highlight the major differences among four groupings of assessments: 1) objective tests vs. essay, performance, and portfolio assessments; 2) teacher-made tests vs. standardized tests; 3) norm-referenced tests vs. criterion-referenced tests; and 4) curriculum-based measurements.

Tests with the most consistent and objective scoring consist of “objective items.”

Next in terms of scoring consistency and objectivity are “completion items.” Essays, performances, and portfolios may be difficult to score consistently and objectively. However, they are more commonly used than objective items to assess higher-order skills. All four of these item types can prove useful at different times in the assessment process.
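
To make the scoring contrast concrete, here is a minimal sketch, assuming a hypothetical five-item objective test and a hypothetical 1 to 4 essay rubric. Objective items are scored by matching responses to an answer key, so any scorer gets the same result; essays depend on rater judgment, and simple percent agreement between two raters is one rough indicator of how consistently they were scored (it is only one of several agreement indices). The names `ANSWER_KEY`, `score_objective`, and `exact_agreement` are illustrative, not from the textbook.

```python
# Hypothetical answer key for a five-item objective test.
ANSWER_KEY = ["B", "D", "A", "C", "B"]


def score_objective(responses, key=ANSWER_KEY):
    """Count responses that exactly match the key; every scorer gets the same total."""
    if len(responses) != len(key):
        raise ValueError("response count must match the answer key")
    return sum(1 for given, correct in zip(responses, key) if given == correct)


def exact_agreement(rater_a, rater_b):
    """Proportion of essays that two raters scored identically (simple percent agreement)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need paired ratings for at least one essay")
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return matches / len(rater_a)


student_responses = ["B", "D", "C", "C", "B"]
print(score_objective(student_responses))  # 4

# Hypothetical rubric scores two teachers gave the same five essays.
teacher_1 = [3, 4, 2, 3, 1]
teacher_2 = [3, 3, 2, 4, 1]
print(exact_agreement(teacher_1, teacher_2))  # 0.6
```

The objective score is fully determined by the key, whereas the two teachers agreed on only three of the five essays, which illustrates why essays and portfolios are harder to score consistently.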

Teacher-made tests have a lot of flexibility, or variability, in construction, administration, and scoring, while standardized tests are designed to eliminate that flexibility. Standardized tests are written by testing professionals and are strictly regulated in administration and scoring.

While both types of assessments commonly contain objective items, teacher-made tests are much more likely than standardized tests to contain essay items.

Norm-referenced tests compare individual student performance to that of a norm group, which can be worldwide, national, statewide, or district-wide.

Criterion-referenced tests compare individual student performance to an absolute standard, or criterion.

Norm-referenced tests have a broad focus and can be rather lengthy, while criterion-referenced tests typically have a narrower focus and are shorter.
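
The difference between the two interpretations can be sketched in a few lines, assuming a hypothetical norm group of ten raw scores and a hypothetical mastery cutoff of 24 points on a 30-item test. The same raw score gets a percentile rank (where the student stands relative to the norm group) and, separately, a mastery decision (whether the student met the fixed standard, regardless of how others did). Percentile rank definitions vary; this sketch uses the simple "percent of the norm group scoring below" version. All names and numbers are illustrative.

```python
from bisect import bisect_left

# Hypothetical norm-group raw scores and a hypothetical mastery cutoff.
NORM_GROUP = [12, 15, 18, 20, 21, 23, 25, 27, 28, 30]
MASTERY_CUTOFF = 24  # e.g., 80% of a 30-item test


def percentile_rank(score, norm_scores):
    """Norm-referenced: percent of the norm group scoring below this score."""
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, score)  # number of norm scores strictly less than score
    return 100 * below / len(ordered)


def meets_criterion(score, cutoff=MASTERY_CUTOFF):
    """Criterion-referenced: compare the score to a fixed standard, ignoring other students."""
    return score >= cutoff


raw_score = 22
print(percentile_rank(raw_score, NORM_GROUP))  # 50.0
print(meets_criterion(raw_score))  # False
```

Here a student can be exactly in the middle of the norm group and still fall short of the criterion; the two kinds of tests answer different questions about the same performance.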

Curriculum-based measurements are relatively new in regular education classrooms, but they have been used in special education classrooms for quite some time.

They are commonly used to assess daily gains in math, reading, writing, and spelling.

Curriculum-based measurements can be written by teachers or by commercial testing companies. If a curriculum-based measurement is used as a norm-referenced assessment, it is important to note that the ‘norm’ group used for comparison is not as carefully selected as that of typical norm-referenced tests and is more likely a convenience sample. Curriculum-based measurements are relatively short but have evidence of validity and reliability.

Various regular and special education reform efforts have had a major impact on classroom testing and assessment.

No Child Left Behind and the Elementary and Secondary Education Act have raised expectations of the regular education system in the United States. These raised expectations have included the creation of academic expectations and performance standards, local decision making, new teacher training, performance pay, higher teacher salaries, increased accountability, and high-stakes testing.

The passage of the Education for All Handicapped Children Act in 1975 served to ensure that all students, including students with disabilities, received a free appropriate public education. However, the act also led special education to become an entity separate from regular education, with its own rules, staff, procedures, and requirements.

This separation had a negative impact on special education, as most students with disabilities were educated in segregated special education settings instead of being mainstreamed into regular education settings, except for non-academic activities.

Because of these negative consequences, the Individuals with Disabilities Education Act and the Individuals with Disabilities Education Improvement Act were passed to ensure that special education students were educated in regular education settings as much as possible, by regular education staff, and held to the same standards as their regular education peers whenever possible.

The Individuals with Disabilities Education Improvement Act and No Child Left Behind are meant to be complementary pieces of legislation rather than being viewed separately as special education reform and regular education reform. Both emphasize scientifically based instruction and reading achievement for all students, as well as ongoing progress monitoring and the use of formative assessment.

The passage of the Individuals with Disabilities Education Improvement Act changed how students with learning disabilities are identified, increasing the role that regular education teachers play in the identification process.

Because this act focuses on formative assessment, new formative assessment methods have evolved, including the response-to-intervention model, universal screening, and progress monitoring using curriculum-based measurement (CBM).

Concerns about the ability of US students to compete in the global economy have increased our reliance on test results and accountability to ensure that our education systems are effective.

According to the 2007 Trends in International Mathematics and Science Study (TIMSS) data, the performance of US fourth- and eighth-grade students improved in math but remained the same in science.

US students still fall behind students in Asian countries in math and science, and behind students in some European countries in science.

The results of the 2006 Program for International Student Assessment (PISA) were less promising. The data suggested that 15-year-old students in the USA performed below the average for industrialized nations in math and science, scoring lower than 15-year-old students in Asian and European countries.

The reliance on a paper-and-pencil test to determine whether a teacher possesses the complex skills required to be a good classroom teacher has raised some concerns. As with student assessment, reliance on a single test may pose a problem. Given the emphasis on the assessment process for students, portfolios may also come to be used as a method of assessing the skills of classroom teachers.

Some of the areas of interest are high-stakes testing, performance and portfolio assessments, testing students with disabilities, linguistically and culturally appropriate assessments, and computer-based testing.

Professional organizations have historically influenced policy and reform and, therefore, may help shape the future of educational assessment.

Classroom teachers are increasingly responsible for creating valid and reliable classroom tests as well as administering standardized tests.

Along with those responsibilities comes the need to understand how to interpret test data and report those data to administrators and parents. With the increased emphasis on accountability, NCLB, and the Individuals with Disabilities Education Improvement Act, teachers should have at least a basic knowledge of the various types of tests and how to interpret their results.
