I wish I could believe you: the frustrating unreliability of some assessment research

Page 1

I wish I could believe you: the frustrating unreliability of some assessment research

Tim Hunt & Sally Jordan, The Open University
@tim_hunt @SallyJordan9

Are these two things related?

Page 2

Trick question (of course)

From a great web site: http://www.tylervigen.com/spurious-correlations
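A rough illustration of why two series that merely drift over time can look related. The sketch below is not from the talk: it generates pairs of independent random walks (unrelated by construction) and counts how often their Pearson correlation is large, with trend-free noise for comparison. The function names, the 0.5 threshold, and the series length are all assumptions chosen for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def random_walk(rng, steps):
    """Cumulative sum of independent +/-1 steps: a series that drifts over time."""
    position, path = 0, []
    for _ in range(steps):
        position += rng.choice((-1, 1))
        path.append(position)
    return path


def white_noise(rng, steps):
    """Independent draws with no trend, for comparison."""
    return [rng.gauss(0, 1) for _ in range(steps)]


def share_strongly_correlated(make_series, trials=500, steps=100, seed=1):
    """Fraction of independently generated pairs whose |r| exceeds 0.5."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        r = pearson(make_series(rng, steps), make_series(rng, steps))
        if abs(r) > 0.5:
            hits += 1
    return hits / trials


walk_share = share_strongly_correlated(random_walk)
noise_share = share_strongly_correlated(white_noise)
print(f"random walks: {walk_share:.0%} of pairs have |r| > 0.5")
print(f"white noise:  {noise_share:.0%} of pairs have |r| > 0.5")
```

Under these assumptions a sizeable share of the random-walk pairs should show a strong correlation even though no pair is related, while the trend-free pairs almost never do.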

Page 3

Correlation & Causation

Page 4

Sly (1999)

614 students P01 S01 S02

Practice tests as formative assessment improve student performance on computer-managed learning assessment

A computerised assessment was quite exciting in itself back in 1999!

Questions picked at random from a bank.

P01 & S01 used the same test bank. S02 was different, with no practice.

Page 5

Sly (1999)

614 students    P01       S01       S02       609 students
417             62.18%    72.72%    66.88%    415
197             –         67.56%    62.24%    194

Practice tests as formative assessment improve student performance on computer-managed learning assessment

All standard deviations 15–17%

Page 6

Sly (1999)

614 students    P01       S01       S02       609 students
417             62.18%    72.72%    66.88%    415
197             –         67.56%    62.24%    194

Practice tests as formative assessment improve student performance on computer-managed learning assessment

All standard deviations 15–17%

+5.38% +5.16% +4.64%
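The three differences on the slide can be reproduced by simple subtraction. The sketch below assumes that the 417-student row is the group that sat practice test P01 and that the 197-student row is the group that did not (the slide shows no P01 mark for them). The variable names are illustrative, and which difference sits where on the slide is not recoverable from the text.

```python
# Mean marks (%) as printed on the slide from Sly (1999).
# Assumption: "practice" = the 417 students with a P01 mark,
# "no_practice" = the 197 students without one.
means = {
    "practice": {"P01": 62.18, "S01": 72.72, "S02": 66.88},
    "no_practice": {"S01": 67.56, "S02": 62.24},
}

# Practice group minus non-practice group on each summative test.
gap_s01 = means["practice"]["S01"] - means["no_practice"]["S01"]
gap_s02 = means["practice"]["S02"] - means["no_practice"]["S02"]

# Non-practice group's S01 mark minus the practice group's P01 mark.
gap_p01 = means["no_practice"]["S01"] - means["practice"]["P01"]

print(f"S01, practice vs no practice:   {gap_s01:+.2f}")
print(f"S02, practice vs no practice:   {gap_s02:+.2f}")
print(f"S01 (no practice) vs P01 marks: {gap_p01:+.2f}")
```

On these figures the gap between the two groups on S02, where no practice test was available, is almost as large as the gap on S01.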

Page 7

OU level 3 physics (SM358)

An investigation into factors affecting physics’ students engagement with online assessment (Bolton & Jordan)

Page 8

OU level 3 physics (SM358)

The assessment strategy

0 TMAs 1 TMA 2 TMAs 3 TMAs 4 TMAs

0 iCMAs

1 iCMA

2 iCMAs

3 iCMAs

4 iCMAs

5 iCMAs

6 iCMAs

Page 9

OU level 3 physics (SM358)

Proportion of students

0 TMAs 1 TMA 2 TMAs 3 TMAs 4 TMAs

0 iCMAs 11.6% 3.4% 1.5% 0.5% 0.5%

1 iCMA 1.5% 1.0%

2 iCMAs 1.5% 2.4% 1.5%

3 iCMAs 1.5%

4 iCMAs 5.3% 2.4%

5 iCMAs 0.5% 3.9% 5.8% 8.2%

6 iCMAs 0.5% 0.5% 5.8% 5.8% 34.3%

Page 10

OU level 3 physics (SM358)

Exam mark

0 TMAs 1 TMA 2 TMAs 3 TMAs 4 TMAs

0 iCMAs 6.0

1 iCMA

2 iCMAs 17.0 24.0

3 iCMAs 60.0

4 iCMAs 43.7 62.0

5 iCMAs 23.0 46.0 62.6 69.5

6 iCMAs 35.3 60.8 77.5

Page 11

OU level 3 physics (SM358)

Exam mark compared to predictive model

0 TMAs 1 TMA 2 TMAs 3 TMAs 4 TMAs

0 iCMAs −20.8

1 iCMA

2 iCMAs −43.9 −27.5

3 iCMAs −9.0

4 iCMAs −15.6 +1.8

5 iCMAs −3.8 −11.1 +1.4 +2.4

6 iCMAs −17.1 +3.4 +4.6

Page 12

Confounding variables

Page 13

Berkeley gender bias case (1973)

         Men                       Women
         Applicants   Admitted     Applicants   Admitted
Total    8442         44%          4321         35%

https://en.wikipedia.org/wiki/Simpson%27s_paradox#Berkeley_gender_bias_case

Page 14

Berkeley gender bias case (1973)

              Men                       Women
Department    Applicants   Admitted     Applicants   Admitted
A             825          62%          108          82%
B             560          63%          25           68%
C             325          37%          593          34%
D             417          33%          375          35%
E             191          28%          393          24%
F             272          6%           341          7%

https://en.wikipedia.org/wiki/Simpson%27s_paradox#Berkeley_gender_bias_case
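The departmental table can be pooled back together to see the paradox directly. The sketch below uses only the figures printed above. Because the admission rates are rounded to whole percentages, the admitted counts it derives are approximate, and the names `men`, `women` and `pooled_rate` are illustrative.

```python
# (applicants, admission rate) per department, as printed on the slide.
# Rates are rounded to whole percentages, so derived counts are approximate.
men = {"A": (825, 0.62), "B": (560, 0.63), "C": (325, 0.37),
       "D": (417, 0.33), "E": (191, 0.28), "F": (272, 0.06)}
women = {"A": (108, 0.82), "B": (25, 0.68), "C": (593, 0.34),
         "D": (375, 0.35), "E": (393, 0.24), "F": (341, 0.07)}


def pooled_rate(departments):
    """Overall admission rate when all listed departments are combined."""
    applicants = sum(n for n, _ in departments.values())
    admitted = sum(n * rate for n, rate in departments.values())
    return admitted / applicants


# Departments in which women were admitted at a higher rate than men.
women_ahead = [dept for dept in men if women[dept][1] > men[dept][1]]

print(f"pooled rate, men:   {pooled_rate(men):.1%}")
print(f"pooled rate, women: {pooled_rate(women):.1%}")
print("departments where women had the higher rate:", women_ahead)
```

Women have the higher admission rate in four of the six departments, yet the pooled rate is clearly higher for men, because women applied in larger numbers to the departments that admitted few applicants. The pooled rates for these six departments differ from the 44% and 35% totals on the previous slide, presumably because those totals cover more applicants than the six departments listed here.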

Page 15

Real Experiments

Page 16

What is an experiment?

Split participants into two equal groups.

Split randomly, so if there are confounding variables, they are probably equally split between groups.

Give different ‘treatments’ to each group, trying to keep everything else the same.

Blind the treatment, if possible, to reduce all sorts of biases.

But, blinding is not normally possible in education. (You probably know if you just sat an exam!)

[Pick your favourite research methods book]
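The random split described above can be sketched in a few lines. This is a minimal illustration, not a procedure taken from the talk: the function name `assign_groups`, the student identifiers, and the seed are all hypothetical.

```python
import random


def assign_groups(participants, seed=None):
    """Randomly split participants into two groups of (near) equal size."""
    rng = random.Random(seed)      # a fixed seed makes the allocation reproducible
    shuffled = list(participants)  # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical student identifiers, for illustration only.
students = [f"student_{i:03d}" for i in range(1, 21)]
control, treatment = assign_groups(students, seed=2015)

print("control:  ", sorted(control))
print("treatment:", sorted(treatment))
```

Because the allocation depends only on the shuffle, any confounding variable (prior attainment, motivation, and so on) has no systematic way to end up concentrated in one group, although chance imbalances remain possible in small classes.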

Page 17

Karpicke & Blunt (2011) + many more

Retrieval practice produces more learning than elaborative studying with concept mapping

Page 18

… but! Wooldridge et al (2014)

The testing effect with authentic educational materials: A cautionary note

“Based on [the testing effect], … some textbooks are now accompanied by quizzing ancillaries … The quizzes are designed with the assumption that answering factual and application questions will promote a more integrated mental model that incorporates the target knowledge.”

Typically, the quizzes and test banks sample items from similar sub-sections in the textbook but not necessarily the same information.

Page 19

… but! Wooldridge et al (2014)

The testing effect with authentic educational materials: A cautionary note

Page 20

How reliable is student opinion?

Page 21

Background

Our own work with interactive computer-marked assignments (iCMAs)

Page 22

Findings from a questionnaire

Statement: Answering iCMA questions helps me to learn
    Definitely agree or mostly agree: 129 (85%)
    Neutral: 7 (5%)
    Mostly or definitely disagree: 12 (8%)

Statement: If I get the answer to an iCMA question wrong, the computer-generated feedback is useful
    Definitely agree or mostly agree: 128 (85%)
    Neutral: 11 (7%)
    Mostly or definitely disagree: 8 (5%)

Responses received from 151 students (response rate 20%) (Jordan, 2011)
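The percentages in the table can be checked against the counts, which also shows how many respondents skipped each statement. The sketch below uses only the numbers on the slide; the shortened statement labels and variable names are illustrative, and the estimate of how many students were asked is approximate because the 20% response rate is itself rounded.

```python
respondents = 151  # questionnaires returned (Jordan, 2011)

# Counts per statement as shown on the slide; the keys are shortened labels.
counts = {
    "iCMA questions help me to learn":
        {"agree": 129, "neutral": 7, "disagree": 12},
    "computer-generated feedback is useful":
        {"agree": 128, "neutral": 11, "disagree": 8},
}

percentages = {}
unanswered = {}
for statement, row in counts.items():
    # Percentages are taken out of all 151 respondents, not only those who answered.
    percentages[statement] = {
        answer: round(100 * n / respondents) for answer, n in row.items()
    }
    unanswered[statement] = respondents - sum(row.values())
    print(statement, percentages[statement], "no answer:", unanswered[statement])

# A 20% response rate implies roughly this many students were asked.
approx_invited = round(respondents / 0.20)
print("approximately invited:", approx_invited)
```

The rounded percentages match the slide, and they describe only the roughly one in five students who chose to respond.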

Page 23

Watching students in a usability lab

Six students observed answering questions (Jordan, 2009)

Page 24

Data analysis

Much more data presented in Jordan (2014)

Page 25

Reflection

Weaver (2006, p. 386) reports that 90% of students agreed with the statement “Positive comments have boosted my confidence.”

Marriott (2009, p. 243) reports that 93% of students agreed with the statement “I find the immediate reporting of my test result valuable.”

It is almost certainly the case that more students report that they find feedback useful than actually make good use of it. This is in line with the bias in self-reported behaviour that is observed in medicine and business (Jordan, 2014, p. 69).

But: Student opinion is important. (Dermo, 2009).

We need to consider student opinion, but we also need to consider students’ actual actions.

Page 26

Ethics

Page 27

Ethics

Is it ethical to only give a helpful intervention to half the class?

Are we allowed to do experiments in Education?

Page 28

Look at evidence-based medicine

How do you know it’s effective if you have not done the experiment? If you don't know whether it is effective, is it ethical to use it?

(They have been doing this for a while)

[Diagram: NICE, academic researchers, drug companies, doctors, meta analysis, the literature, medical schools]

Page 29

The end

Page 30

References

Bolton, J., Jordan, R. & Jordan, S. (2015). An investigation into factors affecting physics' students engagement with online assessment, Manuscript in preparation.

Cohen, L., Manion, L. & Morrison, K. (2011). Research methods in education, 7th Edition, Routledge.

Dermo, J. (2009). e-Assessment and the student learning experience: A survey of student perceptions of e‐assessment. British Journal of Educational Technology, 40(2), 203–214.

Goldacre, B. (2008). Bad Science, Fourth Estate.

Goldacre, B. (2012). Bad Pharma, Fourth Estate.

Jordan, S. (2009). Assessment for learning: pushing the boundaries of computer-based assessment. Practitioner Research in Higher Education, 3(1), 11–19.

Jordan, S. (2011). Using interactive computer-based assessment to support beginning distance learners of science, Open Learning, 26(2), 147–164.

Jordan, S. (2014). E-assessment for learning? Exploring the potential of computer-marked assessment and computer-generated feedback, from short-answer questions to assessment analytics. PhD thesis. The Open University. At http://oro.open.ac.uk/41115/.

Karpicke, J. & Blunt, J. (2011). Retrieval practice produces more learning than elaborative studying with concept mapping, Science, 331(6018), 772–775.

Marriott, P. (2009). Students' evaluation of the use of online summative assessment on an undergraduate financial accounting module. British Journal of Educational Technology, 40(2), 237–254.

Sly, L. (1999). Practice tests as formative assessment improve student performance on computer‐managed learning assessments, Assessment & Evaluation in Higher Education, 24(3), 339–343.

Vigen, T. (2014). Spurious Correlations, at http://www.tylervigen.com/spurious-correlations.

Weaver, M. R. (2006). Do students value feedback? Student perceptions of tutors’ written responses. Assessment & Evaluation in Higher Education, 31(3), 379–394.

Wikipedia (2015). Simpson's paradox, at https://en.wikipedia.org/wiki/Simpson%27s_paradox.

Wooldridge, C., Bugg, J., McDaniel, M. & Liu, Y. (2014). The testing effect with authentic educational materials: A cautionary note, Journal of Applied Research in Memory and Cognition, 3(3), 214–221.

Page 31

Summary

Correlation vs causation

Confounding variables

Experiments – designed to minimise confounding variables

Don't abstract your experiment so much that the results aren't relevant

Student opinion and attitudes are important but different from actions or effectiveness

Ethical issues are real, but should be overcome

@tim_hunt @SallyJordan9