I wish I could believe you: the frustrating unreliability of some assessment research


Tim Hunt & Sally Jordan, The Open University
@tim_hunt @SallyJordan9

Are these two things related?

Trick question (of course)


From a great website: http://www.tylervigen.com/spurious-correlations


Correlation & Causation

Sly (1999)

614 students P01 S01 S02

Practice tests as formative assessment improve student performance on computer-managed learning assessment


A computerised assessment was quite exciting in itself back in 1999!

Questions picked at random from a bank.

P01 & S01 used the same test bank. S02 was different, with no practice.

Sly (1999)

Students (of 614)   P01      S01      S02      Students (of 609)
417                 62.18%   72.72%   66.88%   415
197                 –        67.56%   62.24%   194

All standard deviations 15–17%

Sly (1999): differences in mean score

+5.38%   S01 without practice (67.56%) vs P01 (62.18%)
+5.16%   S01 with practice (72.72%) vs S01 without practice (67.56%)
+4.64%   S02 with practice (66.88%) vs S02 without practice (62.24%)
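The three differences can be reproduced from the mean scores in the table. This is a minimal Python sketch: the variable names are illustrative rather than taken from Sly (1999), and the pairing of each difference with two means is inferred from the arithmetic.

```python
# Mean scores (%) from the Sly (1999) table.
# "practice" = the 417 students with a P01 score;
# "no_practice" = the 197 students without one.
practice = {"P01": 62.18, "S01": 72.72, "S02": 66.88}
no_practice = {"S01": 67.56, "S02": 62.24}

differences = {
    # First attempt on the shared bank: S01 (no practice) vs P01 (practice).
    "S01 (no practice) - P01": no_practice["S01"] - practice["P01"],
    # Same test, students with vs without the practice test.
    "S01: practice - no practice": practice["S01"] - no_practice["S01"],
    "S02: practice - no practice": practice["S02"] - no_practice["S02"],
}

for label, diff in differences.items():
    print(f"{label}: {diff:+.2f}%")
```

The S02 gap, where no practice test was available, is close in size to the S01 gap.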

OU level 3 physics (SM358)

An investigation into factors affecting physics students' engagement with online assessment (Bolton & Jordan)


The assessment strategy


Proportion of students

Columns: 0 TMAs, 1 TMA, 2 TMAs, 3 TMAs, 4 TMAs. Rows: 0 to 6 iCMAs.


0 iCMAs 11.6% 3.4% 1.5% 0.5% 0.5%

1 iCMA 1.5% 1.0%

2 iCMAs 1.5% 2.4% 1.5%

3 iCMAs 1.5%

4 iCMAs 5.3% 2.4%

5 iCMAs 0.5% 3.9% 5.8% 8.2%

6 iCMAs 0.5% 0.5% 5.8% 5.8% 34.3%


Exam mark



0 iCMAs 6.0

1 iCMA

2 iCMAs 17.0 24.0

3 iCMAs 60.0

4 iCMAs 43.7 62.0

5 iCMAs 23.0 46.0 62.6 69.5

6 iCMAs 35.3 60.8 77.5


Exam mark compared to predictive model



0 iCMAs −20.8

1 iCMA

2 iCMAs −43.9 −27.5

3 iCMAs −9.0

4 iCMAs −15.6 +1.8

5 iCMAs −3.8 −11.1 +1.4 +2.4

6 iCMAs −17.1 +3.4 +4.6

Confounding variables

Berkeley gender bias case (1973)

         Men                     Women
         Applicants  Admitted    Applicants  Admitted
Total    8442        44%         4321        35%

https://en.wikipedia.org/wiki/Simpson%27s_paradox#Berkeley_gender_bias_case




Berkeley gender bias case (1973)

              Men                     Women
Department    Applicants  Admitted    Applicants  Admitted
A             825         62%         108         82%
B             560         63%         25          68%
C             325         37%         593         34%
D             417         33%         375         35%
E             191         28%         393         24%
F             272         6%          341         7%

https://en.wikipedia.org/wiki/Simpson%27s_paradox#Berkeley_gender_bias_case
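The reversal can be checked by pooling the per-department figures. This is a minimal Python sketch, not part of the original analysis. It assumes admitted counts can be approximated from the rounded percentages, and it covers only departments A to F, so the pooled rates will not exactly match the 44% and 35% totals for all departments.

```python
# (applicants, admitted %) per department, from the department table.
men = {"A": (825, 62), "B": (560, 63), "C": (325, 37),
       "D": (417, 33), "E": (191, 28), "F": (272, 6)}
women = {"A": (108, 82), "B": (25, 68), "C": (593, 34),
         "D": (375, 35), "E": (393, 24), "F": (341, 7)}


def pooled_rate(groups):
    """Applicant-weighted admission rate (%) across departments."""
    applicants = sum(n for n, _ in groups.values())
    admitted = sum(n * pct / 100 for n, pct in groups.values())
    return 100 * admitted / applicants


men_rate = pooled_rate(men)
women_rate = pooled_rate(women)

# Departments where the women's admission rate is the higher one.
women_ahead = [d for d in men if women[d][1] > men[d][1]]

print(f"Pooled over A-F: men {men_rate:.0f}%, women {women_rate:.0f}%")
print(f"Departments where women's rate is higher: {women_ahead}")
```

Women have the higher rate in four of the six departments, yet the lower pooled rate, because most women applied to the departments with low admission rates (C to F).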






Real Experiments

What is an experiment?

Split participants into two equal groups.

Split randomly, so if there are confounding variables, they are probably equally split between groups.

Give different ‘treatments’ to each group, trying to keep everything else the same.

Blind the treatment, if possible, to reduce all sorts of biases.

But blinding is not normally possible in education. (You probably know if you just sat an exam!)

[Pick your favourite research methods book]
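The random split described above can be sketched in a few lines of Python. This is an illustration only: the function name `randomise`, the participant IDs and the seed are all hypothetical, and a real study would follow whatever allocation procedure its protocol specifies.

```python
import random


def randomise(participants, seed=None):
    """Shuffle participants and split them into two groups of (near-)equal size."""
    rng = random.Random(seed)      # seeded so the allocation can be reproduced
    shuffled = list(participants)  # copy, so the caller's list is not modified
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical participant IDs, for illustration only.
students = [f"student_{i:03d}" for i in range(1, 21)]
treatment, control = randomise(students, seed=2015)
print(len(treatment), len(control))
```

Because membership of each group is decided by the shuffle and not by the students or the teacher, any confounding variable is expected to be spread roughly evenly across the two groups.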


Karpicke & Blunt (2011) + many more

Retrieval practice produces more learning than elaborative studying with concept mapping


… but! Wooldridge et al. (2014)

The testing effect with authentic educational materials: A cautionary note

“Based on [the testing effect], … some textbooks are now accompanied by quizzing ancillaries … The quizzes are designed with the assumption that answering factual and application questions will promote a more integrated mental model that incorporates the target knowledge.”

Typically, the quizzes and test banks sample items from similar sub-sections in the textbook but not necessarily the same information.


How reliable is student opinion?

Background

Our own work with interactive computer-marked assignments (iCMAs)


Findings from a questionnaire

Statement                                        Definitely agree    Neutral     Mostly or definitely
                                                 or mostly agree                 disagree

Answering iCMA questions helps me to learn       129 (85%)           7 (5%)      12 (8%)

If I get the answer to an iCMA question wrong,   128 (85%)           11 (7%)     8 (5%)
the computer-generated feedback is useful

Responses received from 151 students (response rate 20%) (Jordan, 2011)
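The percentages in the table can be checked against the 151 respondents. This is a rough Python sketch with illustrative names. The estimate of how many students were surveyed assumes the 20% response rate is exact, when it is probably rounded.

```python
respondents = 151  # responses received, from the slide

# (agree, neutral, disagree) counts per statement, from the table.
counts = {
    "iCMA questions help me to learn": (129, 7, 12),
    "computer-generated feedback is useful": (128, 11, 8),
}

results = {}
for statement, row in counts.items():
    percentages = [round(100 * n / respondents) for n in row]
    unanswered = respondents - sum(row)  # respondents with no answer recorded
    results[statement] = (percentages, unanswered)
    print(statement, percentages, "no answer:", unanswered)

# Approximate number of students surveyed, if the response rate were exactly 20%.
approx_surveyed = round(respondents / 0.20)
print("surveyed (approx.):", approx_surveyed)
```

On these figures about four in five of the students surveyed did not respond, so the 85% agreement describes the respondents and not necessarily the whole cohort.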


Watching students in a usability lab

Six students observed answering questions (Jordan, 2009)


Data analysis

Much more data presented in Jordan (2014)


Reflection

Weaver (2006, p. 386) reports that 90% of students agreed with the statement “Positive comments have boosted my confidence.”

Marriott (2009, p. 243) reports that 93% of students agreed with the statement “I find the immediate reporting of my test result valuable.”

It is almost certainly the case that more students report that they find feedback useful than actually make good use of it. This is in line with the bias in self-reported behaviour that is observed in medicine and business (Jordan, 2014, p. 69).

But: student opinion is important (Dermo, 2009).

We need to consider student opinion, but we also need to consider students’ actual actions.

Ethics


Is it ethical to only give a helpful intervention to half the class?

Are we allowed to do experiments in Education?


Look at evidence-based medicine

How do you know it’s effective if you have not done the experiment? If you don't know whether it is effective, is it ethical to use it?

(They have been doing this for a while)


Diagram labels: NICE, academic researchers, drug companies, doctors, meta-analysis, the literature, medical schools

The end


References

Bolton, J., Jordan, R. & Jordan, S. (2015). An investigation into factors affecting physics students' engagement with online assessment, Manuscript in preparation.

Cohen, L., Manion, L. & Morrison, K. (2011). Research methods in education, 7th Edition, Routledge.

Dermo, J. (2009). e-Assessment and the student learning experience: A survey of student perceptions of e‐assessment. British Journal of Educational Technology, 40(2), 203–214.

Goldacre, B. (2008). Bad Science, Fourth Estate.

Goldacre, B. (2012). Bad Pharma, Fourth Estate.

Jordan, S. (2009). Assessment for learning: pushing the boundaries of computer-based assessment. Practitioner Research in Higher Education, 3(1), 11–19.

Jordan, S. (2011). Using interactive computer-based assessment to support beginning distance learners of science, Open Learning, 26(2), 147–164.

Jordan, S. (2014). E-assessment for learning? Exploring the potential of computer-marked assessment and computer-generated feedback, from short-answer questions to assessment analytics. PhD thesis. The Open University. At http://oro.open.ac.uk/41115/.

Karpicke, J. & Blunt, J. (2011). Retrieval practice produces more learning than elaborative studying with concept mapping, Science, 331(6018), 772–775.

Marriott, P. (2009). Students' evaluation of the use of online summative assessment on an undergraduate financial accounting module. British Journal of Educational Technology, 40(2), 237–254.

Sly, L. (1999). Practice tests as formative assessment improve student performance on computer‐managed learning assessments, Assessment & Evaluation in Higher Education, 24(3), 339–343.

Vigen, T. (2014). Spurious Correlations, at http://www.tylervigen.com/spurious-correlations.

Weaver, M. R. (2006). Do students value feedback? Student perceptions of tutors’ written responses. Assessment & Evaluation in Higher Education, 31(3), 379–394.

Wikipedia (2015). Simpson's paradox, at https://en.wikipedia.org/wiki/Simpson%27s_paradox.

Wooldridge, C., Bugg, J., McDaniel, M. & Liu, Y. (2014). The testing effect with authentic educational materials: A cautionary note, Journal of Applied Research in Memory and Cognition, 3(3), 214–221.

Summary

Correlation vs causation

Confounding variables

Experiments – designed to minimise confounding variables

Don't abstract your experiment so much that the results aren't relevant

Student opinion and attitudes are important but different from actions or effectiveness

Ethical issues are real, but should be overcome


@tim_hunt [email protected]
@SallyJordan9 [email protected]