Improvement of Learning Through

Interactive Confidence-based Assessment

by

Graham Farrell

BAppSci RMIT

Grad Dip Ed Hawthorn Institute

MIT SUT

A Thesis Submitted to

the Faculty of

Information and Communication Technologies

Swinburne University of Technology

for the degree of

Doctor of Philosophy

September 2010

DECLARATION

This thesis contains no material which has been accepted for the award of any other degree or

diploma except where due reference is made in the text of the thesis. To the best of my

knowledge, this thesis contains no material previously published or written by another person

except where due reference is made in the text of the thesis.

Graham Farrell

September 2010

ACKNOWLEDGEMENTS

I would like to offer my sincere appreciation to Dr Ying Leung for his direction and support

for the duration of my thesis, Professor Doug Grant for his assistance and Professor Yun

Yang for his assistance and encouragement in the final stages. I also wish to thank my

colleagues who have offered support along the way.

I would like to dedicate this work to Viv, Rebekah and Jai for all their encouragement over

the years. I would also like to dedicate this work to Ron Farrell for teaching me the value of

persistence and June Farrell for instilling confidence to take on challenges.

ABSTRACT

Certain criteria need to be fulfilled for assessment to be considered of strategic value. This is

unfortunately not the case for many assessment strategies used today. The advent of new

technology in many cases has extended traditional assessment tools well beyond their intended

application, so that they fall short of their true goals of correctly grading the student while

supplying meaningful feedback to the learning process. The shortcomings of traditional

assessment strategies must be addressed in order to improve the representation of a student’s

present level of knowledge. Educators generally concede the existence of these inherent

inadequacies of traditional assessment, such as the encouragement of guessing, failure to

recognize partial knowledge, miscalibration of confidence and the inability of a student to

declare minimal or no knowledge. Sound educational process is dependent on assessment

strategies, as they greatly contribute to the learning experience, both as a method of formally

assigning a grade (Summative Assessment) and as a means of giving feedback (Formative

Assessment). The value of good assessment lies in encouraging the instructor and student to reflect

on the results, often leading to adjustments in the student’s personal study program and

refinement of the curriculum by the instructor.

Having considered previous research by others, this research promotes assessment with confidence

measurement as a method to address the inadequacies of traditional assessment strategies,

offering richer feedback, eliminating the benefits gained from guessing and

encouraging the declaration of partial or no knowledge. This research then promotes the use of

an innovative assessment tool based on the traditional Multiple-choice Question (MCQ) format

that incorporates a method to measure the confidence of the student in their preferred answer/s,

referred to as the Multiple-choice Questions with Confidence Measurement (MCQCM).

The preliminary pilot programs identified some critical usability issues pertaining to the MCQCM’s

operation, functionality and the operational cognitive process. The further development and

refinement of the MCQCM required consideration of HCI User Centred Design (UCD)

principles and the formulation of a set of heuristics specifically for interactive assessment

systems with confidence measurement. This research investigates the application of games taxonomy

to the educational arena to identify the criteria to which good interactive assessment tools should

conform, and applies these criteria to the MCQCM.

This research found the MCQCM to be as reliable as traditional assessment

options, producing a convergence of scores that confirms it as a valid method of summative

assessment. The observations and resulting analysis indicated that the greater distribution of

scores contributed to a more dispersed allocation of grades for the students.

The MCQCM utilizes technology to improve student learning in a progressive educational

climate that requires strategic assessment solutions. This research encourages other educators to

question current assessment practices and embrace the use of technology that is relevant to their

individual requirements.

AUTHOR’S PUBLICATIONS

Journal Paper.

Farrell, G. & Leung, Y. (2004). Innovative Online Assessment. Education and Information

Technology. Journal of the IFIP Technical Committee on Education, 9(1), 5-20.

Conference Papers.

Farrell, G., Farrell, V., & Leung, Y. (2001). Online Software Test for Efficient and Effective

Assessment Using Multiple Choice Questions - An Evaluation. Paper presented at the American

Educational Research Association Conference, Seattle, USA.

Farrell, G. & Leung, Y. (2002). Designing an Online Self-Assessment Tool Utilizing Confidence

Measurement. Paper presented at the Seeking Success in E-Business, IFIP 8.4 Working Group,

Copenhagen, Denmark.

Farrell, G. & Leung, Y. (2002). Improving the Design of an Online Self-Assessment Tool

Utilizing Confidence Measurement. Paper presented at the Web-Based Learning: Men and

Machines, Hong Kong.

Farrell, G. & Leung, Y. (2004). Comparison of Two Student Cohorts Utilizing Black Board CAA

with Different Assessment Content: A Lesson to be Learnt. Paper presented at the Computer

Assisted Assessment Conference, Loughborough, England.

Farrell, G. & Leung, Y. (2005). A Comparison of Blackboard CAA and an Innovative Self-

Assessment Tool for Formative Assessment. Paper presented at the Computer Assisted

Assessment Conference, Loughborough, England.

Farrell, G. & Leung, Y. (2006). A Comparison of an Innovative Assessment Tool Utilizing

Confidence Measurement to the Traditional Multiple Choice, Short Answer and Problem Solving

Questions. Paper presented at the Computer Assisted Assessment Conference, Loughborough,

England.

Farrell, G. & Leung, Y. (2008). Convergence of Validity for the Results of a Summative

Assessment with Confidence Measurement and Traditional Assessment. Paper presented at the

Computer Assisted Assessment Conference, Loughborough, England.

TABLE OF CONTENTS

Abstract......................................................................................................................................... iv

Author’s Publications .................................................................................................................. vi

List of Figures.............................................................................................................................. xv

List of Tables ............................................................................................................................. xvii

ETHICS APPROVALS............................................................................................................. xix

CHAPTER 1 Introduction ............................................................................................................ 1

1.1 Contribution of Assessment to Education ............................................................................ 2

1.2 Assessment Strategies ............................................................................................................. 2

1.3 The Advent of E-learning....................................................................................................... 3

1.4 The Criteria for Good Formative and Summative Assessment.......................................... 5

1.5 Concerns with Current Assessment Strategies .................................................................... 7

1.5.1 Assessment That Encourages Guessing................................................................................. 7

1.5.2 Assessment That Trains the Student to Do Well ................................................................... 7

1.5.3 Assessment Failing to Recognise Partial Knowledge and Miscalibration of Confidence..... 8

1.6 The Role of Computer Based Assessment in Addressing Issues of Assessment.............. 10

1.7 The Purpose of this Research............................................................................................... 11

1.8 Problem Statement................................................................................................................ 11

1.9 Scope and Aim of this Research .......................................................................................... 12

1.9.1 Scope.................................................................................................................................... 13

1.9.2 Aims..................................................................................................................................... 13

1.10 Overview of Thesis.............................................................................................................. 14

CHAPTER 2 Research Design .................................................................................................... 17

2.1 Research Methodology ......................................................................................................... 18

2.1.1 Overview of Research Methodology ................................................................................... 18

2.1.2 Adopted Research Methodology.......................................................................................... 22

2.2 Research Design to Address Research Questions .............................................................. 22

2.3 Research Framework............................................................................................................ 25

2.4 HCI Approach to Problem Solving ..................................................................................... 28

2.5 Summary of this Research Structure.................................................................................. 29

CHAPTER 3 Variations Of Non-Conventional MCQ Assessment Strategies For Learning 30

3.1 Learning Theories and Learning Styles.............................................................................. 31

3.1.1 Learning Theories ................................................................................................................ 31

3.1.2 Learning Styles .................................................................................................................... 32

3.2 The Value of Feedback in the Learning Process................................................................ 34

3.3 Formative and Summative Assessment as Part of the Learning Path............................. 35

3.4 Assessment as a Means of Shifting the Responsibility of Learning to the Student ........ 38

3.5 Assessment Using New Technology..................................................................................... 39

3.6 Concerns with Computer Assisted Assessment.................................................................. 40

3.7 Assessment Options Available ............................................................................................. 41

3.8 Multiple-choice Questions .................................................................................................... 42

3.9 The Suitability of MCQ Tests to the New Technology ...................................................... 44

3.10 Previous Work on Innovative Approaches to MCQ Assessment ................................... 44

3.10.1 The Need for Innovative Scoring for Assessment ............................................................. 45

3.10.2 MCQs Designed to Eliminate Guessing ............................................................................ 46

3.10.3 Innovative MCQ Assessment with Confidence Measurement .......................................... 47

3.11 Interactivity in Learning .................................................................................................... 50

3.12 Contribution of Assessment with Confidence Measurement to Hede’s (2002) Model. 53

3.13 Assessment with Confidence Measurement as the Proposed Solution .......................... 56

3.14 Summary.............................................................................................................................. 57

CHAPTER 4 Scoring Options For Assessment With Confidence........................................... 59

4.1 Taxonomy of Scoring............................................................................................................ 61

4.2 Previous Scoring Methods to Address the Issue of Guessing ........................................... 62

4.3 Scoring Using Penalties for Incorrect Answers to Reduce the Impact of Guessing ....... 64

4.4 Comparison of an Incremental Balanced Scoring Method to Previous Work................ 82

4.5 Choice and Justification of Scoring Method for this Research ........................................ 90

4.6 Summary................................................................................................................................ 93

CHAPTER 5 Development Of The Multiple-choice Questions With Confidence

Measurement (MCQCM) Prototype And Pilot Program ....................................................... 94

5.1 The MCQCM ........................................................................................................................ 95

5.2 Design of the Rudimentary MCQCM Prototype ............................................................... 96

5.3 Pilot Studies ......................................................................................................................... 100

5.3.1 Aims of Pilot Studies ......................................................................................................... 100

5.3.2 First Pilot Study ................................................................................................................. 101

5.3.3 Second Pilot Study............................................................................................................. 105

5.4 Discussion ............................................................................................................................ 116

5.5 Further Development of the MCQCM ............................................................................. 118

5.6 Summary.............................................................................................................................. 118

CHAPTER 6 Designing And Refining The MCQCM For Delivery Via The Web.............. 120

6.1 Games Taxonomy................................................................................................................ 122

6.2 Game Theory Relevance to Educational Games.............................................................. 123

6.2.1 Fundamental Game Theory Criteria .................................................................................. 124

6.2.2 The Goals and Rules of a Game ........................................................................................ 124

6.2.3 Game Fairness.................................................................................................................... 125

6.2.4 Games Risk and Rewards .................................................................................................. 125

6.2.5 Learning the Game Play..................................................................................................... 126

6.2.6 The Influence of Skill, Stress and Absolute Difficulty on Games..................................... 126

6.3 MCQCM Adherence to Game Play Topology.................................................................. 127

6.3.1 MCQCM Adherence to Playability Guidelines and Heuristics ......................................... 127

6.3.2 MCQCM’s Hierarchy of Challenges and Actions ............................................................. 128

6.3.3 MCQCM Learnability........................................................................................................ 129

6.3.4 Fairness of the MCQCM.................................................................................................... 129

6.3.5 MCQCM Stress Levels and Overall Level of Difficulty ................................................... 130

6.3.6 Summary of MCQCM Adherence to Game Play Topology.............................................. 130

6.4 Addressing Design and Usability Issues of MCQCM...................................................... 131

6.4.1 Addressing the Cognitive Load of the MCQCM............................................................... 134

6.4.2 HCI Evaluation of the MCQCM........................................................................................ 138

6.4.3 Heuristics Testing for Computer Aided Assessment (CAA)............................................. 138

6.5 MCQCM Heuristic Evaluation Method ........................................................................... 141

6.5.1 MCQCM Redesign Resulting from Usability Heuristics ................................................. 142

6.5.2 Grid Layout of Question Screen ........................................................................................ 142

6.5.3 Visibility of Student Progress During the MCQCM Test.................................................. 144

6.5.4 Minimisation of Errors and Error Prevention .................................................................... 145

6.5.5 Clear and Informative Feedback........................................................................................ 146

6.5.6 Summary of the Redesigning of the MCQCM Adhering to HCI Guidelines.................... 148

6.5.7 Heuristics for MCQ with Confidence Measurement ......................................................... 148

6.6 MCQCM’s Method of Handling Graphical Components............................................... 150

6.6.1 Previous Investigative Work on the Graphics Component of Interactive Assessment ..... 151

6.6.2 MCQCM’s Graphics Solution ........................................................................................... 153

6.7 Summary.............................................................................................................................. 158

CHAPTER 7 Comparison Of The MCQCM To A Traditional CAA Package For Formative

Assessment ................................................................................................................................. 160

7.1 Trial...................................................................................................................................... 161

7.2 Comparison of the MCQCM to a Traditional Computer Based Formative Assessment

Package ...................................................................................................................................... 162

7.2.1 Method ............................................................................................................................... 162

7.2.2 Results Analysis for Students ............................................................................................ 163

7.2.3 Instructor’s Focus Group for Formative Assessment ........................................................ 166

7.3 Concluding Observations of Comparison of MCQCM to Traditional Computer

Assessment ................................................................................................................................. 168

7.4 Summary.............................................................................................................................. 168

CHAPTER 8 Using The Web-based MCQCM For Summative Assessment ....................... 170

8.1 Initial Trials using MCQCM as a Summative Assessment Tool .................................... 171

8.1.1 Setting ................................................................................................................................ 171

8.1.2 Results................................................................................................................................ 172

8.1.3 Discussions and Conclusions............................................................................................. 176

8.2 Comparative Analysis of using the MCQCM as a Summative Assessment tool to the

Traditional Short Answer, MCQ and Long Answer Assessment ........................................ 178

8.2.1 Method of Comparative Study........................................................................................... 179

8.2.2 Results................................................................................................................................ 179


8.3 Comparative Analysis of using the MCQCM and Traditional MCQ as a Summative

Assessment Tool ........................................................................................................................ 182

8.3.1 Method ............................................................................................................................... 182

8.3.2 Results................................................................................................................................ 182


8.4 Instructor’s Focus Group for Formative Assessment ..................................................... 185

8.5 Discussion ............................................................................................................................ 186

8.6 Summary.............................................................................................................................. 187

CHAPTER 9 Summary, Conclusion And Future Work ........................................................ 189

9.1 Summary of the Research .................................................................................................. 190

9.2 Recapitulating on Previous Chapters................................................................................ 192

9.3 Discussion ............................................................................................................................ 197

9.3.1 MCQCM as a Valuable Formative Assessment Tool........................................................ 197

9.3.2 MCQCM as a Summative Assessment Tool ..................................................................... 200

9.4 Ethical Issues ....................................................................................................................... 203

9.5 Limitations of Study ........................................................................................................... 203

9.5.1 Scope.................................................................................................................................. 203

9.5.2 Internal Validity ................................................................................................................. 204

9.5.3 External Validity, Transferability ...................................................................................... 204

9.5.4 Construct Validity.............................................................................................................. 205

9.5.5 Ecological Validity ............................................................................................................ 206

9.6 Research Contribution ....................................................................................................... 206

9.6.1 Outcome 1: The MCQCM Tool......................................................................................... 206

9.6.2 Outcome 2: The Value of Assessment with Confidence Measurement for Formative

Assessment.................................................................................................................................. 208

9.6.3 Outcome 3: The Value of Assessment with Confidence Measurement for Summative

Assessment.................................................................................................................................. 210

9.6.4 Outcome 4: Heuristics for CAA with Confidence Measurement........................................ 212

9.6.5 Outcome 5: The Contribution of this Research to Educators Investigating Alternative

Assessment Strategies................................................................................................................. 213

9.7 Future Work........................................................................................................................ 215

9.8 Concluding Remarks .......................................................................................................... 217

Appendix A: Surveys .......................................................................................................... 235

Appendix B: Simulation Result Displays................................................................................ 249

Appendix C: MCQCM Screen Presentations..................................................................... 251

LIST OF FIGURES

Figure 3-1: Kolb's (1984) Learning Style Model ........................................................................ 33

Figure 3-2: Confidence Measuring Template, Paul (1994) ........................................................ 48

Figure 3-3: Hede’s (2002) Integrated Model of Multimedia Effects on Learning ....................... 52

Figure 3-4: Relation of Assessment with Confidence Measurement to Hede’s (2002) Multimedia

Model. ........................................................................................................................................... 55

Figure 4-1: Paul’s CBAA Triangle with the Corresponding Score for Each Region.................. 75

Figure 4-2: CBAA Scores for C1, C2 & C3 ................................................................................. 79

Figure 4-3: Other Scoring Options used Including Scheme A from Hassmen & Hunt (1994) and

Schemes B-D from Davies (2005)................................................................................................ 80

Figure 4-4: MCQCM Scoring with Optimal Path......................................................................... 85

Figure 4-5: Graph Comparing the MCQCM and CBA Expected Scores..................................... 87

Figure 5-1: The MCQCM Prototype Developed to Run the Initial Trials.................................... 96

Figure 5-2: Scoring Calculator for MCQCM Table 5-3 ............................................................... 99

Figure 5-3: Age and Gender Distributions for Both Cohorts of Students .................................. 108

Figure 5-4: Frequency of the Under Graduates, Cohort 1, Scores for Each Question ............... 112

Figure 5-5: Frequency of the Postgraduates, Cohort 2, Scores for Each Question .................. 112

Figure 6-1: Slide Rule to Register Confidence .......................................................................... 135

Figure 6-2: First Fundamental Version of the Web Based MCQCM ........................................ 137

Figure 6-3: The Appearance of the Confidence Sliding Bar ....................................................... 137

Figure 6-4: Grid Layout of MCQCM ......................................................................................... 142

Figure 6-5: Grid Layout of the Functional Areas, Distinguished by Numbers .......................... 143

Figure 6-6: Question Display Showing 3 Navigational Supports............................................... 144

Figure 6-7: Support for User to Minimise Errors ....................................................................... 145

Figure 6-8: Final Dialogue Box to Support the User in Error Prevention .................................. 146

Figure 6-9: Feedback Screens: (A) Display for all Questions with Hyperlink to (B) Display of

Individual Questions ................................................................................................................... 147

Figure 6-10: MCQCM Dual Screen Display .............................................................................. 156

Figure 6-11: MCQCM Diagram as a Full Screen....................................................................... 157

Figure 6-12: Demonstration of MCQCM Display of Varied Screen Sizes ................................ 156

Figure 7-1: Graph of MCQ and MCQCM Scores for Cohort 1.................................................. 163

Figure 7-2: Graph of MCQ and MCQCM Scores for Cohort 2.................................................. 163

Figure 8-1: MCQ and MCQCM Scores for Each Student with the MCQ Clustered.................. 173

Figure 8-2: The Student’s MCQ (clustered ascending order) and MCQCM Scores.................. 183

LIST OF TABLES

Table 2-1: Research Questions Addressed by the Positivism Research Paradigm....................... 25

Table 2-2: Research Questions Addressed by the Interpretivism Research Paradigm.................. 27

Table 4-1: Possible Responses with Four Options, Pollard (1985) .............................................. 65

Table 4-2: Scoring Formulas for Responses Pollard (1985)......................................................... 66

Table 4-3: Pollard’s Two Solutions for k Values.......................................................................... 69

Table 4-4: Expected Scores for Random Guessing, Pollard (1985) ............................................. 69

Table 4-5: Example of Pollard’s Scores for Both Sets of Values of k.......................................... 70

Table 4-6: CBA Scoring System for Correct and Incorrect Answers............................................ 79

Table 4-7: Balanced Scoring Registered Confidence for Correct and Incorrect Answers ............ 84

Table 4-8: The Average Expected Scores from MCQCM and CBA............................................ 87

Table 5-1: Rules and Example of a Score for a Given Scenario .................................................. 97

Table 5-2: Resulting Score for Options Given the Student’s Choice and Their Registered Level

of Confidence................................................................................................................................ 98

Table 5-3: Example of a Question, which has 2 Correct Answers B and C ................................. 99

Table 5-4: Pilot Program Student and Instructor Observations .................................................. 104

Table 6-1: List of Sim et al. (2006) Heuristics for CAA ............................................................ 141

Table 6-2: List of Sim et al. (2006) Heuristics with Elaborated Heuristics for MCQ with

Confidence Measurement and Problems Addressed by Revised Heuristics .............................. 150

Table 7-1: Proportion of Postgraduate and Undergraduate Students and Proportion of Each

>25 Years of Age........................................................................................................................ 164

Table 7-2: Responses to the Questions of Student’s Perception of the MCQCM...................... 165

Table 7-3: Responses of Student’s Perception of the MCQCM vs BB ...................................... 166

Table 8-1: Average, Standard Deviation and Difference for Both Marking Schemes ............... 172

Table 8-2: The Correlation for the Two Marking Schemes........................................................ 174

Table 8-3: Means and Standard Deviations for Each of the Section of the Exam...................... 179

Table 8-4: Correlation Table for the Sections of the Exam........................................................ 180

Table 8-5: Correlation of MCQ with MCQCM.......................................................................... 184

Table 8-6: Chi-Square MCQ to MCQCM .................................................................................. 184

ETHICS APPROVALS

“Innovative Online Assessment Using Confidence Measurement.” (Chapter 5):

Extension Granted to School of IT Approval for “Online Self-Assessment Using Multiple-choice

Questions: An Innovative Approach.”, issued by the School of Information Technology,

Swinburne University of Technology 2000.

Approval Code: IT2000-04

CHAPTER 1 INTRODUCTION

Education has its origins deeply rooted in our history, and in many instances, continues

to adhere to the fundamental core principles. It is the ongoing challenge of educators

to develop, trial and implement innovative educational approaches that facilitate the

changing demands of the learners, learning institutions and the general society.

Education finds itself positioned firmly within the context of the present being

influenced by society’s expectations and the political environment. As a result,

educational institutions are often required to deliver their programs adhering to and

complying with governing legislation within budgets imposed by the government of the

day. Educators face many challenges when dealing with the dilemma of delivering

fundamentally traditional programs with new technology-enhanced components.

Activities have culminated in the production of education tools designed to increase

educational productivity while providing extended services to more individuals for less

cost. The advent of the technology has necessitated research into the evolving concerns

and the development of best practice.

1.1 Contribution of Assessment to Education

Assessment is a major contributor to education as a means of providing both feedback

to the student (Formative assessment) during their learning experience and as a grading

mechanism (Summative assessment) reflecting their level of achievement. A

longstanding motivation for assessment is for the comparison of an individual’s

recorded achievement to others, current and previous. It is these comparative grades

that can be used to determine the vocational and further educational paths of the

participants. This primary objective of assigned level of achievement underpins the

contribution of assessment to education. Progressive work in the area of assessment is

considered to be of value, as the ability to improve the method of grading has perceived

benefits to all participants. Additionally, the opportunity to produce richer feedback to

both the student and the instructor during the learning experience is beneficial.

1.2 Assessment Strategies

The acquisition of knowledge is not linear (Hyerle, 2009), as students often approach

learning by moving freely through the educational matrix absorbing knowledge at

various times and events. Longino considers the growth of knowledge to be non-linear,

irregular, layered and patchy (Longino, 2002). Instructional material therefore needs

to cater for the diversity of learners, providing differentiated instruction for their

multiple intelligences and habits of mind. Likewise, assessment strategies must also be

provided to cater for the multidimensional aspect of acquired knowledge. Many of the

assessment strategies used today evolved from long-standing practices that are linear in

design and fail to provide the complexity required to gauge the further dimensions of

knowledge. The dilemma faced by educators for years has been identifying the most

appropriate assessment regime to adopt that will accurately reflect the level of

knowledge of any given student. The present practice is to employ a combination of

assessment strategies that over time have proven to be reliable, such as multiple-choice

questions (MCQ), short essay answers, case studies and the like. Much discussion still

exists over the reliability of such testing mechanisms with educators continually

questioning their validity.

MCQ testing has long been a popular choice among academics for its ability to assess

large numbers of students in broad areas of the curriculum with relative ease (Tarrant,

Ware, & Mohammed, 2009). The ability to recycle questions from large readily

available banks is greatly appealing to the already over-burdened instructor. MCQs by

their design lend themselves to extension with the advent of technology evolving into

more complex levels of application. While this gives the opportunity for quicker

responses to both the student and the instructor there is often over-zealous

embracement of technologically advanced MCQ assessment strategies. This can often

result in the application of MCQ testing for purposes for which they were not truly

designed. Adding to the concern over the extension of MCQ testing is the inability of

instructors to construct suitable MCQ questions, an issue that predates the advent of

the surrounding technology and that the technology only serves to exacerbate.

The more critical and active educators endlessly pursue innovative alternatives in an

attempt to improve the assessment process. This research documents such a journey,

following the investigative path of an innovative approach to assessment that

incorporates confidence measurement. As a means of validation it compares the

outcomes produced from the implementation of such an innovative approach directly to

those from traditional assessment activities to ascertain their validity and reliability.

Importantly, it acknowledges that all of these comparisons are to existing assessment

strategies, classified as the benchmarks in traditional instructor driven delivery. It also

respects the sentiment of Lederman and Niess (2000) that technology in learning should

only be introduced under the premise of a sound pedagogical reason.

1.3 The Advent of E-learning

Recently there has been a shift in our approach to education that can be partially

attributed to the impact that the Internet and its associated technology have on every

aspect of life. With the advent of the Internet, e-learning with the Internet as a medium for

global 24 hours/7 days a week access has become an accepted method for delivery,

interaction and assessment. Research into innovative assessment requires examination

of the relevant elements of the e-learning environment and its important contribution to

the educational arena. The modern IT savvy student of today has embedded the

encompassing technology into all aspects of their life. Education is no exception, as

they have expectations of educational activities being available to them not only in the

classroom, but also in the library or remotely in their designated place of study via the

Internet. Interactive assessment is a significant participant, often used at different levels

of activity in the spectrum of educational delivery, where elements of the e-learning

paradigm are called upon, to various extents, to enrich the learning experience and offer

increased feedback and support.

This research acknowledges the willingness of educators to develop innovative

assessment strategies as a means of improving the learning experience and in particular

to consider new technology to improve these strategies. This research leverages off the

e-learning platform; however, it is not wholly reliant on it for its operational existence.

The purist’s definition of e-learning incorporates all aspects of the practice, including

Internet based presence, computer managed learning (CML) applications, synchronous

and asynchronous methods of electronic communication and delivery, online

assessment and many other contributors to the paradigm. In reality, e-learning is often a

hybrid approach, or “Blended Learning” (Keller, 2008), where aspects of the traditional

mode of education, such as lectures, tutorials and laboratories, are blended with online

supportive activities, such as online quizzes, streamed lectures and multimedia

presentations (Harris, Sadowski, & Birchman, 2006). In most cases, this hybrid

approach offers increased flexibility, enhanced learning environments and a richer

learning experience while still benefiting from the traditional modes of delivery. The

e-learning paradigm used by educational institutions often contains the full spectrum of

delivery modes, from the student studying solely via distance education to full

on-campus participation. E-learning activities within many institutions can be split into

two components. Firstly, the “learning” element that includes delivery modes in the

traditional format. Secondly, the “e” activities, where the more non-traditional

components of the educational process lie; Howlett et al. state that “online

educational techniques can be effectively blended with other forms of teaching”

(Howlett et al., 2009). This research considers a non-traditional approach to education

greatly supported by elements of the e-learning paradigm, in particular the field of

assessment. While this research associates itself with the e-learning phenomenon, it has

its grounding in the historically long-tested and validated traditional delivery

paradigms.

1.4 The Criteria for Good Formative and Summative Assessment

Assessment plays a critical role in the educational process as both a means of grading

(summative) and supplying valuable feedback (formative) to the students. The

embracement of technology as a major contributor to the delivery of education has

increased the expectations for effective assessment systems to be available to the

student, encouraging self-assessment at all stages of the learning experience. It is

accepted wisdom that testing for the purpose of feedback should be an integral part of

the sequence of learning activities rather than an interruption to that sequence, as

discussed in the Principles and Standards for School Mathematics (National Council of

Teachers of Mathematics, 2000).

Traditionally, both formative and summative methods of assessment have been reliant

on the instructor to supply feedback. With the ever-increasing demand to restrain

delivery cost, resulting in increased student to staff ratios, unintentional delays in

supplying feedback to the student can occur to the disadvantage of both the student and

the instructor. As a result, often the most valuable feedback occurs at the final stages of

the learning path, generally too late to be of great value to the students’ learning

process and providing only limited feedback to the instructor. This situation tends to be

of primary benefit to the succeeding group of students participating in the next

scheduled delivery of the subject.

Good assessment practices need to meet a set of criteria to be considered of any true

value. Formative assessment is required to produce meaningful feedback to the benefit

of the student and the instructor. It is equally important that this feedback is timely,

contributing to the learning path of the student, and designed to underpin rather than

undermine student confidence (Torrance, 2007a). The feedback from formative

assessment should be easily comprehended by the student, offering direction in their

learning by highlighting the areas of understanding, misunderstanding and complete

ignorance. Immediate feedback often enhances the value of a formative assessment

exercise, which is an inherent characteristic of an online assessment tool due to the

nature of the encompassing technology. The formative assessment feedback cycle

should occur early, and then constantly throughout the duration of the student’s

learning experience (Farrell & Leung, 2002a), encouraging self-regulation by the

students (Nicol & Macfarlane-Dick, 2006). Formative assessment feedback identifies

the areas of concern encouraging the instructor to evaluate the method of delivery and

adjust the remaining instructional program as deemed appropriate. Banta, Jones and

Black (2009) state that without examination of the results, improvement of the learning

outcomes cannot occur (Banta, Jones, & Black, 2009). Summative assessment serves to

assess the level of the student’s knowledge in a particular area. To be most effective, it

needs to be correctly scheduled and appropriately timed to fit into the program

(Banta et al., 2009), and to produce a set of results that

offer validity and reliability while being fair and ethical (Rice, Campbell, & Mousley,

2007). Rice, Campbell and Mousley (2007) are concerned that many educators still see

summative assessment as only a method of collecting information and comparing the

relative worth of different students. While acknowledging that an objective of

summative assessment is to produce a set of comparable results, Rice, Campbell and

Mousley (2007) emphasise that it can encourage deeper understanding by merging

together assessment and the teaching process with quality teaching. Summative

assessment should also offer the same feedback as the formative assessment activities,

guiding the student to address the areas of concern as outlined above. Summative

assessment activities often occur during the delivery period and in some cases very

early, which enables the student to use the experience as a self-assessment exercise and

therefore should be designed with this in mind.

While many educators are willing to accept traditional assessment strategies, there are

those who doubt their true value and acknowledge their limitations. Generally it is

accepted that there is a societal and academic need for assessment and hence the need for

assessment strategies that are reliable and manageable. Some assessment strategies fall

short of the criteria mentioned above. The following sections identify some of the

concerns of present assessment strategies that are relevant to this research.

1.5 Concerns with Current Assessment Strategies

In order to meet acceptable criteria for good assessment where student cohorts are large

and feedback is required to improve student learning, it is necessary to understand the

problems associated with current assessment strategies. The problems identified are:

students guessing rather than knowing the correct response; students learning how to

respond to a specific type of assessment; and assessment not recognising partial knowledge

or not recognising that students have acquired incorrect knowledge.

1.5.1 Assessment That Encourages Guessing

Presently, educators often use assessment tasks that permit and encourage guessing as

part of the testing strategy, such as in the case of the standard MCQ tests. Systems that

permit and encourage guessing can in many instances overstate the student’s current

level of knowledge. This misconception of knowledge is a major contributor to the

inspiration of this research. Guessing the answer may not necessarily be in itself a

severe problem if it is the consequence of serious consideration by the student when

deliberating over the correct answer. The actual learning is reliant on the student being

informed of the correct answer to the question that they guessed as part of the post

analysis, hence turning the experience into a formative assessment exercise.

This study is not alone in considering penalties for incorrect answers designed to

eliminate gain from guessing and will discuss previously developed innovative

assessment strategies in Chapter 3.
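
The effect of a penalty on the gain from guessing can be illustrated with a short calculation. The sketch below is only an illustration of conventional penalty ("formula") scoring, assuming a purely random guess among equally likely options; it is not the balanced scoring scheme developed later in this thesis, and the function names and default values are hypothetical.

```python
def expected_guess_score(n_options: int, reward: float = 1.0, penalty: float = 0.0) -> float:
    """Expected score for one question answered by a blind random guess.

    Assumes exactly one correct option and that every option is equally
    likely to be chosen (an illustrative assumption, not a claim about
    real student behaviour).
    """
    if n_options < 2:
        raise ValueError("a multiple-choice question needs at least two options")
    p_correct = 1.0 / n_options
    # Gain when the guess is right, minus the loss when it is wrong.
    return p_correct * reward - (1.0 - p_correct) * penalty


def neutral_penalty(n_options: int, reward: float = 1.0) -> float:
    """Penalty per wrong answer that makes the expected gain from guessing zero."""
    if n_options < 2:
        raise ValueError("a multiple-choice question needs at least two options")
    return reward / (n_options - 1)


if __name__ == "__main__":
    # With no penalty, guessing on a four-option question earns a quarter mark on average.
    print(expected_guess_score(4))
    # With the neutral penalty, the expected gain from guessing is (numerically) zero.
    print(expected_guess_score(4, penalty=neutral_penalty(4)))
```

Under these assumptions a standard test with no penalty rewards blind guessing on every question, which is one way an unpenalised MCQ score can overstate a student’s knowledge.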

1.5.2 Assessment That Trains the Student to Do Well

An additional concern is that the mode of testing may encourage the students to channel

their learning. Such a tendency is not restricted to newly introduced innovative

assessment strategies but has been inherent with most of the assessment mechanisms to

date. A student faced with an MCQ test will often set their study routine to address the

lower levels of knowledge, as described by Bloom’s Taxonomy of Educational

Objectives (Bloom & Krathwohl, 1956) and as presented by Marshall and Carson

(2008) as rote learning formulas and terminologies (Marshall & Carson, 2008). In those

cases the student’s learning runs counter to the effort needed to answer questions based

on case studies at Bloom’s (1956) higher levels of application and synthesis. While

we tend to be more forgiving of these shortcomings for the aforementioned traditional

assessment strategies, there is a tendency for recently introduced innovative assessment

applications to be over-scrutinised, attracting harsh criticism for the same offences.

During this research, focus groups of assessment practitioners (Farrell & Leung, 2008)

discussed the merit of incorporating penalties for wrong answers. The educators who

presented this innovative approach attracted negative responses from some academics,

being accused of introducing assessment systems that effectively trained the student in

achieving the highest grade. Crocker (2005) considers it harmful to educational

development to give instruction with the purpose of making students skillful test takers

(Crocker, 2005). In recognition of this being considered as a bias, Astin (1991)

recommends that all existing assessment be scrutinised when new assessment strategies

are introduced (Astin, 1991).

1.5.3 Assessment Failing to Recognise Partial Knowledge and Miscalibration of

Confidence

Traditional testing methods, such as MCQs, generally offer an adequate method of

assessing a student’s knowledge in areas deemed right or wrong; however, they fail to

cater for the “shades of gray” or fuzzier areas of knowledge. The data gained from

assessment activities that permit the registration of partial knowledge may permit the

understanding of misconceptions or gaps in the student’s knowledge.

Davidoff (1995) recognises this need to identify partial knowledge in the medical arena,

promoting a more thorough approach in assessing students (Davidoff, 1995). He

suggests that a system designed to recognise incomplete or partial knowledge by

permitting the student to hedge the answer, as also suggested by Ben-Simon, Budescu and

Nevo (1997), would be greatly beneficial to the learning process (Ben-Simon, Budescu, &

Nevo, 1997). They further state that a great percentage of medical knowledge is

incomplete, ambiguous and conflicting and therefore the standard MCQ testing method

does not facilitate or reflect the students’ level of knowledge along with their

confidence in that level of knowledge. In support of Davidoff (1995), Ng and Chan

(2009) purport that there is a need to allow for the student who knows only part of the

answer and that this is not the case with conventional MCQ testing that fails to capture

this partial knowledge (Davidoff, 1995; Ng & Chan, 2009). As previously discussed,

MCQs were not originally designed to return such complex analysis, and their extended

use, attributed to the advent of new technology, places them in a failing position

for this task. Ultimately the effectiveness of MCQ format questions is reliant on the

construction of the questions. If partial knowledge is to be acknowledged the test

questions are required to be constructed in the context of the situation being assessed,

as for a medical emergency, where various correct answers are valid depending on the

situation in which they are asked. Davidoff’s (1995) main criticism of MCQs is that

they only recognize and reward those areas of knowledge that are either right or wrong,

encouraging guessing and often leading to over-confidence. He considers miscalibrated

confidence in medical education equally as concerning as lack of knowledge. Clark and

Friesen (2009) consider that systematic over-confidence by individuals in the

economics field could have important consequences, and Acker and Duck (2008) have

identified that propensities to over-confidence are dependent on cultural background

(Acker & Duck, 2008; J. Clark & Friesen, 2009).

Davidoff’s (1995) concerns about failing to permit students to demonstrate partial

knowledge and about miscalibration of confidence are not confined to the medical

arena, as most educational disciplines expect their students to demonstrate absolute

knowledge for certain aspects of the curriculum and levels of knowledge and general

understanding for other areas (Davidoff, 1995). The ability to recognise a particular

path for further investigation is dependent on the investigator having a level of

confidence that there is a need for investigation. Miscalibration of confidence can have

as devastating an effect on an engineering problem pertaining to the construction of

a pedestrian bridge as on the diagnosis of an individual with a medical condition.

Diamond and Forrester (1983) define knowledge as asking the question “What do you

know?” followed by the meta-question “How sure are you of the answer to the question

about what you know?” in their attempt to address the issue of effective assessment

(Diamond & Forrester, 1983). They consider the registration of confidence to be a

significant indication of a true level of knowledge.

Chapter 3 extensively investigates the supportive evidence of the concerns outlined

above, along with Doebbert’s (1999) recognition of the need to place the management

of the learning into a student’s own hands, which would often culminate in a far deeper

understanding and learning (Doebbert, 1999).

1.6 The Role of Computer Based Assessment in Addressing Issues of

Assessment

With the introduction and acceptance of the Internet by a large proportion of the

developed world it is not surprising that education is a major beneficiary. The

opportunity to use this emerging technology as a major contributor to education has

been pursued by both corporate environments and educational institutions, yielding a

large number of Web-centric supportive tools and materials designed to complement,

supplement and substitute for existing learning materials in traditional format. The

inherent features of the encompassing technology, such as the ability to instantly

process and respond, have produced a myriad of educational supportive tools, numerous

Computer Managed Learning (CML) applications and other online resources to

enhance the learning experiences of students today. Hence, e-learning has come of age,

contributing significantly to education, with a large number of educational institutions

utilising many of its functions.

As previously stated, the benefits of e-learning tools are not confined to the e-learning

domain alone. Many of the innovative approaches attributed to this paradigm have

comfortably found their way into the more traditional educational arena, to the extent

where many subjects offered by educational institutions, independent of mode of

delivery, contain a supportive Web presence using components of the Web paradigm,

discussion boards, computer aided testing and the like. The increase in flexibility for

the busy corporate-based student and the 24 hours/7 days a week access to learning

materials is an expectation of the net-centric participant of today.

Consequently, there is a need for reliable computer based assessment to be available as

an active part of the e-learning platform in support of the educational process. In

particular, online formative assessment tools are required to meet the needs of both

the student and the instructor.

To be of significant benefit to the student and the instructor a computer based

assessment task must not only be easily accessible, convenient to the user and offer

timely effective feedback, it must also be capable of reflecting the participant’s level of

knowledge as accurately as possible.

To be fully accepted by the educational fraternity, the resulting student grades must be

at least comparable to existing, validated assessment systems and offer sound

educational values.

1.7 The Purpose of this Research

This research postulates that there is the need to design, test and implement suitable

online formative and summative assessment tools to enhance the students’ learning

experience for the duration of their studies. The rapid adoption of the online

educational utilities by both academic institutions and corporate entities has catapulted

both the students and instructors into the e-learning paradigm. To meet the demand for

e-learning assessment, it is vital to design, trial and implement innovative formative and

summative assessment strategies that provide timely feedback and accurately gauge the

student’s level of knowledge within the context of the environment.

1.8 Problem Statement

In light of the previous discussion, the impact and contribution of the e-learning

paradigm in all aspects of educational delivery have been clearly identified and

acknowledged. Assessment is an integral component of educational delivery. The

development of innovative approaches to assist in the e-assessment process is

greatly appreciated by the instructors faced with the challenge of accurately grading

their students. This research pursues one such assessment option with vigor, working

on the premise of the ongoing need for improved assessment strategies to be

investigated, as formulated in the problem statement below.

“Present educational assessment strategies often fail to provide an accurate representation of students’ knowledge, which is detrimental to both the student and instructor.”

1.9 Scope and Aim of this Research

The supply of quality educational material offering maximum convenience to the

student is often seen to be beneficial to both the students and the instructors. In view of

the importance of e-learning enhancements to traditional education, and the

fundamental role played by assessment in the learning process, it is essential that we

explore technologies and approaches that will improve assessment effectiveness for all

delivery modes, spanning the full spectrum of educational delivery, from fully online to

face-to-face delivery. Flexibility and portability are fast becoming requirements of a

successful educational program. For the individual wishing to further his/her existing

qualifications or for those who want a change in career path, it is often the case that

learners find themselves unable to participate fully in the traditional mode of delivery,

and will pursue options that best fit their busy work schedule.

In some instances the e-learning components of education remove the personal

interactivity within the classroom environment, hence creating a challenge to the e-

learning paradigm to enhance the online learning experience by incorporating rich,

personalized and timely feedback to the individual student. In addition, the instructors

are reliant on feedback in relation to the student’s knowledge acquisition over a period

of time, given that they do not always have the advantage of delivering face-to-face.

This research considers a solution to the problem statement as stated in 1.8. This

research is limited and defined by the following scope and aims.

1.9.1 Scope

The scope of this research is in the use of an e-learning assessment tool based on the

traditional MCQ format for formative and summative assessment purposes for a variety

of educational settings, ranging from fully distance online to face-to-face modes of

delivery. The developed assessment tool is Web-based, designed to be delivered via the

Internet with 24 hours/7 days a week accessibility. This study has been limited to

post-secondary education as part of the delivery program. This research evaluates the

proposed innovative assessment system to ascertain if it is reliable as both a formative

and summative assessment tool.

1.9.2 Aims

The aim of this research is to:

“Investigate the ability of assessment with confidence measurement to increase the accuracy of representing a student’s level of knowledge and whether it is acceptable as a valid assessment alternative by students and instructors for both formative and summative assessment.”

This research hypothesised that the outcomes of using assessment with confidence

measurement are of benefit to both the instructor and the student. Firstly, it enables the

student to have an honest self-appraisal of their knowledge of the content being

assessed, highlighting the areas of concern, which in turn will assist them in their

direction of learning. Secondly, it enables the instructor to ascertain the knowledge of

the individual and/or the group as a whole that will assist them in determining the best

learning path to address the content that is not being truly understood.

This research follows the path of designing, implementing, testing and refining the

online assessment tool Multiple-choice Questions with Confidence Measurement

(MCQCM) (Farrell, Farrell, & Leung, 2001; Farrell & Leung, 2002a, 2002b; Farrell &

Leung, 2004b; Farrell & Leung, 2006, 2008). The assessment application incorporates a

confidence measurement component proposed as a means of increasing the accuracy of

the student’s level of knowledge. Primarily the MCQCM was designed as a formative

assessment tool to assist the students in reflection and encouraging self-assessment,

often leading to a deeper understanding of the material being taught. In the later stages

of the research cycle, preliminary investigations occur to compare MCQCM with current

methods of assessment when used for summative assessment. This research argues that

if used appropriately, assessment with confidence measurement will offer both grading

and informative feedback to assist the student along their learning path, all of which are

critical components for the success of the learning.

1.10 Overview of Thesis

Chapter 2 will present the adopted research methodology and framework, initially

identifying the alternative approaches available and then the decisional criteria for the

choice. Chapter 2 uses the identified problem statement (section 1.8), formulating the

corresponding research questions and sub-questions, mapping them against the most

appropriate research method for addressing them. It will then consider the Human

Computer Interaction (HCI) User Centred Design (UCD) iterative approach (Hussain et

al., 2008; Righi & James, 2007) to problem solving given that the problem space of this

research is firmly planted in the real world.

The literature review in Chapter 3 initially discusses the role of assessment as part of

the educational process, introducing various assessment strategies presently used and

their consequential impact and influence on the learning path of the participant.

Importantly, it highlights the value of feedback to students and the significant

contribution that it plays in the learning process. Chapter 3 also investigates the work of

those who have undertaken rigorous research before and during this study, citing

previous work where innovative approaches to assessment strategies have been

incorporated into programs with varying levels of success. Chapter 3 proposes that an

assessment with confidence measurement strategy be adopted to address the issues

identified, incorporating a balanced scoring technique of rewards and penalties. The

interactive assessment tool, referred to as the Multiple-choice Question with Confidence

Measurement (MCQCM), is based on the traditional MCQ format of a stem (question)

followed by a set number of optional answers (McCoubrie, 2004). A detailed

discussion follows considering fundamental learning theory underpinning the process

of student learning, in particular the role of intrinsic motivational factors that often

contribute to deeper understanding. Chapter 3 then acknowledges the important

contribution of Learning Styles and the need for the assessment activities to support the

four phases of learning. Chapter 3 then aligns the contribution of assessment with

confidence measurement to Hede’s (2002) Integrated Model of Multimedia Effects on

Learning (Hede, 2002).

Chapter 4 extensively discusses alternative scoring techniques investigated by other

researchers in the field, demonstrating the mathematics supporting their proposed

solutions. Chapter 4 systematically reveals equations based on probability theory which

culminate in the establishment of scoring regimes providing choice for the participant

that are properly motivating. It considers the expected values of the users when

interacting with the various systems and then completes a comparative analysis.

Chapter 4 closes with the recommendation of the use of a balanced scoring system as a

compromise and enhancement of previous methods offered.

Chapter 5 presents the findings from two pilot programs in which the MCQCM

prototype underwent some preliminary testing. On analysis of the outcomes, Chapter 5

identifies the shortcomings of the proposed assessment tool and discusses the need to

address these before proceeding.

Chapter 6 discusses the refinement of the MCQCM tool used in Chapter 5 as a solution

to the stated problem and considers heuristic evaluation and the handling of graphics

and programming scripts. Furthermore, Chapter 6 considers the topology of the game

play phenomenon (Adams & Rollings, 2007), relating the MCQCM elements of game

play in its design and structure.

Chapter 7 initially reports on the results of simulations designed to evaluate the

functionality and effectiveness of the MCQCM tool before implementation to a large

group. Chapter 7 then reports on an investigation into the students’ and instructors’

perceived value of the MCQCM as a formative assessment tool, discussing the

outcomes and possible enhancements required for an effective assessment strategy.

Chapter 8 reports on the results of implementing the MCQCM for summative

assessment on three separate occasions to ascertain its validity, reliability and

convergence to other traditional assessment methods. Initially Chapter 8 reports on a

study in which the students’ test results were analysed to ascertain if the MCQCM and

the traditional MCQ scores converged. Additionally, this activity was designed to

gauge the students’ and instructors’ perception of using the MCQCM for summative

assessment. The next case study in Chapter 8 analyses the student scores for four

assessment strategies, MCQCM, Short Answers, Problem Solving and traditional

MCQ, to determine the validity and reliability of the MCQCM and its convergence to

the other methods of assessment scores. Chapter 8 then analyses the MCQ and

MCQCM scores for a cohort of 85 students to verify its use as an alternative

assessment option to be included in the suite of assessment strategies available.

Chapter 9 offers the conclusions and discussion, where the observations are drawn

together. It further identifies the limitations and the internal, external, construct and

ecological validity of this research. Chapter 9 answers the formulated research

questions and sub-questions and identifies a contribution of this research to the

educational arena for formative and summative assessment strategies. Chapter 9

continues with a series of recommendations for educators wishing to pursue alternative

assessment in the future, particularly assessment with confidence measurement.

Chapter 9 concludes with a summary of the findings and challenges faced when

embarking on investigating alternative assessment, with discussion on the advantages

and disadvantages of alternative scoring techniques available, and a suggested approach

for the future development of the MCQCM.

CHAPTER 2 RESEARCH DESIGN

This Chapter identifies and argues for the chosen research methodology by initially

considering the various research paradigms, then formulates the research questions

and the corresponding supportive research sub-questions. The research framework is

then developed in which this research addresses the questions previously identified. A

discussion on the Human Computer Interaction (HCI) User Centred Design iterative

approach to problem solving (Isacker, Slegers, Gemou, & Bekiaris, 2009) then follows.

2.1 Research Methodology

When embarking on a research project the researcher is mindful of the three dominant

research paradigms, positivism, interpretivism and critical theory (or critical science)

(Cohen, Manion, & Morrison, 2007). These proven approaches have, since their first

conception, influenced the path of researchers and will continue to do so in the future.

Every researcher embarking on their research path is required to choose an appropriate

research method to assist them. It is this nominated research method that determines the

structure of the research and the framework to which the researcher will adhere. Whilst

this research does not intend to espouse extensive discussion on the above-mentioned

paradigms, it is considered necessary to briefly discuss the meanings of each in order to

justify the research approach adopted.

2.1.1 Overview of Research Methodology

The origins of positivism date back to the early 19th century, being mainly attributed to

the works of French philosopher Auguste Comte (1798-1857), and are deeply rooted in

the works of other great philosophers who followed. Comte published his theories as

the Cours de philosophie positive (1830–1842) (Comte, 1868). Consequently,

Positivism shaped the intellectual discourse of the late nineteenth century, having

grown from Comte’s absolute rejection of value judgments when observing social

science. Comte concluded that human thought had passed from the theological stage

into a metaphysical stage and was entering into what he termed as the positive stage

(confining itself to what is positively given, avoiding all speculation) or scientific stage

(Corveleyn & Luyten, 2006). Comte postulated that the positivist researcher following the

scientific method is concerned only with the observable and the encompassing

relationships. Here a critical underlying assumption is that there exists a basic

knowledge concerning human behaviour and all worldly phenomena. Positivism rests on

the premise that if a hypothesis or proposal cannot be tested empirically

it cannot be proven fact. There is no place for value statements in positivism, only

statements that can be scientifically proven. For positivism to hold credibility it is a

requirement that all statements being tested for validity must be grounded in

observation, and these observations must be repeatable (Johnson, Buehring, Cassell, &

Symon, 2006). In addition, any experiments undertaken during the research process

should use the techniques agreed and endorsed by the entire scientific community.

Positivist research activities, such as controlled experiments, remove subjectivism

from the study and generate quantitative data to be analysed and statistically tested,

providing statistical confirmation in support or rejection of proposed hypotheses

(Steinmetz, 2007).

The later-conceived post-positivism takes a more moderate position. It acknowledges

the existence of subjectivism as the result of judgments made by the researcher in the

study. These inherent judgments can occur when the researcher chooses the subjects for

experiments and maps out the research path with their own preferred methods,

imposing their own value judgments and influences during the research process. The

supporters of this research paradigm consider the effects of this interference to be

minimal if the research process is correctly applied. Reed claims that post-positivism

seeks to reorganize social research with a new approach, unhampered by the earlier

experiences (Reed, 2008; Steinmetz, 2007).

In direct contrast, the supporters of interpretivism propose that all knowledge is

socially constructed (Hogg & Maclaran, 2008). This is referred to as a constructivist

view of knowledge, and assumes that absolute knowledge does not exist and that

most knowledge is reliant and built upon previous knowledge (Stavropoulos,

2007). It is acknowledged that interpretivism can be an acceptable research

methodology in some contexts while being rejected in other scientific methods, as

described in the previous Positivism discussion. Gerring (2003a, 2003b) refers to

interpretivism as interpreting or clarifying, where the construction of truth relies on the

tests of coherence rather than (or in addition to) correspondence with external reality

(Gerring, 2003a, 2003b). This paradigm directly questions the validity of the positivism

paradigm in that it does not support the concept of objectivity. Rather it considers that

the influence and subjectivity of the researcher primarily directs the path of the study.

The true value of interpretivism is fully appreciated when applied to research in a social

context, the human social phenomena which consider the feelings, values and

interactions of the participants (Gerring, 2007). Cassell and Nadin (2008) consider “the

adoption of interpretivist approaches has much to offer in terms of theoretical and

methodological development” in the field of entrepreneurship (Cassell & Nadin, 2008).

The interpretivist is heavily reliant on questionnaires and surveys to elicit their

findings, which generally culminate in both qualitative and quantitative data. It is this

qualitative data that must be interpreted in context as opposed to measured for

statistical significance. It is of particular interest to this study as it considers the

individual’s behaviour in the educational field and not that of the generalised

population.

Critical theory, as social philosophy, was born in the German Frankfurt School in the

1930s, based fundamentally on the work of Marx (Marx, 1884). It conjectures that all

knowledge is historical and biased, and that objective knowledge is illusory.

Importantly critical theory acknowledges that power leads to distorted communication

and by becoming aware of the ideologies that dominate in society, groups can

themselves be empowered to transform society (Fuchs & Sandoval, 2008). This

paradigm often contributes to the changes in social structure. The critical theory

paradigm attempts to address the power imbalances during research, permitting those

not traditionally in charge, such as the participants, to influence the research direction

and hence is generally employed for social research in areas such as human rights

(Ackerly, 2004). The research techniques employed here for critical theory are often the

same as those used by interpretivism, being the tools for qualitative data generation,

such as surveys, ethnographic studies and case studies. This technique places the

researcher only as an equal peer in the process, and encourages the researcher to be

actively involved in the problem situation.

The research paradigms above are acknowledged and highly regarded, depending on

one’s position in the research fraternity. It is widely accepted that these paradigms

underpin and have shaped most of the research to date, though some research

disciplines experience difficulty finding the appropriate positioning. These research

paradigms bring with them significant contributions to research and even though

having various degrees of mutual exclusivity (positivism versus interpretivism) and

synergies (interpretivism and critical theory) they can often offer the opportunity to

work cohesively in many applications. Each has its strengths and weaknesses; and each

has a role to play when faced with the complexity of a real world problem space. The

strength of positivism lies with its ability to capture and analyse objective data,

especially in confined, controlled environments (Giddings & Grant, 2007). However, in

many situations it is not the case that such data exists, as it is an artificial representation

of the real world in which the problem resides. While case studies, ethnographic studies

and the like give strength to interpretivism and critical theory, application of these

research paradigms can be considered to be lacking in mathematical rigor and

reliability.

No matter which research framework is adopted it is the research methodology that

determines the activities to be undertaken during the study. Within this particular

research there is a requirement of a positivist approach for elicitation of quantitative

data to primarily gauge the effectiveness and contribution of assessment with

confidence measurement to the broader population of students. For this reason this

study uses experiments and field studies to gather data in an attempt to represent the

trends and influences on the group, to be tested for validity and reliability. Accordingly,

it also seeks to identify the social significance that the system has on the individual by

encouraging them to register their feelings and values when interacting with the online

assessment with confidence measurement during their learning, aligning itself strongly

to the interpretivist paradigm.

It is critical that a clear direction of the research framework and method be determined

in the early stage to ensure that the activities are designed to meet the research

objectives and contribute to addressing the identified research question(s) (Mansell,

2009). Research activity produces data, both quantitative and qualitative (empirical),

which must be correctly interpreted in order to support the hypotheses and propositions

formed and tested during the research lifecycle. Quantitative and qualitative data have

equally important roles to play in the research arena and are seen as important

contributors in the research process. A well planned research path is required to ensure

that the outcomes of the various activities and the types of data produced are relevant

and of value to the research.

2.1.2 Adopted Research Methodology

This research uses a combination of both positivism and interpretivism paradigms, as it

deals with individuals interacting with an interactive assessment tool as a means of

securing grades as well as expressing states of emotion (McIlveen, 2007). It cannot

ignore the influences of the emotional state of the participant. Using interactive systems

to capture quantitative data is subject to the state of mind of the participants, that is to

say that recent positive or negative experiences might directly affect their level of self-

confidence and their propensity towards registering their confidence level.

Consequently, the quantitative data is of two kinds. Firstly, it has the unequivocal

scores derived from their actual correct choices during the test, which can be analysed

using the positivism research approach. Secondly, the participants then supply

subjective judgments about their level of confidence, hence an interpretivism approach.

There is a need to adopt a research methodology that employs both positivism and

interpretivism research paradigms, to be well balanced and managed. The use of

positivism and interpretivism within this study is identified in Tables 2-1 and 2-2 later

in this chapter.

2.2 Research Design to Address Research Questions

In order to guide this research towards its aims and consequently provide a solution

to the stated problem a series of questions have been formulated. These questions form

a progression of understanding of the value of the assessment with confidence

measurement to students and instructors and its validity as an assessment strategy.

The problem statement is reiterated here, being:

“Present educational assessment strategies often fail to provide an accurate representation of students’ knowledge, which is detrimental to both the student and instructor.”

The consequential supporting aim of the problem solving exercise is to:

“Investigate the ability of assessment with confidence measurement to increase the accuracy of representing a student’s level of knowledge and whether it is acceptable as a valid assessment alternative by students and instructors for both formative and summative assessment.”

As previously highlighted, this research focused on the use of assessment with

confidence measurement for formative and summative assessment, giving rise to the

following main research question.

Main Research Question:

“Does assessment with confidence measurement increase the accuracy of

representing a student’s level of knowledge for formative and summative assessment

application?”

This gives rise to two further research questions. The first is formulated to ascertain if

assessment with confidence measurement could be used for formative assessment and

is as follows:

Research Q1.

“Does Assessment with Confidence Measurement produce more meaningful

feedback and influence the learning path when used for formative

assessment?”

In support of the research question pertaining to formative assessment the following

sub-questions have been formulated.

Q1A: “What are the students’ and instructors’ attitudes and perceptions of assessment

with confidence measurement when used for formative assessment?”

Q1B: “How do the students’ results compare to the results of a standard Multiple

Choice Question (MCQ) test when using assessment with confidence

measurement for formative assessment?”

Q1C: “Does the use of assessment with confidence measurement provide additional

valuable feedback to the instructor when used for formative assessment?”

The second research question is formulated to ascertain if assessment with confidence

measurement could be used for summative assessment and is as follows:

Research Q2.

“Does Assessment with Confidence Measurement offer at least equivalent

Validity and Reliability compared to traditional assessment strategies when

used for Summative assessment?”

In support of the research question pertaining to summative assessment the following

sub-questions have been formulated.

Q2A: “What are the students’ and instructors’ attitudes and perceptions of assessment

with confidence measurement when used for summative assessment?”

Q2B: “How do the results compare in Validity and Reliability to the results of the

standard MCQ test when using assessment with confidence measurement for

summative assessment?”

Q2C: “How do the results when using assessment with confidence measurement for

summative assessment compare in Validity and Reliability to other traditional

methods of summative assessment?”

Q2D: “Does the use of assessment with confidence measurement provide additional

valuable feedback to the instructor when used for summative assessment?”

Q1C and Q2D recognise an important component of this research, the value of

assessment with confidence measurement to the instructors, as the direction of the

learning can be amended by them depending on the feedback received. Often, the

instructor will vary the instructional material, readdressing concepts and changing the

emphasis on the material as required, to increase the student’s understanding.

Consequently, enriched feedback has a direct influence on the learning path provided

by the instructor and needs to be considered when formulating the research questions.

In addition to the research questions formulated above there is also the need to consider

the usability and consequential design requirements when producing interactive

assessment strategies for implementation. This gives rise to the third research question:

Research Q3:

“What are the design requirements for developing an interactive assessment

with confidence measurement to ensure that instructors and students are able

to achieve maximum benefit from its application?”

While the direction of this research has been determined by these research questions

and sub questions, the drive of this research is to ascertain the value of an assessment

strategy that uses confidence measurement primarily as a formative assessment tool,

then later, as a summative assessment tool.

2.3 Research Framework

Consistent with the positivism paradigm there will be instances where the research

activity will be designed to capture quantitative data through experiments and surveys,

with statistical analysis of validity, reliability and convergence, to

ascertain the effect of an assessment with confidence tool on the general population of

participants. In particular the sub questions identified in Table 2-1 will be

predominantly addressed using this approach.

Sub Questions addressed using Positivism Research Paradigm

Q1B: “How do the students’ results compare to the results of a standard

Multiple-choice Question (MCQ) test when using assessment with

confidence measurement for formative assessment?”

Q2B: “How do the results compare in Validity and Reliability to the results of

the standard MCQ test when using assessment with confidence

measurement for summative assessment?”

Q2C: “How do the results when using assessment with confidence

measurement for summative assessment compare in Validity and

Reliability to other traditional methods of summative assessment?”

Table 2-1: Research Questions Addressed by the Positivism Research Paradigm

To address the sub questions identified in Table 2-1 a particular research approach

and appropriate data analysis was required. It was deemed suitable that the method of

research activities designed would be of the experimental type, where the activities

occurred during the course of the subject delivery. Following these activities the

collected data was statistically analysed for comparison to various more traditional

assessment tasks.
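
One way such a statistical comparison could be carried out is by correlating each student's score on one assessment with their score on another. The following is a minimal sketch only: the use of the Pearson correlation coefficient as the convergence measure, the function name pearson_r and the sample scores are all assumptions made for illustration and are not taken from the analysis reported in this thesis.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list has no variance")
    return cov / sqrt(var_x * var_y)


# Hypothetical percentage scores for the same six students (illustrative only).
mcq_scores = [55, 62, 70, 48, 81, 90]
mcqcm_scores = [50, 60, 72, 40, 85, 88]

r = pearson_r(mcq_scores, mcqcm_scores)
print(f"Pearson r between MCQ and MCQCM scores: {r:.3f}")
```

A coefficient close to 1 would suggest that the two assessment methods rank the students similarly. It would not on its own establish validity or reliability, which require further analysis.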

In comparison, this study also uses the interpretivism paradigm as outlined in section

2.1. Apart from some initial exercises confined to the laboratories in the early

development stages most of the activities supporting the research occurred in the real

world of teaching, during the tutorials and as part of the revision offered to the

students. These later investigations were dependent on the interaction of the

participants as part of their daily activities, both in class and at home, as a scheduled

part of the curriculum. As a result much of the data collected was captured during

dialogues that exist between instructor and student. In many cases the recorded

feedback was as a result of the subject’s appraisal system run as part of the university

course quality management process, where the students are given the opportunity to

comment on the assessment mechanisms used during the semester. These survey

results produce rich qualitative data representing the feelings, values, perceptions,

cultural and social attitudes of the individual, which require interpretation and

classification. In such cases this analysis assists in understanding the impact of the

research on the smaller sub groups of the cohort and not the population as a whole. It

is this opportunity to analyse the way the individuals learn and react that provides a

deeper understanding of the impact assessment with confidence measurement may

have on the learning path. It recognises that while there is a need to understand how

the greater population behaves during the learning process it is important that we also

consider the finer characteristics and eccentricities of the individual and those of the

smaller groups that they might typify. The research sub questions predominantly

addressed by this approach are outlined in Table 2-2.

Sub Questions addressed using Interpretivism Research Paradigm

Q1A: “What are the students’ and instructors’ attitudes and perceptions of

assessment with confidence measurement when used for formative assessment?”

Q1C: “Does the use of assessment with confidence measurement provide

additional valuable feedback to the instructor when used for

formative assessment?”

Q2A: “What are the students’ and instructors’ attitudes and perceptions of

assessment with confidence measurement when used for summative

assessment?”

Q2D: “Does the use of assessment with confidence measurement provide

additional valuable feedback to the instructor when used for

summative assessment?”

Research Q3:

What are the design requirements for developing an interactive

assessment with confidence measurement to ensure that instructors and

students are able to achieve maximum benefit from its application?

Table 2-2: Research Questions Addressed by the Interpretivism Research Paradigm

The sub questions identified in Table 2-2 above required a particular research approach

and appropriate data analysis. The empirical data gathered from the surveys, dialogues

and informal interviews are dealt with in various ways including classification and

cluster analysis.

2.4 HCI Approach to Problem Solving

Like many research activities embedded in a real world environment, this research finds

itself planted firmly between the needs of finding a particular solution to an identified

problem and performing research in the field, in this case as part of the educational

process. A formal structure has been adopted, which satisfies the requirements of both.

It is for this reason that the methodologies presented here have been chosen to address

the two areas, one for the problem space and the other for the research. It is important

that there is a distinction between the two as they at times run in parallel,

occasionally crossing paths within the research process.

This study focuses on an approach to learning that is supported by the use of an

assessment incorporating a confidence measurement interactive tool. Consequently,

there has been a need to develop, test and implement an interactive Web-based tool

permitting the students to self assess their knowledge at their convenience

(Zimmerman, 2008). This tool is designed to support the student in self-regulation and

reflection. In general, interfaces use many areas of technology, including multimedia

design and Internet technologies. Critical to the success of any interface system project

is the involvement of the users in the design, development, testing and implementation

phases (Sharp, Rogers, & Preece, 2007). In developing an interactive online assessment

system there is a need to provide the participant with a rewarding, enjoyable and

beneficial experience (Harrison & Petrie, 2007). The HCI problem solving

methodology, like many others, supports an iterative user centred approach,

commencing with the identification of the problem, followed by the formulation of the

goals to address the problem at hand then the development of the solution through an

iterative process involving the users. Once the primary goal has been satisfactorily

defined it is broken down into sub-goals that will assist in achieving the best possible

outcome. The HCI discipline not only recognises this process; it ventures to capture it

as part of the needs analysis activity. This is a fundamental component of the HCI

methodology. The User Centered Design (UCD) approach, as Lindström and Malmsten

promote, is used to achieve this (Lindström & Malmsten, 2008). Sharp et al. (2007)

consider the consequences of this approach to have a greater chance of producing a

result that is designed to meet the needs of the users, to make the most of human skill

and judgment and one that produces a solution directly relevant to the work, supporting

rather than constraining (Sharp et al., 2007). UCD was initially drawn from successful

Scandinavian experience in the 1970s (Daniel, O’Brien, & Sarkar, 2009;

Schneiderman, 1997) and it has developed into variations of application (Hussain et al.,

2008; Righi & James, 2007), being Participatory Design and Contextual Interaction

design (Sharp et al., 2007).

Once the research problem and the aim are formulated the process of finding a solution

following the HCI User Centered Design methodology can be initiated, as outlined in

the ISO 13407 Human-Centered Design Lifecycle model (Bevan, 2009; Sharp et al.,

2007) and the more complex Usability Engineering Lifecycle, as proposed by

Mayhew (Mayhew, 1999; Seffah & Metzker, 2008).

2.5 Summary of this Research Structure

This research adopts the general methodologies of positivism and interpretivism and

uses the principles of the HCI user centered iterative problem solving approach. In this

chapter the research problem and situation in the real world environment identified in

Chapter 1 is addressed by the formulation of the three main research questions. From

these main research questions a series of seven sub questions are produced in order to

deal with the research problem in context, leading to the establishment of the research

framework where the appropriate research paradigm is identified for each cluster of

questions. The research framework ties in the questions to the research activities to

ensure that the generated quantitative and qualitative data is relevant to the study and

analysed by the most appropriate method.

The following chapter will consider variations to multiple choice assessment strategies

as practiced by other rigorous researchers, comparing their approaches and the

educational theory behind their choices. It then considers assessment with confidence

measurement as a solution to the identified concern that traditional assessment often

fails to provide an accurate representation of students’ knowledge, which is detrimental

to both the student and instructor.

CHAPTER 3 VARIATIONS OF NON-CONVENTIONAL MCQ ASSESSMENT STRATEGIES FOR LEARNING

This chapter will consider the supporting literature that places this research firmly in

the educational assessment context determined in the scope in Chapter 1. An investigation

follows that discusses the work of those who have previously been active in the area of

developing and implementing various self-assessment strategies in an attempt to

address the identified areas of concern. Furthermore, this chapter will consider the

underlying arguments supporting the use of assessment strategies that incorporate

confidence measurement and the fundamental principles for the application of

interactive self-assessment systems as part of learning.

3.1 Learning Theories and Learning Styles

It is important to briefly consider learning theories with the various learning styles and

a student’s propensity towards them, as the development of alternative assessment

strategies requires consideration of these learning styles, acknowledging the role that

they play in the educational process. Morris et al. contend that for assessment to be

“fair” it should be appealing to the students’ learning styles (Morris, Porter, & Griffiths,

2004), while Sternberg considers that assessment options should be determined by

learning styles (Sternberg, 1988).

3.1.1 Learning Theories

It is necessary to consider learning theories to understand the learning process before

discussing the various learning styles. Morris et al. (2004) identify the three main

learning theories (Morris et al., 2004) as:

• Constructivist;

• Cognitive;

• Behaviourist.

The Constructivists view learning as contextual with preference to the practical

applications, encouraging pedagogy with consideration to interactive learning in a

cooperative learning environment of instructors and students (Martin, 2008; Piaget &

Duckworth, 1970). Constructivism promotes students’ meaningful experiences in

learning in the real world with provision of tasks designed to engage the student at an

individual level, offering opportunity for reflection.

The Cognitive approach considers the information processing position of learning

where motivation, memory and reflection permit the connectivity between higher and

lower levels of learning (Novak & Cañas, 2008). The cognitive approach is reliant on

pedagogy where models are used with the sequencing of content to maximise attention

and take advantage of existing cognitive structures.


The Behaviourists view learning as the change in behaviour with pedagogy providing a

focus with defined outcomes with opportunities for self-testing and interactive feedback

on the student’s achievement (Shoben Jr, 2009).

The pedagogies previously discussed have a reliance on extrinsic and intrinsic

motivational factors. Morris et al. (2004) claim that motivation is fundamental to

learning (Morris et al., 2004), both extrinsically (Trotter, 2006), usually as a result of

the instructor’s need to generate grades, or intrinsically being derived from within the

learner as they strive for self-satisfaction and personal reward. Keller (2008) identifies

five principles of learning motivation that help learners overcome obstacles and

accomplish their goals (Keller, 2008). Motivation to learn is promoted:

• When the learner’s curiosity is aroused due to a perceived gap in their knowledge.

• When the knowledge to be learned is of value to them.

• When the learner believes they can succeed in mastering the learned task.

• When the learner anticipates and experiences satisfying outcomes.

• When the learner employs “Volitional” (self-regulatory) strategies to protect their

intentions.

Morris et al. further consider intrinsic motivation with the inclusion of self-monitoring

and control to be the more beneficial in eliciting deeper learning (Morris et al., 2004).

They further encourage the development of models for motivation in learning that

promote capturing the student’s attention by offering them relevance, supporting the

development of a student’s self-confidence and the promotion of a sense of

achievement and satisfaction through interactive feedback. Morris et al. (2004) argue

that strategies for learning must include components that develop and enhance meta-

cognitive skills (Morris et al., 2004).

3.1.2 Learning Styles

Coffield, Moseley, Hall and Ecclestone (2004) and Abdulwahed, Nagy and Blanchard

(2008) consider Kolb’s experiential learning theory (Kolb, 1984), in which he devised

the Learning Style Inventory, as one of the most influential models of learning styles

(Abdulwahed, Nagy, & Blanchard, 2008; Coffield, Moseley, Hall, & Ecclestone, 2004).

It should be noted that Kolb does not himself consider a student’s preference for a


particular learning style to be a fixed trait but a differential preference for learning,

which can vary slightly from situation to situation (Kolb, 1999). Kolb’s experiential

learning theory, as shown in Figure 3-1, is based on a four-stage learning cycle that must

be present for learning to occur and consists of:

• Concrete experience (feeling): Learning from specific experiences and relating

to people.

• Reflective observation (watching): Observing and viewing the environment

from different perspectives.

• Abstract conceptualisation (thinking): Logical analysis of ideas and an

intellectual understanding of a situation.

• Active experimentation (doing): Implementing events through action including

risk-taking.


Figure 3-1: Kolb's (1984) Learning Style Model.

Kolb then proposed that learning occurs across two intersecting continua of:

• Processing Continuum: Our approach to a task, such as preferring to learn by

doing or watching.

• Perception Continuum: Our emotional response, such as preferring to learn by

thinking or feeling.


Kolb (1984) identified students as having preferences towards pairs of the phases of

learning that lie at either end of the two continua, classifying learners as being either:

• Divergers: Who prefer the concrete experience and reflection on that experience;

• Assimilators: Who prefer reflection on and conceptualisation of the experience;

• Convergers: Who conceptualise the experience and then experiment actively with

the idea;

• Accommodators: Who prefer concrete experience and the opportunity to

experiment with ideas formed by the experience.
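
The classification above pairs phases of the learning cycle, one from each continuum. A minimal Python sketch of that pairing follows. The names Phase, LEARNER_TYPES and classify are hypothetical and are introduced only for illustration; the sketch makes no claim about how the Learning Style Inventory itself is scored.

```python
from enum import Enum


class Phase(Enum):
    """The four phases of the learning cycle, labelled as in the text."""
    CONCRETE_EXPERIENCE = "feeling"
    REFLECTIVE_OBSERVATION = "watching"
    ABSTRACT_CONCEPTUALISATION = "thinking"
    ACTIVE_EXPERIMENTATION = "doing"


# Each learner type is the pair of phases named in the list above.
LEARNER_TYPES = {
    frozenset({Phase.CONCRETE_EXPERIENCE, Phase.REFLECTIVE_OBSERVATION}): "Diverger",
    frozenset({Phase.REFLECTIVE_OBSERVATION, Phase.ABSTRACT_CONCEPTUALISATION}): "Assimilator",
    frozenset({Phase.ABSTRACT_CONCEPTUALISATION, Phase.ACTIVE_EXPERIMENTATION}): "Converger",
    frozenset({Phase.CONCRETE_EXPERIENCE, Phase.ACTIVE_EXPERIMENTATION}): "Accommodator",
}


def classify(perception: Phase, processing: Phase) -> str:
    """Return the learner type for one preferred phase on each continuum."""
    key = frozenset({perception, processing})
    if key not in LEARNER_TYPES:
        # Two phases from the same continuum do not form a learner type.
        raise ValueError("expected one phase from each continuum")
    return LEARNER_TYPES[key]
```

For example, a preference for thinking combined with a preference for doing would be classified by this sketch as a Converger.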

Nieweg (2000) and Kolb (1984) consider the opportunity to work through the four

phases to offer the greatest learning experience, where tasks designed to expose

students to all phases encourage stimulation to deeper learning (Kolb, 1984; Nieweg,

2000).

Assessment strategies are required to support the learning process, providing the

opportunity for a student to self-evaluate during the learning to confirm their level of

knowledge by their performance in the various activities, as demonstrated by Keller

(2008) in his fifth motivational strategy of volitional (self-regulatory) strategies to

protect their intentions (Keller, 2008). These experiences (identified in the learning

theories) are essential to facilitate learning, providing both extrinsic and intrinsic

motivation, thereby leading to deeper learning. Recognition of students’ propensity to

learning styles permits assessment strategy development to create assessment

experiences, or combinations of experiences, that support the four phases of learning,

with consideration given to Kolb’s (1984) learning style preferences as outlined

(Nieweg, 2000). In Section 3.11 we will revisit these learning styles and the importance

of feedback and their influence on learning through Hede’s (2002) Integrated Model

of Multimedia Effects on Learning (Hede, 2002).

3.2 The Value of Feedback in the Learning Process

Hattie and Timperley (2007) clearly identify feedback as the single most influential

contributor to the student’s progress (Hattie & Timperley, 2007). Their thorough

investigations into the influences on student improvement rely on the application of

meta-analysis to the data. It is through this research that they recognise the dominant

influence on a student’s improvement to be in the domain of the teacher. They then

progressively drill down to the services that are supplied by a teacher in performing

their professional duties, such as Instructional Quality, Direct Instruction and Peer

Tutoring, revealing that feedback is at the top of the list. Hounsell, McCune, Hounsell

and Litjens (2008) cite the findings of the 2007 Scottish National Student Survey that

indicates that there is less satisfaction with feedback than any other aspect of teaching

and that to sustain the quality of student learning there is a need to rethink how best to

provide feedback within the changing landscape of education (Hounsell, McCune,

Hounsell, & Litjens, 2008).

In Chapter 1, the need for feedback to be as timely as possible is highlighted, with the

ever-increasing demand on the instructor’s time. Unintentional delay in feedback is to

the disadvantage of both the student and the instructor. Hede (2002) includes feedback

as a significant contributor to the learning process in his model (Hede, 2002), to be

considered in detail in Section 3.11.

3.3 Formative and Summative Assessment as Part of the Learning Path

The position of assessment in education is well recognised, playing a critical role in the

educational process. The assessment options available to instructors are many and

varied and will be discussed in detail later in this chapter; however, the general need to

develop, test and implement assessment systems that can play an important role in the

learning process of the student must be initially recognised, as it plays a critical role in

the path that this research follows.

In Chapter 1, the criteria for effective formative and summative assessment were

discussed, indicating what we look for in assessment strategies and what is considered

to be good practice. The importance of timely, relevant feedback must be emphasised

for formative assessment tasks influencing the student in their learning direction and

assisting the instructor in delivery. Summative assessment is reliant on the same


formative assessment attributes, as it contains aspects of formative assessment while

being required to produce reliable, valid results reflecting the student’s achievement.

MCQs for many years have played a role in formative and summative assessment

strategies, offering both good and bad experiences, depending on their suitability and

construction. This can be attributed to their ease of use, effortless conversion to the new

encompassing technology and the perceived ability to assess large numbers of students

on broad areas of knowledge efficiently. Whether they comply with the criteria for

good assessment is dependent on the construction of the questions, the suitability to the

area and depth of knowledge being assessed and the purpose of the assessment task,

being either formative or summative.

Educators are aware of the value of self-esteem and the important role that it plays in

the success or failure of a student in the learning process; it should always be

considered when formulating assessment strategies (Torrance, 2007b).

Black and Wiliam (1998, 2006, 2009) use the term “Assessment” as referring to the

group of activities undertaken by both teachers and students in evaluation of knowledge

learnt by providing both grades and feedback for the purpose of modifying teaching and

learning paths (Black & Wiliam, 1998, 2006, 2009). Furthermore, they consider that

assessment becomes formative assessment when the evidence is used to adapt the

teaching in order to meet the needs of the students. On the other hand, it is generally

accepted that summative assessment is an assessment strategy that has the primary

objective of supplying a grade for the student.

Black and Wiliam (1998, 2006) refer to the learning environment as being a Black Box,

with input from students, teachers, parents and resources, with the consequential

outputs being students with advanced educational standing (Black & Wiliam, 1998,

2006). It is within this Black Box that they consider the role of Formative Assessment

critical in the transformation of the students’ educational standing, contributing to the

raising of the national standards. Black and Wiliam’s (2006) conceptualisation of

formative assessment (Black & Wiliam, 2006) has the following five strategies:

1. Engineering effective classroom discussion, questions, and learning

tasks that elicit evidence of learning;

2. Providing feedback that moves learners forward;


3. Clarifying and sharing learning intentions and criteria for success;

4. Activating students as owners of their own learning;

5. Activating students as instructional resources for one another.

Carless (2007) cites Black and Wiliam’s (1998) classification of poor formative

assessment where the feedback is misunderstood or not acted upon, claiming it to be

formative in purpose but not in function (Carless, 2007).

Black and Wiliam (1998) state that students too often are content to get by. Perrenoud

(1998) suggests that a solution would be for the instructors to revisit the teaching

contracts in order to counteract the student’s acquired habits with the inclusion of

formative assessment as a key component of the learning process (Ayala et al., 2008;

Perrenoud, 1998). Morris et al. (2004) state that the instructors continually ask how

they will know when the students understand the concepts (Morris et al., 2004).

Not all educators are supportive of the common assessment strategies extensively used.

Krumboltz and Christine (1999) and Torrance and Coultas (2009) voice concerns about

the emphasis placed on summative assessment and the consequential negative

influence that it has on the learning path of a student (Krumboltz & Christine, 1999;

Torrance & Coultas, 2009). Krumboltz and Christine (1999) feel that assessment with

the primary objective of grading can actually misdirect and inhibit student learning and

that the grading process encourages the teacher to focus on the negative, laying any

fault and failure at the feet of the student. They further consider competitive grading to

de-emphasise learning in favour of judging, displacing learning to a secondary goal of

education. Taras (2009) suggests the solution requires a shift in the paradigm, basing

the definitions of formative and summative assessment on processes of assessment

and not on the functions of assessment. Taras justifies this by stating that the functions

remain as a basic epistemological premise of assessment (Taras, 2009).


3.4 Assessment as a Means of Shifting the Responsibility of Learning to the Student

Many instructors are becoming increasingly aware of the benefits of shifting the

responsibilities of learning into the student’s hands (Chatti, Jarke, & Frosch-Wilke,

2007; Krätzig & Arbuthnott, 2009). As the Guidelines for the Teaching of Educational

Psychology in Teacher Education Programs (American Psychological Association

Work Group of the Board of Educational Affairs, 1997) suggest, educators addressing

the issues of school dropouts, low-levels of academic achievement, and other indicators

of school failure are recommending more learner-centered models of schooling. They

also recognise the important concept of Meta-cognitive learning, which concentrates on

students thinking about their own thinking (Krätzig & Arbuthnott, 2009). This includes the

critical components of self-awareness, self-inquiry, self-monitoring and self-regulation,

promoting higher levels of commitment, persistence and involvement in the learning

process. The guidelines further recommend that the curriculum is designed to include

components that practice the meta-cognitive strategies of reflective self-awareness and

goal setting. In addition they also state that assessment tasks should foster self-

appraisal and self-regulated learning.

Amended instructional paths resulting from formative assessment are critical

for the students’ success in navigating the learning path. More and more instructors

consider instruction and formative assessment not to be just strongly linked but

inseparable components in the learning experience (Farrell & Leung, 2002a; Shavelson

et al., 2008).

Doebbert (1999) emphasises the need for the student to develop skills in managing and

controlling his/her learning with the utilisation of technology assisting in the process as

they negotiate their educational path. It is important that systems that appear to provide

a multitude of benefits to the students and instructors should be pursued with vigor

(Farrell & Leung, 2004b), giving the opportunity to place the control of learning in the

hands of the student (Karpicke, Butler, & Roediger III, 2009). In particular this can be

supported and achieved by the use of the formative and summative methods of

assessment.


Davidoff (1995) identifies a concern with the need for assessment to be designed to

recognise incomplete or partial knowledge and also permit the student to hedge. This is

due to medical knowledge being incomplete, sometimes ambiguous and conflicting.

Components of e-learning are used by many educational institutions, contributing

significantly to the learning experience. The embracing of the Internet by the many in

all aspects of our educational, social, and working life has ensured that this new

paradigm will remain a component of our lives. Our general reliance on this technology

has resulted in the embedding of many services that it offers into our routine activities.

There are very few modern daily events that are not reliant in some way on the

technology that encompasses this phenomenon. Education is one such activity that has

greatly benefited from the Internet, offering diversity and flexibility. The educational

process is no longer confined to the classroom but is now available to the participant on

a global level at all times of the day and night. The extensive use of multimedia

applications has enhanced the learning experience by creating a myriad of flexible

learning material suitable for various preferred individual learning styles, in particular

increasing the accessibility of learning materials to those who would have limited

access if reliant on the traditional format.

3.5 Assessment Using New Technology

In view of the importance of the contribution of e-learning components to education

(Howlett et al., 2009) and the fundamental role played by assessment in the learning

process, it is essential that we explore technologies and approaches that will improve

assessment effectiveness and its impact on the learning process. The nature of e-

learning components often removes the personal interactivity held within the classroom

situation. The challenge for educators is to use e-learning components to enhance the

online learning experience with techniques that provide rich, personalised, timely

feedback to the individual student.

Marshall University (1999) provided a thorough comparative study of online course

delivery software, identifying the active components of each package and the depth of

their application (Marshall University, 1999). Not surprisingly, online assessment using


multiple-choice questions format appears in all of the recommended packages such as

Blackboard, WebCT, TopClass, Web Course in a Box, Toolbook. They further state

that only five of the identified ten packages that offer formative assessment also

provide an effective feedback mechanism that directs the student to tutorial paths as a

consequence of formative assessment.

3.6 Concerns with Computer Assisted Assessment

Some educators fear that the technology is artificially driving the usage of Computer

Assisted Assessment (CAA). It is considered that instructors are lured towards an

online assessment option, with the promises of faster turnaround, greater feedback and

automated student assessment recording. Hartley, Strudler and Schraw (2008)

expressed concerns around the integrity and quality of computer aided assessment tools

that require extra security restrictions for logins (Hartley, Strudler, & Schraw, 2008).

Popham (2008) warns about the commercial test developers who are prepared to attach

themselves to developers who have enthusiasm for formative assessment (Popham,

2008), when in fact their assessments often do not offer any financial benefit. There is

no doubt that the benefits from the use of a good CAA package are significant and

should be enthusiastically pursued. In some cases, the use of a CAA is not only valid,

but also preferred, especially when test material complies with the requirements of the

CAA system of choice. The concern that should be raised is when the content does not

comply with the requirements of the CAA. In many cases CAA packages are being

used as a summative assessment test where the material to be assessed does not fit into

the functionality of the package, as Farrell and Leung (2004b) demonstrated when

investigating the use of the Blackboard MCQ testing facility for questions comparing

lengthy SQL scripts (Farrell & Leung, 2004b), resulting in the students voicing

dissatisfaction and complaining about the testing procedure. Greenfield (2009)

recommends that education adopt a balanced media diet, using each technology's specific

strengths in order to develop a complete profile of the student’s cognitive skills

(Greenfield, 2009).


3.7 Assessment Options Available

Educational institutions use a variety of assessment options to grade their students and

assess the effectiveness and validity of subject content. A critical component of a sound

educational program is to assess the learning outcomes throughout the duration of the

course, as both a means of giving timely feedback and as a mechanism to grade the

students, given that each kind of assessment has its purpose (Kennedy, Chan, Fok, &

Yu, 2008).

An issue faced by educators is: “What methods of assessment should they be using and

what would be the appropriate mix to maximise the feedback and evaluation process?”

Schuwirth and Van Der Vleuten (2006) consider a well-designed assessment program

to utilise different types of questions that are appropriate for the content being assessed

(Schuwirth & Van Der Vleuten, 2006). Torrance (2008) argues assessment should

move from assessment of learning to assessment for learning. Tomanek, Talanquer and

Novodvorsky (2008) identify two determining factors when choosing assessment

strategies, the first being the characteristics of the task, in that the testing is aligned to

relate to the qualities of the task regardless of the learning environment, and the second

the characteristics of students or the curriculum, relating to the learning environment in

which an assessment task would be implemented, such as students' abilities to complete

the task (Tomanek, Talanquer, & Novodvorsky, 2008).

The options presently available to the instructors include Multiple-choice Questions

(MCQ) and the suite of Constructed Responses (CR), usually comprising short

answer questions, longer problem solving questions, case study reports, presentations

and other equally effective and proven methods. In the majority of cases the final grade

is calculated by combining each separate grade from assessment tasks completed during

the subject. The utilisation of multiple assessment methods recognises the need to

permit students to demonstrate their knowledge in various methods throughout their

learning experience.


3.8 Multiple-choice Questions

Multiple-choice questions (MCQs) are frequently used in traditional education forums

for both formative and summative assessment (Tarrant et al., 2009). Swartz (2006)

considers the extensive acceptance and use of MCQs as an assessment tool can be

attributed to their ability to assess broad fields of learning in a compact system while

being quick to assess with inherent objectivity and provide good feedback to the

students at minimal cost (Swartz, 2006). Additionally, their popularity is increased by

the ability to reuse them over periods of time, as recently constructed questions often

contribute to question banks available to the instructors.

MCQs are highly regarded by instructors (Bacon, 2003) and consequently used

extensively, with global experience in their construction (Libarkin, 2008; Schuwirth &

Van Der Vleuten, 2006), and easy adaptation to the computer assessment environment

and increased ease of application. There are two roles that MCQs play in the balanced

educational program. Firstly, MCQs are used extensively as a means of formative

assessment (self-assessment), where the feedback influences the direction of the

students as they journey along their learning path. MCQs are a popular self-assessment

option being readily available to the students due to the advancement of technology that

now supports its functions. Web-based MCQ self-assessment packages permit the

student to self-assess their knowledge at any time convenient to them, provide instant

feedback and in many cases recommend change in directions to their learning path.

Secondly, MCQs are also extensively used for summative assessment for the grading of

students, being strategically placed in the exams with various mark allocations directly

contributing to the student’s final grade. Their popularity can be attributed to their

ability to offer equivalent reliability and validity in a shorter amount of time as they

have an economy of scale that does not exist in constructed-response assessment (Bacon, 2003). In

addition they are considered to have the ability to test many topic areas in a relatively

shorter time (Ventouras, Triantis, Tsiakas, & Stergiopoulos, 2010; Wilson & Case,

1993).

Bacon (2003) also identifies one advantage of using MCQs, the “Objective” marking

(Swartz, 2006), as a method of avoiding the deficiency of reliability of essay tests, as he


cites previous work of Ashburn (1938), where subjective marking of short essay

answers yielded significant difference in grades when remarked (Ashburn, 1938).

Schuwirth and Van Der Vleuten (2006) and later Govaerts, Schuwirth and Muijtjens

(2007) voice growing dissatisfaction with the MCQ format as it relies on

recognition of the correct answers (Govaerts, C., Schuwirth, & Muijtjens, 2007;

Schuwirth & Van Der Vleuten, 2006), while some see MCQs as only demonstrating

knowledge of isolated facts (Wilson & Case, 1993). Wilson and Case (1993) also state

that they fear this forces undue emphasis on recall and will stimulate students to learn

and rehearse in a like mode (Swartz, 2006). Schuwirth and Van Der Vleuten (2003,

2006) recommend variation in the question format due to the likelihood that students

will prepare depending on the types of questions used, as their medical students often

try to identify what the assessment is so they can prepare strategically, instead of

studying to become better doctors. Bacon (2003) discusses at length the concerns that

the MCQ format is too simple and does not assess the complex levels of knowledge, in

particular the higher levels of Bloom’s (1956) taxonomy of educational objectives

(Knowledge, Comprehension, Application, Analysis, Synthesis, Evaluation) (Starr,

Manaris, & Stalvey, 2008). Bacon (2003) does recognise the examples of MCQs in

Bloom’s (1956) work that demonstrate the application of MCQ testing designed to

assess outcomes at every level, supported by the work of Palmer and Devitt (2007)

where they acknowledge that in the majority of cases MCQs only assess recall;

however, they purport that well-constructed MCQ questions can assess the higher order

cognitive skills (Palmer & Devitt, 2007). They stipulate that higher order MCQ

questions can only be successfully implemented if the questions are peer reviewed,

encouraging the critique of others to contribute to their construction. Bacon (2003) and

Palmer and Devitt (2007) also recognised that this level of MCQ is difficult to

construct. Schuwirth and Van Der Vleuten (2003) in their research argue that the

question format is of limited importance compared to the construction of the question,

as the success of the assessment strategy is primarily reliant on the question being

constructed correctly.

Ng and Chan (2009) describe the shortcomings of conventional MCQ tests, where the

conventional multiple-choice test method does not capture or consider partial


knowledge, supported by Swartz (2006), who considers conventional MCQ testing to

offer inferior discrimination in the levels of knowledge.

3.9 The Suitability of MCQ Tests to the New Technology

The adoption of the MCQ format in the hypermedia environment was swift

due to the appeal of the ability to produce fully integrated, automated tests that instantly

supply feedback to the participant, together with possible suggested directions for

further study. Furthermore, the increased ability to monitor the student’s progress

through this technology contributes to the student management structure. The result is

that the use of MCQs in general has grown in the e-learning domain, particularly as a

formative assessment tool.

There are many e-learning add-on packages being used by educators today permitting

the construction of MCQs with various formats. On completion of a typical MCQ test

the resulting score with the correct answer identified is usually given as the feedback.

In some cases the incorrect answers are also identified with brief explanations, thus

enabling direction for the student to the appropriate subject for further study. This

method is effective but is dependent on the student answering the questions honestly,

without guessing.

3.10 Previous Work on Innovative Approaches to MCQ Assessment

The previous discussion has identified the value of assessment and the role that it has to

play in education. It also discussed the attributes of good assessment practice,

acknowledging the criteria required to be met for the creation of a good assessment

item. There is a need for assessment to be developed and refined to fill the gap as

technology drives conventional assessment strategies forward. In some cases, MCQs

have shifted away from their original purpose of broad assessment for large numbers of

students to the more complex assessment of higher levels of knowledge. This increased

level of responsibility can often show the flaws of MCQ testing design, especially in

the construction of the test questions. There is anecdotal evidence that the better


students tend to resent MCQ testing as they consider the process fails to distinguish

between the higher achieving students and the less competent.

This section of the chapter reports on the previous work of others in the field that had a

pivotal influence on this research, its direction and the functionality of

the developed facilitating tool. The fundamental concepts underpinning the foundations

of the assessment with confidence measurement tool are presented, for instance, the

concept of negative penalty scores for incorrect answers and scoring techniques that

recognise partial knowledge.

3.10.1 The Need for Innovative Scoring for Assessment

The previous discussion has identified the shortcomings of traditional MCQ assessment,

which encourages guessing, fails to recognise partial knowledge and often miscalibrates

confidence. Chapter 1 identified the criteria that constitute good assessment practice; a

contributing attribute of good assessment is the adoption of an appropriate scoring

technique. The scoring of MCQs has long been a point of discussion with serious

consideration given to various scoring models. Consequently, scoring models have been

developed and investigated that are designed to eliminate guessing, using complex

scoring algorithms that aim to produce a more precise reflection of the student’s

understanding of the underlying concepts.

Here we will discuss in general some of the influential previous work. It should be

noted that the following examples vary in structure, as some of them are based on

MCQs with four options and others are designed for MCQs with three options. In one

case the assessment is based on singular true or false questions. Each scoring technique

is discussed in the context of the design of its application and should be considered as

such. Educators often debate the optimal number of answers, mainly divided between

four and three (Ng & Chan, 2009). In many cases the supporting mathematics can

easily be extended to cater for any variation, depending on the instructor’s preference

for their cohort of students. There will be a need to revisit some of them in depth in

Chapter 4 when considering the scoring mechanism adopted for this research.


3.10.2 MCQs Designed to Eliminate Guessing

MCQs have traditionally required the student to identify the correct response from a list

of possible answers with the resulting score based on the criteria of being correct or

incorrect. Ng and Chan (2009) identify two categories of correctly answered questions,

the first being the number of questions where the student actually knows the answer

and the second being the number of questions where they have correctly guessed. Ng

and Chan (2009) in their work comparing different MCQ scoring techniques using

signal detection theory (De Carlo, 2005) identified three classifications of scoring

variations relevant to this study: Liberal Multiple-choice (Bradbard, Parker, & Stone,

2004; Jennings & Bush, 2006), permitting the student to choose more than one correct

answer; Elimination Testing, permitting the student to select answers which they

consider wrong; and Confidence Marking, permitting students to assign a level of

confidence or allocate an order of preference (Alnabhan, 2002; Swartz, 2006) to their

choice.

Pollard (1985, 1986, 1993), Hobson and Ghoshal (1996), Bush (2001), Jennings and

Bush (2006) and Frandsen and Schwartzbach (2006) all produced alternative MCQ

scoring techniques designed to minimise random guessing based on a reward and

penalty structured scoring system (Frandsen & Schwartzbach, 2006; Hobson &

Ghoshal, 1996; Pollard, 1985, 1986, 1993; Pollard & Clark, 1989).

In particular Pollard (1985, 1986, 1993) designed and implemented a number of scoring

mechanisms to address guessing. He achieved this by allocating a positive score for

each correct answer and a negative score for each incorrect answer. Pollard and Clark

(1989) provided various options for the penalising of students who incorrectly

identified an incorrect option as correct and a correct option as incorrect. His approach

to assessment produced a series of penalties that were proportionally less than the

rewards for correct responses. Consequently, it only depleted, not negated, the overall

score if a correct option was identified. Educational institutions, such as the Australian

Mathematical Association National Mathematics Quiz, have used Pollard’s (1985, 1986,

1993) scoring to eliminate guessing; however, to be effective, a full understanding of

the calculations involved in the grading schedule is required, which is often beyond the

comprehension of the average student. Pollard’s (1985, 1986, 1993) scoring mechanism

does not use confidence as a knowledge indicator; however, it does address the problem of

guessing. While this research acknowledges the merit in this system, it

also has concerns with the level of complexity in the scoring calculations. The student

must fully understand the consequences of his/her action when doing a test, and

therefore there is a need for them to comprehend the method of scoring so they realize

the consequential outcomes. Pollard’s system relies on the combination of boxes

ticked, with many variations that have to be considered, resulting in a grade calculated

by a complex algorithm. A more rigorous explanation of Pollard’s (1985, 1986, 1993)

scoring system is provided in Chapter 4.

3.10.3 Innovative MCQ Assessment with Confidence Measurement

As previously discussed, there is a need to develop assessment strategies to address

both the interference from guessing that Ng and Chan (2009) and De Carlo (2005)

referred to as “Noise” and the inability of the conventional MCQ format to recognise

partial knowledge. Swartz (2006) states that the introduction of confidence

measurement reduces the effect from guessing and provides additional diagnostic

feedback beneficial to the learning process. Ng and Chan (2009) based their work on the

findings of Alnabhan (2002) and Swartz (2006) that confidence measurement (partial

ordering) produced the highest validity measurement and offered advantages in

measurement accuracy.

The following discussion introduces some of the activities of the pioneers of

assessment with confidence measurement who developed interactive systems designed

to eliminate the gains acquired from guessing, encourage more critical, honest self-

assessment and promote declaration of little or no knowledge. Further in-depth

discussion will occur in Chapter 4.

Brown and Shufford (1973) in the discipline of health education produced an MCQ

calibrated scoring system encouraging honesty, designed to permit the student to

register their level of confidence in choosing an answer (Brown & Shufford, 1973).

The scoring system severely penalised the participant if they registered high confidence

in an incorrect answer and equally rewarded high confidence in a correct choice. The

primary objective of this scoring system is to identify students who are either

over-confident or under-confident. In doing this, they consider two classes of student to

benefit: first, the student who gains a better understanding of their level of

knowledge and, second, the student who can develop an appreciation of numerical

probability and use it to express levels of uncertainty. Furthermore, Feltz (2007)

considers perception of one’s ability or self-confidence to be the central mediating

construct in striving for achievement (Feltz, 2007).

Paul (1994) and later Klinger (1997) used a computer based interactive system of

scoring for an MCQ format with three options and only a single correct answer, where

the student nominated a position on lines joining the apexes in a triangular shape

(Klinger, 1997; Paul, 1994). Each apex represented one of the three optional answers,

A, B or C. The lines joining the apexes were proportionally divided into

segments, encouraging the student to nominate their level of confidence. The grade was

then scored according to a logarithmic scale.

Of particular interest is Paul’s (1994) Web-based interactive response system called the

Computer Based Alternative Assessment (CBAA). The CBAA requires the student to

choose an option from three possible answers, A, B or C, registering a level of

confidence in their answer. With this system the student places the cursor within a grid

area aimed to reflect the confidence of their choice. The student must negotiate the area

with a mouse and click on the region that they feel portrays their confidence. The

three options, A, B and C, are located at the vertices of the triangle and the closer

the student positions the cursor to a vertex the more confident they are that it is the

correct answer. A corresponding score is then calculated by considering the position of

the registered level of confidence. The system is presented to the student in the format

shown in Figure 3-2. This research has some initial concerns about Paul’s (1994)

CBAA to be discussed in Section 4.2.1.

Diagram A: CBAA Triangle showing answer options at each apex.

Diagram B: CBAA Triangle showing strength of belief P(A) that A is correct associated with each region.

Figure 3-2 Confidence Measuring Template, Paul (1994).
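
The CBAA interaction described above can be illustrated with a short sketch. It assumes, purely for illustration, that the cursor position inside the triangle is converted into three belief values using barycentric coordinates, and that the logarithmic score takes the form 1 + log3(p), where p is the belief assigned to the correct option. Neither the mapping, the floor value nor the function names are taken from Paul (1994); they are hypothetical choices made to show how a position could become a mark.

```python
import math


def beliefs_from_position(point, a, b, c):
    """Barycentric weights of `point` relative to triangle vertices a, b, c.

    Each weight is read here as the belief that the option at that vertex is
    correct (an assumed mapping, not Paul's published one).
    """
    (x, y), (xa, ya), (xb, yb), (xc, yc) = point, a, b, c
    det = (yb - yc) * (xa - xc) + (xc - xb) * (ya - yc)
    w_a = ((yb - yc) * (x - xc) + (xc - xb) * (y - yc)) / det
    w_b = ((yc - ya) * (x - xc) + (xa - xc) * (y - yc)) / det
    return w_a, w_b, 1.0 - w_a - w_b


def log_score(p_correct, floor=0.01):
    """Assumed logarithmic score: 1 at certainty, 0 at an even three-way split.

    The floor (hypothetical) stops the penalty becoming unbounded when no
    belief at all is placed on the correct option.
    """
    p = max(p_correct, floor)
    return 1.0 + math.log(p) / math.log(3)


# Hypothetical screen coordinates for the vertices A, B and C.
A, B, C = (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)
p_a, p_b, p_c = beliefs_from_position((0.2, 0.2), A, B, C)
mark_if_a_is_correct = log_score(p_a)
```

Under these assumptions a click at the centre of the triangle scores close to zero whichever option is correct, a click on the correct vertex scores 1, and a click on a wrong vertex attracts a large negative mark.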

The more recent, extensive work of Gardner-Medwin and Gahan (2003) and

Gardner-Medwin (2006) has revealed interesting outcomes having significant influence on the

direction of this research. At this time a brief explanation of Gardner-Medwin and

Gahan’s (2003) approach and subsequent scoring method will be given, with a full

description supplied in Section 4.2.1 (Gardner-Medwin, 2006; Gardner-Medwin &

Gahan, 2003).

Gardner-Medwin and Gahan’s (2003) assessment strategy uses a scoring technique that

provides a series of grades that both reward the student for correct answers and

penalise them for incorrect answers. The scoring technique has three options, permitting

the participant to register 3 distinct levels of confidence: high, moderate or low. The

scoring is applied to true and false question formats but has relevance here given that

traditional MCQs have clusters of 3 or 4 answers, where each rating is true or false for

that stem. Gardner-Medwin and Gahan’s (2003) scoring system has some

distinguishing features, in that it rewards the student who selects the correct answer

proportionally to the confidence registered, that is, a grade of 3 for high confidence

registered as C=3, a grade of 2 for moderate confidence registered as C=2 and a grade

of 1 for low confidence registered as C=1. Importantly in contrast, it severely penalises

the student who registers a high level of confidence (C=3) for an incorrect answer with

a score of -6, moderately penalises the student who registers moderate confidence

(C=2) for an incorrect answer with a score of -2 and does not penalise the student who

registers low confidence (C=1) for an incorrect answer by giving them 0. In summary

Gardner-Medwin’s (2003) scoring reward for a correct choice stays proportional for all

of the options (3, 2, 1) while the penalty score does not (0,-2,-6). Gardner-Medwin and

Gahan’s (2003) system is forgiving to a student who admits that they have very little

confidence in their answer (C=1) by not penalising them at all (0 score). His arguments

for this are complex and lengthy but can be summarised by his own words, being that

the scoring is properly motivating and that lucky guesses are not the same as

knowledge.
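
The marks quoted above can be written out as a small lookup table, which also makes the motivating property easy to check. The sketch below uses only the values given in the text (3, 2, 1 for correct answers and -6, -2, 0 for incorrect answers); the function names and the idea of passing in the student's own estimate of being correct are illustrative assumptions. With these marks the expected score is p for C=1, 4p - 2 for C=2 and 9p - 6 for C=3, so C=2 only pays when p exceeds 2/3 and C=3 only when p exceeds 4/5.

```python
# Confidence level C -> (mark if correct, mark if incorrect), as quoted in the text.
CBM_MARKS = {1: (1, 0), 2: (2, -2), 3: (3, -6)}


def cbm_score(confidence: int, correct: bool) -> int:
    """Mark awarded for one answer at the registered confidence level."""
    reward, penalty = CBM_MARKS[confidence]
    return reward if correct else penalty


def expected_score(confidence: int, p_correct: float) -> float:
    """Expected mark when the student is right with probability p_correct."""
    reward, penalty = CBM_MARKS[confidence]
    return p_correct * reward + (1.0 - p_correct) * penalty


def best_confidence(p_correct: float) -> int:
    """Confidence level with the highest expected mark (hypothetical helper)."""
    return max(CBM_MARKS, key=lambda c: expected_score(c, p_correct))


# A student who is only 50% sure does best by admitting low confidence.
choice_when_unsure = best_confidence(0.5)
```

A student who overstates confidence therefore lowers their own expected mark, which is one way of reading the claim that the scoring is properly motivating.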

Davies’ (2005) MCQ scoring for 4 optional answers is in direct contrast to the scoring

regime of Gardner-Medwin (2006). He states that students who demonstrate a

high level of confidence in a correct choice should receive a greater reward, and

recommends that the penalty for high confidence in an incorrect answer be

disproportionally smaller than the reward for high confidence in a correct

answer (Davies, 2005). Davies also forgives the individual

who declares low confidence in an incorrect answer by not penalising them.

3.11 Interactivity in Learning

The previous discussion identified the learning theories and the need for reflection and

self-assessment to support the learning process. It also acknowledges the extrinsic and

intrinsic motivational factors that drive the learner through the learning experience. In

particular these extrinsic and intrinsic motivational aspects contribute to the learner’s

propensity towards a specific learning style to assimilate knowledge. There is a need

for this research to formalize this relationship into a learning model where the

components of e-learning activities are integrated into the taxonomy of learning with

the primary objective of acquiring lifelong knowledge. For this reason this research

will couple together the Learning Theories and Kolb’s (1984) Learning Styles with the

contributions of multimedia to learning, using Hede’s (2002) Integrated Model of

Multimedia Effects on Learning.

Online assessment tools with confidence measurement are in general not fully

functional multimedia systems, incorporating such facilities as sound and animation,

but do rely on a level of interactivity, which associates them with the multimedia

educational platform. A major role of online assessment with confidence measurement

is to promote and support self-directed learning (Keller, 2008). Consideration should

therefore be given here to the role of interactive systems in learning. Multimedia

components are commonly used in education, greatly supported by the ease by which

they are accepted in today’s society. They can play a significant role in the learning

environment, contributing to numerous presentation and support material well suited to

the technological arena. Clark and Feldon (2005) identify five common principles in

support of interactive instructional material, two of which are of significance to this

research: the first is the ability of multimedia components to accommodate various

learning style preferences and the second is the encouragement of student-managed

constructivist and discovery approaches to learning (R. Clark & Feldon, 2005).

Choi, Lee and Jung (2008) in considering multimedia and

learning styles claim that sensing, sequential, and reflective learners tended to have a

more meaningful learning experience with the multimedia learning component

compared with intuitive, global, and active learners (Choi, Lee, & Jung, 2008). With

this far-reaching impact Hede (2002) asks the question of multimedia’s effect on

learning, which has often been a point of contention. Hede (2002) discusses previous

disputes regarding the claim that multimedia elements in education have a significant

contribution, postulating that although the cost of delivery can be reduced and the speed

and availability increased, inclusion of multimedia in the educational process does not

necessarily improve the experience.

In further discussion addressing the concerns of the effects that components of

multimedia have on learning, Hede (2002) attempts to address these inconsistent

findings, through the formulation of his Integrated Model of Multimedia Effects on

Learning. This model is of particular interest to this research as any proposed

educational interactive solution by its nature is dependent on the e-learning paradigm

for delivery.

For the purpose of clarification at this time, Hede’s (2002) model will be introduced,

briefly articulating its relevance to this research. Hede’s (2002) model consists of 12

elements and the relationships that bind them, which together demonstrate multimedia’s

contribution to the learning process. In particular, online assessment with confidence

measurement aligns itself strongly with 3 of these elements: Working Memory,

Reflection and Long Term Memory.

Hede’s (2002) Integrated Model of Multimedia Effects on Learning, shown in

Figure 3-3, demonstrates the 12 elements and the relationships that they have with each other as

the student travels the learning path.

Figure 3-3 Hede’s (2002) Integrated Model of Multimedia Effects on Learning

As demonstrated in Figure 3-3, Hede’s (2002) Integrated Model of Multimedia Effects

on Learning groups the 12 contributing elements into four distinct categories:

• Input: (three elements: visual input, auditory input, learner control)

• Cognitive Processing: (two elements: attention, working memory)

• Learner Dynamics: (three elements: motivation, cognitive engagement, learner style)

• Knowledge and Learning: (four elements: intelligence, reflection, long term storage, learning)

The arrows in the model indicate either a causal or an associative relationship between

the conceptual elements.

The Integrated Model of Multimedia Effects on Learning (Hede, 2002) holds the

Learning element as the only fully dependent variable and the Learning Style as the

only fully independent variable with intelligence also a possible candidate, depending

on the construction. Learner control is considered to be an intervening variable, which

is determined by learning style and by the moderating variable cognitive engagement,

which is itself moderated by motivation that is in turn influenced by learner control.

3.12 Contribution of Assessment with Confidence Measurement to

Hede’s (2002) Model

Online assessment tools offer interactive components that are in the control of the

learner as part of the learning process. This research considers assessment with

confidence measurement to play a supportive role in the gaining of knowledge. The

following discussion addresses the components where assessment with confidence

would be an active contributor in the learning, as demonstrated within Hede’s (2002)

model.

Hede’s (2002) model identifies visual input as an element of the Input classification,

along with auditory input and learner control. This visual input is a significant

component in the design of an interactive assessment tool, as it determines the

presentation of the system and the participant’s interaction with it, as well as facilitating the

graphical displays. Many of the interactive tools developed for education offer learner

control over the input, permitting the user to navigate through the environment as they

deem necessary to achieve the best possible result. Some feel however that learner

control in multimedia applications is less efficient than program control (McNeil &

Nelson, 1990). It is generally accepted that the amount of learner control needs to be

designed in proportion to the capacities of the learners and the time

constraints of the learning experience (Gerjets, Scheiter, Opfermann, Hesse, &

Eysink, 2009).

An assessment with confidence measurement tool plays an important role in the

cognitive dynamics (attention and working memory) of the learning process, reflected

in Hede’s (2002) model, as it helps to concentrate the attention of the

learner by focusing on particular input and responses. In addition it supports the

working memory, increasing the retention of information by providing rehearsal

(Hede, 2002) whilst establishing referential connection from visual representation.

The role that assessment with confidence has in the learning dynamics of Hede’s (2002)

model pertains to motivation (extrinsic and intrinsic), cognitive engagement and

learner style as key variables in learning (Taylor, Sumner, & Law, 1997). The

design features of assessment with confidence tools often have intrinsic motivational

factors, such as visually pleasing graphics, responsive sliding bars and a clear results

section, that are considered to provide some initial incentive to engage with the system

(Hede, 2002). It is the intrinsic motivational factors from the challenging and

interesting content, such as the ability to show graphics and diagrams, that will produce

the sustained effort (Najjar, 1996). This invariably leads to deeper cognitive

engagement (Komarraju, Karau, & Schmeck, 2009), often resulting in the learners

taking full control of their learning. This intrinsic motivation underpins concepts of

game theory (Adams & Rollings, 2007) relevant to this study and will be discussed in

more detail in Chapter 6.

According to Hede’s (2002) model, and generally accepted by educators, successful

learning is dependent on converting Working Memory to Long Term Memory (Seufert,

Schütze, & Brünken, 2009), ultimately progressing towards the final goal of Learning.

Assessment with confidence measurement can play a significant role in this process by

facilitating rehearsal and revision of content, contributing to this essential conversion.

Taylor, Sumner and Law (1997) consider that the process of reflection often results in

self-directed learning where the learners think critically about their current knowledge and

their learning strategies. This also addresses Davidoff’s (1995) concerns of the

miscalibration of confidence, which can often occur with the use of traditional MCQ

formats that permit and encourage guessing. He considers miscalibrated confidence in

medical education equally as concerning as lack of knowledge. Further, miscalibration

of confidence can often lead to the transferring of incorrect facts from the participant’s

working memory to long-term memory. This situation is highly undesirable, having an

extremely negative effect on the student’s learning, where wrong knowledge is

reinforced as being correct.

The establishment of cognitive links, through which new content is connected to and

built on existing knowledge, is a fundamental requirement for the advancement of

learning. Kalyuga, Chandler and Sweller’s (1998) research has demonstrated that the

effectiveness of multimedia strategies varies depending on the learner’s knowledge

and experience (Kalyuga, Chandler, & Sweller, 1998).

Figure 3-4: Relation of Assessment with Confidence Measurement to Hede’s

(2002) Multimedia Model.

The objective of assessment with confidence is to support the student’s reflection,

assisting in the passing of knowledge from the working memory to the long-term

memory, whilst establishing a strong foundation for additional knowledge to be built

upon, facilitating the cognitive linking process.

The role of the assessment with confidence measurement (ACM) when incorporated in

the learning strategy is demonstrated in Figure 3-4, where the path of the individual’s

activities when using ACM is superimposed onto all of the relevant contributing

elements of Hede’s (2002) Integrated Model of Multimedia Effects on Learning. It is

noticeable that some of the elements from Hede’s (2002) original diagram have been

deliberately excluded to clarify the particular role of ACM. This is justified as some of

the elements, such as Audio input, are not part of the operation of the version of ACM

relevant to this research, while others, Intelligence and Learning Style, are of utmost

importance and can be assumed to be major contributors as part of the learning process

under consideration.

3.13 Assessment with Confidence Measurement as the Proposed

Solution

The discussion above has highlighted the concern that present traditional MCQ

assessment strategies are required to meet the criteria of good assessment practices, by

supplying valuable, timely and comprehensible feedback while offering grades with

validity and reliability, contributing to the learning process, building confidence and

encouraging deeper learning. In contrast, MCQ tests usually permit and

encourage guessing, fail to allow for demonstration of partial knowledge and

discourage the honest declaration of little or no knowledge. The evidence of previous

research has provided a good foundation for the development of assessment strategies

to the benefit of both instructors and students. The research of Pollard (1989),

further pursued by Hobson and Ghoshal (1996), Bush (2001), Jennings and Bush

(2006) and Frandsen and Schwartzbach (2006), demonstrates the advantages of

developing a scoring system designed to eliminate the gain from guessing and honestly

reflect the student’s understanding of the subject. Paul (1994), Brown and Shufford

(1973) and Klinger (1997) have all demonstrated variations of systems designed to

capture a numerical representation of the student’s level of knowledge by use of

confidence. Gardner-Medwin and Gahan (2003) and Gardner-Medwin (2006) have, in

the past and at present, implemented an online assessment strategy incorporating

confidence measurement designed to deter guessing by penalising severely any student

who demonstrates high confidence in an incorrect answer. Davies (2005) developed his

assessment strategy based on the assumption that high confidence in a correct answer

deserves greater reward and applies a completely opposite grading system to that of

Gardner-Medwin (2006).

This research, in addressing the earlier mentioned concerns, proposes the use of an

interactive confidence measurement assessment strategy based on the traditional MCQ

presentation format of a stem (question) followed by four optional answers, to be used

for formative and summative purposes. This version of the MCQ permits the instructor

to provide more than one correct answer in the four answers given for consideration by

the student. It further adopts a balanced reward and penalisation scoring system

for implementation, decided upon after extensive consideration of the scoring

options discussed in detail in Chapter 4.

3.14 Summary

The rapid development of the Internet as a means of educational delivery and

assessment encourages the designing, testing and evaluation of innovative assessment

tools. This chapter first considers the various learning theories and their reliance on

intrinsic and extrinsic motivational factors. It then presents Kolb’s (1984) four-stage

learning style model, demonstrating the importance of assessment strategies that work

through those four phases to encourage stimulation for deeper learning. This chapter

then identifies the importance of feedback in the learning process supporting the

shifting of the responsibility of learning to the student via self-assessment and self-

regulatory learning strategies. The advent of technology has significantly influenced the

way assessment tools are used, often pushing their application beyond their original intention,

such as the extended deployment of MCQs. This situation has created a new set of

problems as the reliance on them to return an indication of a student’s level of

knowledge has increased. The chapter then reflects on the previous work of others who

have devised scoring methods to address the shortcomings of traditional MCQ

assessment strategies in an attempt to improve the calibration of a student’s level of

knowledge and as a mechanism of reflecting their understanding of content as precisely

as possible. In particular, it considers the work of those researchers who have used confidence

measurement to reflect the true level of knowledge of the student, keeping them

honest with themselves. It postulates that assessment with confidence measurement has a

significant role to play in contributing to the galvanisation of the comprehension of the

content supporting the acquirement of knowledge. The chapter then closes with the

discussion on Hede’s (2002) model of multimedia effects on learning that gathers

together the contribution of assessment with confidence to the learning experience by

producing intrinsic motivational factors that provide rehearsal and self-reflection,

invariably leading to deeper cognitive engagement and supporting the passing of

knowledge from the working to the long term memory.

Chapter 4 will discuss the various scoring options used by other researchers and

developers highlighting their strengths and their weaknesses, demonstrating the

mathematics that support their implementation. It also contains a comparative

mathematical analysis in support of the scoring method adopted by this research.

CHAPTER 4 SCORING OPTIONS FOR ASSESSMENT WITH CONFIDENCE

Chapter 3 identified the concern that many assessment strategies do not meet the

criteria for good assessment practices. This can be partially attributed to the advent of

new technology pushing traditional testing practices beyond the purpose of their initial

design. This is the case with MCQs, which were designed to assess large groups of

students on broad areas of knowledge and are now frequently used to assess deeper

levels of knowledge, therefore requiring greater effort to construct and having increased

complexity, challenging the most experienced MCQ question writers. Additionally, the

conventional MCQ testing methods encourage guessing and fail to reward partial

knowledge, providing a linear solution to a multidimensional problem. Assessment with

confidence measurement is designed to increase the feedback to the participant in an

attempt to reflect as best as possible their knowledge on any tested area.

This chapter’s focus is solely on the selection of a scoring system for implementation

into MCQs that will optimise the student’s and instructor’s feedback, usage and

interest. As part of this discussion we consider in detail the work of others in their

attempts to address the aforementioned problems, discuss at length the positives and

negatives of each study and finally mathematically compare the scoring options

available. The contribution of any adopted scoring mechanism to the effectiveness of

the self-assessment exercise is of the utmost importance as Sim, Read and Holifield

(2008) have identified the student as having the most to lose. Hence, the responsibility

of using an appropriate method is critical. In particular this chapter revisits and

thoroughly discusses the extensive work of Pollard (1985; 1986; 1993) and Pollard and

Clark (1989), Gardner-Medwin (2006), Gardner-Medwin and Gahan (2003), Paul

(1994), Klinger (1997) and Davies (2005), as their contribution to the field has been

both influential and of great value.

4.1 Taxonomy of Scoring

This research argues that there are attributes of a good assessment strategy that need to

be met in order to produce valuable feedback, to help the student learn by adjusting the

learning path, and ensure validity and reliability to enable grading of the students fairly

and consistently. Consequently there is a need to prescribe scoring methods that suit the

exercise and can vary depending on the use. The objective of any self-assessment

exercise is to place the student in the most optimal position to evaluate their

performance and consequently modify their direction of study in order to address the

shortcomings of their knowledge of the topic being considered. To improve the

instructor’s evaluation of a student it is necessary to ensure that the scoring method

adopted is appropriate, requiring careful consideration, as an incorrect scoring method

choice could lead to unsatisfactory results.

At present there are quite a few MCQ assessment-scoring techniques used. In order to

set the scene some will be briefly introduced and summarised here.

The Conventional MCQ scoring awards 1 mark for a correct answer with 0 marks for

an incorrect answer. This is the most common scoring technique used. There is a slight

variation of this where a negative mark can be assigned for an incorrect answer to

counteract guessing.

The Liberal scoring system permits the student to identify more than one correct

answer. The scoring method as devised by Hobson and Ghoshal (1996) is applied as

follows:

If one of the answers identified is correct, the student receives a score depending on the

number of answers they nominated. As an example, if the student chooses 2 out of 5

options that include the correct answer, they get a proportion of the full marks they

would get if they had chosen the single correct answer. Bush (2001) allocated 1 for a

correct answer and incorporated a penalty for choosing an incorrect option of –1/(n-1)

marks (n is the number of options). Frandsen and Schwartzbach (2006) used a

logarithmic function based on the number of options and registered guesses to award a

positive score for a correct choice and a negative proportion of this for an incorrect

answer.
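
The effect of a penalty on blind guessing can be checked with a few lines of arithmetic. The sketch below assumes a question with a single correct answer and one option chosen uniformly at random, and it reads the penalty attributed to Bush (2001) as -1/(n-1) marks per incorrect choice; the function names are hypothetical. Exact fractions are used so that the result is not blurred by rounding.

```python
from fractions import Fraction


def expected_guess_score(n_options: int, penalty: Fraction) -> Fraction:
    """Expected mark for one blind guess: 1 if right, `penalty` if wrong."""
    p_correct = Fraction(1, n_options)
    return p_correct * 1 + (1 - p_correct) * penalty


def penalty_per_wrong_choice(n_options: int) -> Fraction:
    """Penalty of -1/(n-1) marks, as read from the description of Bush (2001)."""
    return Fraction(-1, n_options - 1)


# Conventional scoring (no penalty) rewards guessing on a 4-option question.
conventional = expected_guess_score(4, Fraction(0))
# With the -1/(n-1) penalty the expected gain from a blind guess is zero.
penalised = expected_guess_score(4, penalty_per_wrong_choice(4))
```

Under these assumptions the conventional scheme gives a guesser an expected 1/4 of a mark per four-option question, while the penalised scheme gives an expected gain of exactly zero for any number of options.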

Elimination testing has the candidate marking as many incorrect answers as possible,

receiving 1 mark for each incorrect option identified and marks deducted for including

the correct answer in the choice of incorrect answers (Bradbard et al., 2004). Pollard

(1986) has a variation on this approach in which he also introduces positive marks for

correctly identifying incorrect answers and negative marks for identifying a correct

answer as incorrect.
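
Elimination testing as described above can be sketched as follows. The text gives 1 mark for each incorrect option identified but does not state the size of the deduction for eliminating the correct answer, so the deduction is left as a parameter. Its default of n-1 marks is an assumption made here so that eliminating every option scores zero; it is not a value taken from Bradbard et al. (2004) or Pollard (1986).

```python
from typing import Optional, Set


def elimination_score(eliminated: Set[str], correct: str, n_options: int,
                      penalty: Optional[int] = None) -> int:
    """Mark for one question under a simple elimination scheme.

    eliminated: options the candidate marked as wrong.
    correct:    the single correct option.
    penalty:    marks deducted if the correct option was eliminated
                (assumed default: n_options - 1).
    """
    if penalty is None:
        penalty = n_options - 1  # assumption, see the lead-in above
    marks = sum(1 for option in eliminated if option != correct)
    if correct in eliminated:
        marks -= penalty
    return marks


# Full knowledge: all three distractors eliminated on a 4-option question.
full_knowledge = elimination_score({"B", "C", "D"}, correct="A", n_options=4)
# Partial knowledge: only one distractor can be ruled out.
partial_knowledge = elimination_score({"D"}, correct="A", n_options=4)
```

The partial-knowledge case shows what the conventional format cannot capture: a candidate who can rule out only one distractor still earns a mark for it.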

Finally, Confidence marking is defined to be where the candidate declares their level of

confidence in their answer. The calculation of the mark varies and will be discussed at

length in this chapter, but all variations include the registered level of confidence in the

calculation.

4.2 Previous Scoring Methods to Address the Issue of Guessing

As previously presented, the adoption of methods of scoring that use penalties is not

confined to recent times. There have been many documented examples in the past

where educators have introduced and evaluated various techniques designed to produce

more discerning results in an attempt to address the issue of guessing or hedging when

sitting a test. None is more cited than the extensive work of Pollard (1985, 1986, 1993)

and Pollard and Clark (1989), who developed, trialled and implemented a

mathematically sound marking system for traditional multiple-choice question (MCQ)

tests that was specifically designed to minimise the effect of guessing on any grade.

Pollard (1986) argued that guessing was an accepted practice, either in a sensible

manner or totally randomly. He further considered that the evidence obtained by his

previous work (Pollard, 1985) shows that guessing is not a minor component in

examinations, but plays a major role. Others, such as Paul (1994) and Klinger (1997),

developed their systems to address the same issue, as they consider guessing to produce

significant noise when grading students, and to mask the large variety of states of

knowledge. As Pollard (1986, p. 50) states: “As guessing has nothing to do with

knowledge, it makes sense to design a paper that will minimise the effects of guessing.”

Gardner-Medwin (2006) also considers lucky guesses not to be the same as

knowledge, and that confident, wrong answers require special attention, claiming

guessing can have extremely detrimental effects on the student’s learning.

The belief underpinning Gardner-Medwin and Gahan’s (2003) work is that to measure

knowledge one must measure a person’s degree of belief, which is simply demonstrated by

the words often used to represent different states. They maintain that educators

describe the degrees of belief that a student has about a true statement as one of

the following: Knowledge, Uncertainty, Ignorance, Misconception or Delusion.

Gardner-Medwin and Gahan (2003) assign probabilities for the truth to the above

student states, saying that they range from 1 to 0, where p=1 is Knowledge, p=.5 is

acknowledged Ignorance and p=0 is Delusion, with Uncertainty placed between .5 and 1

and Misconception between 0 and .5. Delusion is of extreme concern as it is a total

belief in something that is false. In light of this, ignorance is not the worst state to be in.

A Misconception (p=.33), that is, having a level of confidence in an incorrect answer, can be an

obstacle to learning, especially when attempting to build high levels of learning.
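The mapping from probability to belief state described above can be sketched in a few lines. This is an illustrative reading only: the function name `belief_state` is hypothetical, and treating the three named points (0, .5 and 1) as exact values with open intervals between them is an assumption about how the boundaries are intended.

```python
def belief_state(p: float) -> str:
    """Name the state of belief for a probability p assigned to a TRUE statement.

    Assumed reading: 1 = Knowledge, .5 = Ignorance, 0 = Delusion,
    (.5, 1) = Uncertainty and (0, .5) = Misconception.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    if p == 1.0:
        return "Knowledge"
    if p == 0.0:
        return "Delusion"
    if p == 0.5:
        return "Ignorance"
    return "Uncertainty" if p > 0.5 else "Misconception"


for p in (1.0, 0.8, 0.5, 0.33, 0.0):
    print(p, belief_state(p))
```

Under this reading a probability of .33 falls in the Misconception band, which is the case singled out in the text as an obstacle to learning.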

Paul’s (1994) extensive work supports the notion that we need to implement grading

systems that proportionally reward the participant, especially when it comes to

acknowledging levels of belief in their answers. Paul’s (1994) Computer

Based Alternative Assessment (CBAA) tool was specifically designed to address the

concern of a student’s tendency to guess, while realising greater benefit from the use of

innovative assessment alternatives in the resource limited settings of typical educational

environments. Educational institutions’ enthusiastic adoption of the hyper-media

environment encourages the engagement of the students at visual, aural and kinesthetic

dimensions, promoting sensory activities employing both the left-brain and right-brain

faculties. Like others, Paul (1994) voices concern with the use of the traditional MCQ

scoring technique, questioning its appropriateness, asking whether a student’s knowledge

is black or white and how best a student can express belief in the likelihood of an

alternative correct answer. He formally classifies the contributions of self-assessment to

the student as establishing and revealing status (knowing what you know), diagnosis of

weakness (knowing what you do not know), comparative analysis to the larger

population (Where do I stand?), assimilation into internal cognitive framework (pulling

it together) and finally higher order cognitive abilities (synthesis). In addition, there is a

need to support meta-cognition, to empower the student in managing their learning and

demonstrate positive correlation between study and achievement. Paul (1994) continues

by identifying the desire to produce innovative assessment that drives the student, as

often the assessment tasks determine the instruction and depth of learning. If the blend

of assessment does not involve the higher order cognitive abilities, such as problem

solving, then the instructors too often do not address those areas in their instructional

material.

4.3 Scoring Using Penalties for Incorrect Answers to Reduce the

Impact of Guessing

In the past, educators have introduced various scoring techniques to penalise students

for incorrect answers. In this section some of the scoring methods that have influenced

this research will be presented and discussed, demonstrating the mathematical argument

in their support. Each of the presented scoring methods has varied degrees of penalty

applied for incorrect answers and in most cases substantial mathematical justification

supporting implementation. It should also be noted that the structures of the various

assessment strategies referred to here vary: some are based on single-answer style

questions, others on four- and three-option MCQ formats with one correct answer. The

assessment strategy proposed by this research permits the instructor to have one or

more correct answers to an individual question, which needs to be considered when

comparing the approaches to scoring. This is reflected in section 4.4 where the scoring

used for this research is discussed at length.

It is appropriate that the work of Pollard (1985,1986,1993) be considered first as it

offers a foundation on which others have been built. Pollard’s MCQ scoring technique

requires the participant to correctly identify the one correct answer by a tick. If this is

not initially apparent, that is, the correct answer not known on the first inspection, the

participant can place a cross for options that they consider to be wrong. The utilisation

of Elimination Scoring (Ng & Chan, 2009), as Pollard (1986) promotes, gives the

student the opportunity to score by eliminating the incorrect answers, which although

quite effective in encouraging the students to reveal their true state of knowledge, can

confuse them in the process. Consequently, a completed test would consist of a series

of questions with a tick, crosses or both next to the options.

To explain Pollard’s scoring approach we first must consider all the possible responses

to a question with 4 options. Table 4-1 demonstrates the possible ordered responses

(Resp 1 to 10) for a question from the candidate, albeit a simple example where the last

option D (in green) is correct and the other options A, B, C (in red) are incorrect.

Answer  Resp 1    Resp 2    Resp 3    Resp 4    Resp 5    Resp 6    Resp 7    Resp 8    Resp 9    Resp 10
A       X         X         X                             X         X         ✓         X         X
B                 X         X                                       X                   ✓         X
C                           X                                                                     ✓
D                                     ✓         X         X         X

Table 4-1: Possible Responses with Four Options, Pollard (1985).

KEY: ✓ = Student Selected As Correct; X = Student Selected As Incorrect; green = Answer is Correct; red = Answer is Incorrect.

The green crosses and ticks designate that the student has correctly identified an option

to be incorrect or correct respectively. In contrast the red crosses and ticks designate

where the student has incorrectly identified an option to be incorrect or correct

respectively.

In this case the student will place a tick for the option that they consider to be the

correct answer if they know, as demonstrated in Response 4 of the Table 4-1, receiving

maximum marks. If they are not too sure of the correct answer they can place a cross in

the options that they know are not correct, receiving some marks for identifying

incorrect options but not full marks as they have not yet identified the correct one, as

demonstrated in Responses 1 and 2 of Table 4-1. Pollard makes the assumption that by

identifying three answers as being incorrect then the remaining fourth answer must be

correct. Therefore, if a student places 3 crosses correctly identifying the incorrect

options it is assumed that they have correctly identified the correct option and get the

full marks as if they have placed a tick next to it, as can be seen by Response 3.

Educators often debate this point with some considering this to be an improper

deduction, as a student might not necessarily be sure that the fourth option is correct.

The remaining Responses 5-10 contain situations where the student has either

incorrectly identified an answer as being correct by placing a tick next to an incorrect

answer (Responses 8,9,10), or incorrectly identified an option as being incorrect by

placing a cross next to the correct answer D (Responses 5,6,7). With these possible

responses now identified the scoring can be considered.

Pollard relies on a complicated process of allocation of partial scores for a student’s

correct identification of both correct and incorrect answers, facilitated by assigning ki

values to the various responses, culminating in 9 k values, where k1 to k3 contribute to

the positive marks given and k4 to k9 contribute a negative effect to the score. The k

values combine together to give a score for the question depending on the combination

of correct and incorrect crosses and ticks. This approach produces a complex array of

scoring formulas. The calculated score based on the combination of tick and/or crosses

given for any question is displayed in Table 4-2.

Answer  Resp 1    Resp 2    Resp 3    Resp 4    Resp 5    Resp 6    Resp 7    Resp 8    Resp 9    Resp 10
A       X         X         X                             X         X         ✓         X         X
B                 X         X                                       X                   ✓         X
C                           X                                                                     ✓
D                                     ✓         X         X         X
Score   k1        k1+k2     k1+k2+k3  k1+k2+k3  -k4       k1-k5     k1+k2-k6  -k7       k1-k8     k1+k2-k9
Grade   1/6       1/2       1         1         -1/2      -1/2      0         -1/3      -1/4      0

Table 4-2: Scoring Formulas for Responses Pollard (1985).

A value is assigned to each constant k, which is used as the basis of the formula to

calculate the final grade. The k’s (all positive) displayed in the Score row of Table 4-2

indicate the formula for the score given for the response directly above.

You will notice that in the first response, Resp 1, the student does not know the correct

option but appears to know that the first option given is incorrect and confidently

identifies it as such with a cross (X1). In this case the score, k1, must reflect this

confident choice by rewarding them with a positive partial score. This is the same for

the responses for Resp 2 and Resp 3 where the candidate correctly identifies a further 1

or 2 more incorrect answers, being graded k1+k2 and k1+k2+k3 respectively. Placing a

tick in the correct option (Resp 4) is the same as identifying all of the incorrect options

and receives the same grade as Resp 3, k1+k2+k3. However, identifying the correct

option as being incorrect (Resp 5) invokes a negative score of –k4, and the combinations

of identifying some of the incorrect options correctly and the correct one incorrectly

(Resp 6, Resp 7) have scores incorporating penalties, k1-k5 and k1+k2-k6 respectively,

where k5 and k6 have been introduced as a means of subtracting an amount from the

score for falsely identifying the correct answer as being incorrect while correctly

identifying as incorrect one or two options respectively. Similarly, Resp 8, Resp 9 and

Resp 10, where a candidate correctly places 0, 1 or 2 crosses but incorrectly places a

tick, receive the marks of -k7, k1-k8 and k1+k2-k9 respectively. Pollard’s scoring criteria

are based on the expected outcome (or Gain) of such an activity, and on combinations of

possible further results too numerous to demonstrate here. He postulates that the values

of the k’s must be calculated such that no score can be increased by guessing. In

doing so he places restrictions on the k values, which are substituted into expected

value equations for optimising. Some of these equations will be given below for

discussion as samples with their restrictions but again there are too many to be fully

displayed here.

The following six equations give the expected score, E(S), for an individual who has no

knowledge and is guessing:

Randomly guessing one cross: E(S) = 3/4* k1+1/4*(- k4)

Randomly guessing two ordered crosses:

E(S) = 1/2*(k1+k2) + 1/4*(- k4) + 1/4*(k1-k5)

Randomly guessing three ordered crosses:

E(S) = 1/4*(k1+k2+k3) + 1/4*(- k4) + 1/4*(k1-k5) +1/4*(k1+k2-k6)

Randomly guessing a tick:

E(S) = 1/4*(k1+k2+k3) +3/4*(-k7)

Randomly guessing a cross and a tick:

E(S) = 1/4*(k1+k2+k3) + 1/4*(- k4) + 1/2*(k1-k8)

Randomly guessing two ordered crosses and a tick:

E(S) = 1/4*(k1+k2+k3) + 1/4*(- k4) + 1/4*(k1-k5) +1/4*(k1+k2-k9)

In order that any candidate has no expected gain from randomly guessing, these

expected scores must all be less than or equal to 0, which requires that the assigned values of k4 to

k9 (those that contribute a negative effect to the score) be selected to ensure that this

occurs. In addition, the equations that represent a candidate who correctly assigns one

cross and guesses others, as well as the candidate who correctly assigns two crosses and

guesses others (equations not included here) have to be considered to eliminate the gain

from guessing.

The resulting constraints apply:

3k1-k4 ≤ 0; 2k2-k5 ≤ 0;

k3-k6 ≤ 0; k1+k2+k3-3k7 ≤ 0;

k2+k3-2k8 ≤ 0; k3-k9 ≤ 0;
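As a sketch of how these constraints arise, each one can be read as requiring that the expected gain from one further random guess is not positive. The first two lines below follow directly from the expected-score equations for one cross and for one tick. The remaining four use the partial-knowledge cases whose equations are not reproduced in the text, so their conditional form is an assumed reconstruction rather than Pollard's own working. The non-strict inequality follows the requirement, stated above, that the expected scores be less than or equal to 0.

\[
\begin{aligned}
\tfrac{3}{4}k_1 - \tfrac{1}{4}k_4 \le 0 &\iff 3k_1 - k_4 \le 0 \\
\tfrac{1}{4}(k_1+k_2+k_3) - \tfrac{3}{4}k_7 \le 0 &\iff k_1+k_2+k_3 - 3k_7 \le 0 \\
\tfrac{2}{3}(k_1+k_2) + \tfrac{1}{3}(k_1-k_5) - k_1 \le 0 &\iff 2k_2 - k_5 \le 0 \\
\tfrac{1}{2}(k_1+k_2+k_3) + \tfrac{1}{2}(k_1+k_2-k_6) - (k_1+k_2) \le 0 &\iff k_3 - k_6 \le 0 \\
\tfrac{1}{3}(k_1+k_2+k_3) + \tfrac{2}{3}(k_1-k_8) - k_1 \le 0 &\iff k_2+k_3 - 2k_8 \le 0 \\
\tfrac{1}{2}(k_1+k_2+k_3) + \tfrac{1}{2}(k_1+k_2-k_9) - (k_1+k_2) \le 0 &\iff k_3 - k_9 \le 0
\end{aligned}
\]

In the third line, for example, a candidate who already holds k1 for one known cross guesses a second cross among the three remaining options, two of which are incorrect; the expected score after the guess must not exceed the k1 already held.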

Pollard (1986) produces a number of solution sets satisfying these constraints,

identifying two preferred sets that provide the most effective results (see Table 4-3).

His final decision was based on the need to recognise partial knowledge and to

maximise the minimum score for incorrect responses.

Consequently, he recommends choosing either of the following scoring methods, as

outlined in Table 4-3.

k’s   First set of possible k values    Second set of possible k values
      satisfying equations              satisfying equations
k1    1/6                               1/5
k2    1/3                               3/10
k3    1/2                               1/2
k4    1/2                               3/5
k5    2/3                               3/5
k6    1/2                               1/2
k7    1/3                               1/3
k8    5/12                              2/5
k9    1/2                               1/2

Table 4-3: Pollard’s Two Solutions for k Values.

Applying these values to the scoring formulas given in Table 4-2 the corresponding

scores are shown in Table 4-4 below.

Response  Scores          Scores using first set      Scores using second set
                          of possible k values        of possible k values
1         k1              1/6                         1/5
2         k1 + k2         1/2                         1/2
3         k1 + k2 + k3    1                           1
4         k1 + k2 + k3    1                           1
5         –k4             –1/2                        –3/5
6         k1 – k5         –1/2                        –2/5
7         k1 + k2 – k6    0                           0
8         –k7             –1/3                        –1/3
9         k1 – k8         –1/4                        –1/5
10        k1 + k2 – k9    0                           0

Table 4-4: Example of Pollard’s Scores for Both Sets of Values of k.

Although Pollard does not identify a preferred scoring mechanism from the two above,
in his later work, when he considers the more complex non ordered simulation, the first
column values (k1=1/6; k2=1/3; k3=1/2 etc) are again contained in the final table of
preferred options, strengthening the argument for it as the nominated final choice.

Applying these values of k to the Expected Score (E(S)) equations above, which are designed to

emulate the expected score when a candidate guesses without knowledge, we

obtain the results in Table 4-5.

Expected Score E(S) for a student with      Calculated E(S)
no knowledge randomly guessing…
one cross                                   E(S) = 3/4*(1/6) + 1/4*(-1/2) = 0
two ordered crosses                         E(S) = 1/2*(1/6+1/3) + 1/4*(-1/2) + 1/4*(-1/2) = 0
three ordered crosses                       E(S) = 1/4*(1) + 1/4*(-1/2) + 1/4*(-1/2) + 1/4*(0) = 0
one tick                                    E(S) = 1/4*(1) + 3/4*(-1/3) = 0
one cross and one tick                      E(S) = 1/4*(1) + 1/4*(-1/2) + 1/2*(-1/4) = 0
two ordered crosses and one tick            E(S) = 1/4*(1) + 1/4*(-1/2) + 1/4*(-1/2) + 1/4*(0) = 0

Table 4-5: Expected Scores for Random Guessing, Pollard (1985).
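The six expected scores can also be checked mechanically. The sketch below encodes the six E(S) equations exactly as given earlier and evaluates them for both sets of k values; the function name `expected_scores` is illustrative, and k8 = 5/12 is assumed for the first set, as implied by the –1/4 term used in the table.

```python
from fractions import Fraction as F


def expected_scores(k1, k2, k3, k4, k5, k6, k7, k8, k9):
    """Expected score E(S) for each of the six random-guessing strategies."""
    full = k1 + k2 + k3          # score for a fully correct response
    q, h = F(1, 4), F(1, 2)      # probabilities 1/4 and 1/2
    return {
        "one cross": 3 * q * k1 + q * (-k4),
        "two ordered crosses": h * (k1 + k2) + q * (-k4) + q * (k1 - k5),
        "three ordered crosses": q * full + q * (-k4) + q * (k1 - k5) + q * (k1 + k2 - k6),
        "one tick": q * full + 3 * q * (-k7),
        "one cross and one tick": q * full + q * (-k4) + h * (k1 - k8),
        "two ordered crosses and one tick": q * full + q * (-k4) + q * (k1 - k5) + q * (k1 + k2 - k9),
    }


first = expected_scores(F(1, 6), F(1, 3), F(1, 2), F(1, 2), F(2, 3),
                        F(1, 2), F(1, 3), F(5, 12), F(1, 2))
second = expected_scores(F(1, 5), F(3, 10), F(1, 2), F(3, 5), F(3, 5),
                         F(1, 2), F(1, 3), F(2, 5), F(1, 2))

for strategy in first:
    print(f"{strategy}: first set {first[strategy]}, second set {second[strategy]}")
```

Under these assumptions every strategy has an expected score of exactly zero for both sets, so a candidate with no knowledge neither gains nor loses on average by guessing.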

These results confirm that the scoring regime eliminates any positive gain from

guessing. Pollard’s work is of the utmost interest to this research as it has paved the

way for other scoring systems to be developed. In particular, it established a

mathematically valid technique of using penalties in an attempt to deter students from

guessing, as they know that the consequences of their actions will have a negative

effect on their grade. At the same time it also recognises partial knowledge, as the

correct identification of an incorrect option gives partial marks. While it offers an

effective alternative to scoring, it is very difficult to apply, confusing for the instructor

and, most importantly, even more so for the student, possibly distracting their attention

from the question at hand. Pollard’s scoring is used extensively for the Australian

Mathematics Competition where it has served the purpose well, contributing greatly to

that institution. However, extending its application to smaller, customised educational

bodies that lack the extensive infrastructure required to process the tests is difficult to

implement and support.

The work of Pollard (1985,1986,1993) and Pollard and Clark (1989), though

mathematically valid, relies on complex calculations to achieve the final result,

effectively removing control from the hands of the users. The primary objective of this

research is to provide a self-assessment tool to be used by the student, offering them

direct control over the consequences of their actions. The complexity of Pollard’s

(1985,1986,1993) marking system and its required understanding could disadvantage

many of the participants. The utilisation of “Elimination Scoring”, as Pollard

(1985,1986,1993) promotes, gives the student the opportunity to score by eliminating

the incorrect answers, which although quite effective in encouraging the students to

reveal their true state of knowledge, can confuse them in the process.

The design of the scoring system for this research was influenced by Pollard

(1985, 1986, 1993) and Pollard and Clark (1989), whose work underpins the arguments for deterring

guessing, incorporating probability in the decision-making and recognising

partial knowledge.

Paul’s (1994) Computer Based Alternative Assessment (CBAA) is designed to address

the issues previously discussed and his arguments and supportive reasoning behind the

instigation of the CBAA innovative approach to assessment significantly underpin this

research.

The major goals of the CBAA are to improve the value of assessment by providing more

useful experiences, achieve a more valid indication of the student’s knowledge, and

produce comparable measures in assessing students’ ability to apply their knowledge to

solving problems. Paul (1994) suggests that the CBAA package offers discrimination

between finer grained states of knowledge, greater disclosure of students’ ability to

apply their knowledge and increased awareness of the students’ own knowledge state.

Importantly, Paul (1994) believes that using the traditional 0-1 scoring system loses the

ability to discriminate between states of knowledge. Like Gardner-Medwin and Gahan

(2003), Paul (1994) uses common expressions to support the student through the

experience, such as, “I strongly believe B to be correct”, “I believe C to be correct but I

can’t distinguish between A and B” or “From what I know each alternative seems

equally likely to be correct.” These expressions are equated to equivalent probabilities.

In his diagnosis of previously adopted scoring systems, Paul (1994) justifiably identifies

concerns with the traditional “Number Right” grading system, where the same mark is

assigned to those who have knowledge and those who have guessed, and which

groups together those with complete misinformation, those with some

misinformation and those who guess incorrectly. The “Correction for Guessing”

formula adopted by some educators is an alternative, which can be applied at the final

stages of the calculation of the score. However, correction for guessing does not take

into consideration the effects of partial knowledge and does very little to encourage

students to report their true levels of knowledge.
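For reference, the correction usually meant by this term subtracts a fraction of the wrong answers from the number right, R - W/(m - 1), where m is the number of options per question. The text does not state the formula, so the sketch below assumes this standard form; the function name `corrected_score` is illustrative.

```python
def corrected_score(right: int, wrong: int, options: int) -> float:
    """Standard correction for guessing: R - W / (m - 1).

    Assumption: this is the conventional formula, not one quoted in the text.
    Omitted questions are counted as neither right nor wrong.
    """
    if options < 2:
        raise ValueError("a question needs at least two options")
    return right - wrong / (options - 1)


# A candidate guessing blindly on 40 four-option questions expects
# about 10 right and 30 wrong, which the correction reduces to zero.
print(corrected_score(right=10, wrong=30, options=4))
```

Because the adjustment depends only on the totals R and W, two candidates with the same totals receive the same corrected score whatever their partial knowledge, which is the limitation noted above.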

The value of assessment strategies that offer wider ranges of responses is that they

permit students to demonstrate their true level of knowledge. To be effective they

should also encourage students to participate in the activity by rewarding them

appropriately depending on the perception of the probability distribution. Brown and

Shufford (1973) demonstrate that people who are aware that they will be rewarded

according to admissible schemes will divulge probabilities they believe in and not

attempt to shade them one way or the other to exploit the scoring system, as to do so

would require the student to place bets that are considered unrewarding. It is the

admissible or proper scoring system as outlined above that truly encourages honesty.

After much consideration and deliberation Paul chose to develop the CBAA around a

“Confidence Reporting” framework, hence the relevance to this research.
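The property of a proper scoring system can be illustrated numerically. The sketch below assumes a logarithmic payoff of log(n·q) on the alternative that turns out to be correct, which is the rule discussed later in this chapter; the belief values and the function name `expected_log_score` are hypothetical. It compares a student's expected payoff when reporting their true beliefs with the payoff from exaggerating or from hiding them.

```python
import math


def expected_log_score(true_p, reported_q):
    """Expected payoff sum_i p_i * log(n * q_i) for beliefs true_p and report reported_q."""
    n = len(true_p)
    return sum(p * math.log(n * q) for p, q in zip(true_p, reported_q) if p > 0)


belief = [0.6, 0.3, 0.1]  # hypothetical true beliefs for options A, B and C

honest = expected_log_score(belief, belief)
overconfident = expected_log_score(belief, [0.9, 0.05, 0.05])
hedged = expected_log_score(belief, [1 / 3, 1 / 3, 1 / 3])

print(f"honest={honest:.3f} overconfident={overconfident:.3f} hedged={hedged:.3f}")
```

With these assumed numbers the honest report gives the highest expected payoff, the uniform report gives zero, and the exaggerated report is negative, which is the behaviour an admissible scheme is meant to produce.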

Paul’s (1994) CBAA, as briefly introduced in Chapter 3, is an interactive computer

based system, which presents the candidate with a triangle offering three alternative

answers positioned at the apexes, similar to Klinger’s (1997) later

triangular interactive system mentioned in Chapter 3. This is demonstrated in

Diagram A in Figure 3-2. The triangle has 16 zones over which the student can place the

cursor to indicate their confidence towards any particular preference, be it A, B, C

or any combination.

Field studies combined with the work of Brown and Shufford (1973) identified 16

zones as optimal when considering infinite precision probability space, as this provides

sufficient discrimination among the student’s knowledge states, avoids minutia

obsession behaviour, minimises reliance on excessive motor skill manipulation and,

most importantly, “exhibits intuitive correspondence between the visual regions and

their interpretations” (Paul, p 18, 1995). When moving the cursor, the proximity to the

apex represents the belief in the answer indicated at that vertex. The corresponding

probabilities for each zone when the correct answer is A are demonstrated in Figure 3-2

Diagram B. The top apex where the correct option is placed has the probability of 1

while the other apexes are assigned zero. There are various other probabilities assigned

depending on the proximity to the correct apex; notably, placing the cursor in the

middle, which declares that the student is not sure of the correct option, is assigned .33, as

expected. A student placing the cursor in any of the 16 zones creates the three-element

vector < PA, PB, PC >, where PA is the probability registered relative to A, PB is the

probability relative to B and PC is the probability relative to C. Consequently,

positioning the cursor in the middle of the triangle gives the vector <.33, .33, .33>. The

scores generated are based on a logarithmic function that calculates a set of non-linear

results. As an example, a student who positions the cursor close to the correct answer A

at the top vertex, say .8 in Figure 4-1, receives the score of 92/100, which reflects their

strong commitment to a correct answer. The same positioning towards an incorrect

answer, .2 or 0, gives a score of 46/100 and 0/100 respectively, depending on whether

the cursor is positioned favouring the correct answer or the other incorrect answer. The

student is required to complete the process for each question in the test generating

scores as they go. The final grade is presented to them at the end of the test with

feedback of which questions they got wrong and the correct answers for those

questions.

Paul (1994) discusses and justifies his choice of scoring at length. He relies heavily on

the notion that a scoring system should be “admissible” or “proper” as defined by

probability theory, stating that students who exhibit high belief in the likelihood of a

correct answer should be rewarded higher than those who “shade” their reporting with

lower levels of confidence. Paul’s (1994) adopted scoring system is based on the work

of Brown and Shufford (1973) in which they developed a scoring system to quantify

uncertainty into numerical probabilities for representation of intelligence.

As an example, a student (1), who indicates 40 per cent for an incorrect answer, is not

completely wrong but neither are they right. They are better than another student (2)

who indicated 80 per cent for the same incorrect answer, and should accordingly be

graded higher; however, they are not as good as the third student (3) who registered 10

per cent for the same incorrect answer, who should receive a higher grade than student

(1).

Paul’s (1994) scoring is based on assigning credit according to a scheme similar to

wagering, where there are various wagers with various odds. It assumes that the more

knowledgeable students will gain greater credit over the extended period of time than

the less knowledgeable. If there are f(u)du wagers available at the correct odds of

(1-u)/u, a student who believed the likelihood of an item being correct to be p would

accept all wagers at odds better than (1-p)/p that the item is correct and accept all

wagers on the item not being correct at odds better than those appropriate for

probability 1-p. This gives rise to the following equation for payoff for a student

choosing among n alternatives where pi is the probability of the ith alternative.

\[
\text{Payoff if the $i$th event occurs}
= \int_{0}^{p_i} f(u)\,\frac{1-u}{u}\,du \;-\; \sum_{j=1,\, j \neq i}^{n} \int_{0}^{p_j} f(u)\,du
= \int_{0}^{p_i} \frac{f(u)}{u}\,du \;-\; \sum_{j=1}^{n} \int_{0}^{p_j} f(u)\,du
\]

This formula needs to be adjusted to eliminate the possibility to “game” by assuming
equal probability for all by requiring the student to take odds on wagers placed at
probabilities greater than 1/n and to offer the odds on wagers placed at probabilities less
than 1/n, yielding:

\[
\text{Payoff if the $i$th event occurs}
= \int_{1/n}^{p_i} \frac{f(u)}{u}\,du \;-\; \sum_{j=1}^{n} \int_{1/n}^{p_j} f(u)\,du
\]

Paul (1994), like Gardner-Medwin and Gahan (2003), Gardner-Medwin (2006) and

Brown and Shufford (1973), leverages greatly off Shannon’s (1948) Information Theory,

preferring to assign the function f(u) = 1 based on logarithmic scoring, yielding

through integration the logarithmic scoring system:

\[
\text{Expected Profit} = \log n - \Bigl( -\sum_{i=1}^{n} p_i \log p_i \Bigr)
\]

(Payoff on the ith event, with n alternatives and pi is the probability of the ith

alternative)

This equation corresponds to the maximum likelihood method of statistical estimation.

The logarithmic scoring system has the student’s reward, on average, equaling the

amount of knowledge that they possess of the material in the question, as it depends

solely on the probability assigned to the alternative that is actually correct (Paul 1994).
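As a worked step, assuming f(u) = 1 and that the reported probabilities sum to one, the adjusted payoff integrates to a single logarithm, and averaging it over the student's own probabilities gives the expected profit quoted above:

\[
\begin{aligned}
\text{Payoff}_i &= \int_{1/n}^{p_i} \frac{du}{u} - \sum_{j=1}^{n} \int_{1/n}^{p_j} du
 = \log(n p_i) - \sum_{j=1}^{n} \Bigl( p_j - \tfrac{1}{n} \Bigr) = \log(n p_i), \\
\text{Expected Profit} &= \sum_{i=1}^{n} p_i \log(n p_i)
 = \log n - \Bigl( -\sum_{i=1}^{n} p_i \log p_i \Bigr).
\end{aligned}
\]

The sum in the first line vanishes because the probabilities add to 1 and the n terms of 1/n also add to 1. With n = 3 alternatives the payoff is log(3 p_i), which is the quantity inside the CBAA scoring equations that follow.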

This final interpretation of the formula for scoring is adapted by Paul (1994) to produce

the following three equations for scoring the CBAA, where px is the probability

ascribed to alternative x, n is a normalisation constant and k is a range constant.

Score if A is correct = n + k log2 (3 pA )

Score if B is correct = n + k log2 (3 pB )

Score if C is correct = n + k log2 (3 pC )

Diagram A: CBAA Triangle with the scoring for each select area where A is correct, n=62 and k=23.7

Diagram B: CBAA Triangle with the scoring for each select area where A is correct, n=0 and k=63

Figure 4-1: Paul’s (1994) CBAA Triangle with the Corresponding Score for Each Region

Diagram A in Figure 4-1 is Paul’s (1994) preferred scoring system, where n=62 and

k=23.7, so that a student who is fully confident in the correct answer receives 100,

while full confidence in an incorrect choice receives 0. Levels of confidence

between these two extremes are assigned rewards accordingly.

The second scoring option demonstrated in Figure 4-1 Diagram B, where n=0 and

k=63, is an alternative version based on the same formulas that produces a scoring

regime that imposes negative scores for registration of high confidence in an incorrect

answer; however, this is not used because the instructors fear the repercussions that

could follow from disgruntled students.
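The effect of the two parameter choices can be explored with a direct implementation of the scoring equation, score = n + k log2(3p), where p is the probability placed on the correct alternative. This is a sketch of the raw formula only: the function name `cbaa_score` is illustrative, and the zone scores printed in Figure 4-1 may be rounded or limited at the extremes, so they need not match these values exactly.

```python
import math


def cbaa_score(p_correct: float, n: float, k: float) -> float:
    """Raw CBAA score n + k * log2(3 * p) for probability p on the correct alternative."""
    if not 0 < p_correct <= 1:
        # The logarithm is undefined at p = 0, so a floor would be needed in practice.
        raise ValueError("p_correct must be greater than 0 and at most 1")
    return n + k * math.log2(3 * p_correct)


for label, n, k in (("Diagram A (n=62, k=23.7)", 62, 23.7),
                    ("Diagram B (n=0, k=63)", 0, 63)):
    scores = [round(cbaa_score(p, n, k), 1) for p in (1.0, 0.8, 1 / 3, 0.2)]
    print(label, scores)
```

Under the formula, a student who places equal probability on all three alternatives always scores exactly n, so the choice of n decides whether declared ignorance earns a substantial mark or zero.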

There are a number of concerns that this research has with Paul’s (1994) method of

scoring. The first is that a student who admits that they do not know the difference

between the given options, choosing to sit in the middle, is rewarded with a healthy 63

as demonstrated in Diagram A in Figure 4-1, meaning that a candidate can pass an

exam with no knowledge. The alternative and unused scoring suggested by Paul is

demonstrated in Diagram B in Figure 4-1, where n=0 and k=63. This scoring method

does address this concern to a certain degree as the scoring scale is shifted, where the

reward for being fully confident in a correct choice is 100 and being fully confident in

an incorrect choice is -150. This alternative scoring rewards a 0 for the student who

declares that they cannot choose between the answers, deemed by this research to be

fairer than awarding the 63 as described above. It also disproportionately penalises the

candidate for demonstrating a high confidence on the incorrect answer, a score of -150

for 100 per cent confidence for an incorrect answer, addressing the issue of

miscalibration of knowledge. Paul (1994) supports the use of the non-penalising option

(Diagram A in Figure 4-1) over the penalising option (Diagram B in Figure 4-1), by

arguing that the CBAA was designed to acknowledge partial knowledge as its primary

objective. This research prefers a scoring method that incorporates the negative

penalisation to combat the registration of high confidence in the wrong answer, which

Gardner-Medwin (2006) refers to as “delusionary”, while still rewarding for partial

knowledge.

The author of this research considers the regions of Paul’s (1994) CBAA to appear

cluttered at the vertices. The design of the user interface could be difficult to navigate,

requiring a minimal operational level of dexterity and possibly increasing the cognitive load

on the participant.

Another concern for the CBAA scoring mechanism is the level of complexity, as the

method of score calculation is beyond the comprehension of many of the students who

use it. Any scoring system used should be under the direct control of the user. As a student

navigates the interactive area in an attempt to register their confidence they are using

their cognitive mapping skills, moving in a linear path. This proportional linear

mapping should assume a proportional score depending on their physical distance from

the various options. This is not the case in Paul’s (1994) logarithmic scoring. A student

cannot be assured that they will receive a score that is directly linearly proportional to

their positioning on the triangle.

Paul’s (1994) and later Klinger’s (1997) interactive triangular response spaces offer

testing environments that are reliant on moderately high levels of dexterity and a good

grasp of physical spatial interpretation. The cognitive process of registering a level of

confidence is difficult to emulate through a mapping exercise where the student is

required to move a cursor on a blank field to register their confidence. This operational

method assumes that all students would proportionally position the cursor in the same

place for the same confidence. As discussed above this research questions the reliability

of this process, as the operational exercise of students with regards to psychological

mapping varies greatly from individual to individual. What one student considers a

cursor position to represent low confidence might be a registration of medium

confidence to another. This is an unreliable method to precisely register confidence and

could be misleading. This is not such a great concern for formative assessment as the

feedback is generally interpreted by the individual who completes the task and can be

adjusted according to their propensity towards registering confidence; however, for

summative assessment, where the exercise is more critical to the student’s profile, it is

unacceptable. In Paul’s (1994) defense, he does apply a correction to the student’s score

by generating a realism function based on the relative frequencies of confidence

registered by the individual over a period of time. This function is then applied to the

score adjusting it depending on the individual’s propensity to showing high or low

confidence.

Paul (1994) also acknowledges the concern of students’ inability to consistently register

their confidence and counteracts the negative perception that it may have by supplying

meaningful video demonstrations that link the registered probabilities and

consequential scores to common phrases of belief, such as “Probably A, Possibly C,

definitely not B”.

The work of Gardner-Medwin and Gahan (2003) and Gardner-Medwin (2006) is a

more recent contribution to the area of confidence assessment, in which Gardner-Medwin developed his

Confidence Based Assessment (CBA) strategy. His discipline of application is the

medical educational field, in which the recognition and

acknowledgement of confidence is critical in daily practice. At

present his system is designed for a True/False response to a single statement. This is

different from the MCQ format considered for this research, as demonstrated by Pollard

(1985,1986,1993), Klinger (1997) and Paul (1994), with the single stem question

followed by four or three answer options. However, the MCQ format proposed by this

research does permit the use of multiple correct answers, which makes it a cluster of

True/False questions under the single stem, with each of the options requiring a

statement of confidence, as included in Gardner-Medwin’s (2006) system. It is

therefore applicable at the single answer level and shall be discussed with this in mind.

Gardner-Medwin’s (2006) CBA was introduced in Chapter 3, where it was

demonstrated that he uses a negative scoring system designed to eliminate guessing,

recognise partial knowledge and in particular, severely penalise student responses that

show high confidence in incorrect answers. Gardner-Medwin (2006) classifies a student’s

state of knowledge on any given area as either very confident [C=3 (80-100%)], fairly

sure [C=2 (67-79%)] or not sure at all [C=1 (0-66%)], using a scoring system that is

“proper”, rewarding a student accordingly for demonstrating their true beliefs and

being truly honest. He argues that a scoring system should use incentives to encourage

the participants to expose their real state of knowledge. It is for this reason that he

introduces a safe zone for students who are not confident of their knowledge on a

particular area by creating a non-penalising area for low confidence in an incorrect


answer. In contrast there is a double negative penalty score for high confidence in an

incorrect answer. This is demonstrated in Table 4-6 with all of the other score options

used.

UCL Confidence-based scoring scheme

Confidence Level        1        2        3
Score if Correct        1        2        3
Score if Incorrect      0       -2       -6
Probability Correct     < 67%    > 67%    > 80%

Table 4-6: CBA Scoring System for Correct and Incorrect

Answers. (Gardner-Medwin and Gahan, 2003)

Unlike Pollard (1985,1986,1993) and Paul (1994), Gardner-Medwin and Gahan (2003)

support their argument for choosing this scoring method with a series of graphs,

demonstrating the optimal path for a student to maximise their score with the

knowledge that they have. As the graph in Figure 4-2 demonstrates, for each possible

confidence level the expected average score depends on the probability of getting it

right. The CBA Scoring system is shown with each of the 3 levels of confidence C1, C2

& C3.

Figure 4-2: CBA Scores for C1, C2 & C3. (Gardner-Medwin, 2003)


As can be seen the optimal path encourages the student to register C=3 for anything

with a confidence greater than 80 per cent, C=2 for a middle level of confidence (from

67% to 79%) and C=1 once there is any greater doubt, as it carries no penalty. This

is in line with the upper bounds of the graph in Figure 4-2.
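
The crossover points of this optimal path can be checked numerically. The following is a minimal Python sketch, not part of Gardner-Medwin's published material: it takes the marks from Table 4-6, and the names CBA_MARKS, cba_expected_mark and cba_best_level are assumptions introduced here for illustration. Setting the expected marks of adjacent levels equal gives p = 4p - 2 (so p = 2/3) between C=1 and C=2, and 4p - 2 = 9p - 6 (so p = 0.8) between C=2 and C=3, which agrees with the 67 and 80 per cent boundaries.

```python
# Marks from Table 4-6: confidence level -> (mark if correct, mark if incorrect).
CBA_MARKS = {1: (1, 0), 2: (2, -2), 3: (3, -6)}


def cba_expected_mark(level, p_correct):
    """Expected mark for one answer registered at the given confidence level."""
    right, wrong = CBA_MARKS[level]
    return p_correct * right + (1 - p_correct) * wrong


def cba_best_level(p_correct):
    """Confidence level with the highest expected mark (ties go to the lower level)."""
    return max(sorted(CBA_MARKS), key=lambda level: cba_expected_mark(level, p_correct))


if __name__ == "__main__":
    for p in (0.5, 0.7, 0.9):
        print(p, cba_best_level(p))
```

Under these assumptions a student who believes an answer has a 50, 70 or 90 per cent chance of being correct maximises the expected mark by registering C=1, C=2 and C=3 respectively.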

Gardner-Medwin (2006) argues that a crucial feature of confidence-based marking

systems is for them to have a motivating nature. He expressed concern that many of

the marking systems that concentrate on rewarding highly those with high confidence,

such as Davies (2005), only reward those students that are bold or perceptive enough

to see that it is never advantageous to register low confidence. One of the main

challenges for students when using the CBA for the first time is the realization that they

can be rewarded for low confidence in a correct answer, reinforcing that honest

expression of confidence is a highly valued communication attribute in all areas. It is

for this reason that he promotes the use of the negative marking scheme but emphasises

the need for it to be motivating, as if it is not motivating it would be irrational for a

student to behave in a truly honest manner. He supports his argument by referring to the

work of others as presented in Figure 4-3.

Figure 4-3: Other Scoring Options used Including Scheme A from Hassmen &

Hunt (1994) and Schemes B-D from Davies (2005). (Gardner-Medwin, 2006)


It can be seen from these graphs that the optimal path varies substantially, depending on

the scoring method adopted. In the first option with no negative marking, it is never

rational to omit an answer and instructors often inform their students to leave no

question unanswered. The second option shows the equally balanced scoring of both

positive and negative which is encouraging for MCQ type questions but not for

True/False. In the case of True/False it promotes the omission of an answer when

confidence is less than 50 per cent, which can be detrimental to the student who has

some partial knowledge and is not prepared to register anything. However, Gardner-

Medwin (2006) acknowledges that the use of the negative marking is better than none.

The third option, Scheme A, attributed to Hassmen and Hunt (1994), has five levels of

confidence with negative marking used. Its greatest penalty for high confidence in a

wrong answer (-120) is greater than its equivalent for high confidence in a correct

answer (100) and also incorporates a safe zone for low confidence in an incorrect

answer (Hassmen & Hunt, 1994). Gardner-Medwin (2006) classifies this as properly

motivating but is concerned that the additional lower levels of confidence cannot

be rationally assigned for True/False at P < .35. Option four from Davies (2005) has 3

set levels of confidence that are equally rewarded for the negative and positive scoring

system, which is similar to the balanced scoring system preferred by this research.

Gardner-Medwin (2006) has voiced concerns with this approach, as he considers it

beneficial for the student to choose “no reply” for any confidence less than 50 per cent.

The final 2 options are again attributed to Davies (2005). Davies argues that students

who demonstrate high confidence in a correct answer should be greatly rewarded (4-5),

out of proportion to the penalty for registering high confidence in an incorrect answer (-2). His grading

system is the reverse of that of Gardner-Medwin (2006), who penalises heavily for high

confidence in incorrect answers (-6) compared to the score for high confidence in a

correct answer (3).

Gardner-Medwin (2006) also calls on Shannon’s (1948) theory of

information, investigating the relationship between the scores and the

appropriate information-theoretic measure of lack of knowledge for a True/False

question, “proportional to the log of the subjective probability assigned to the correct

truth value for a proposition”.


This research acknowledges the contribution of Gardner-Medwin’s (2006) work but

has reservations about adopting a scoring system that uses such severe penalties for high

confidence in an incorrect answer, double the negative of the value of high confidence in a

correct answer. In rejecting Gardner-Medwin and Gahan (2003) and Gardner-Medwin’s

(2006) scoring technique this research does not intend to downgrade his stated

concern about students demonstrating high confidence in an incorrect answer. It is felt that

the students could consider this severe penalty scoring as being too unfair, even though

he softens the effect by offering the safe zone for low confidence in incorrect answers

(score of 0). Gardner-Medwin and Gahan (2003) and Gardner-Medwin (2006) argue

that his students quickly ascertain a technique for using the system that optimises their

score, which has been criticised as another way of learning “how to do the test” rather

than learning the content in the test. However, his basis for scoring does address this

issue, as the scoring is “proper” and depends directly on the level of knowledge, so

even though the candidate can use an optimal method they must still base their

decisions on their knowledge.

The work of Gardner-Medwin and Gahan (2003) and Gardner-Medwin (2006)

is geared towards medical students, where he imposes a severe penalty for high

confidence in incorrect answers. Even though the above discussion acknowledges his

justification, both educationally and mathematically, it was felt that his approach was

too extreme and would not be well received by the students, being considered to be too

threatening to a population of students with average intelligence. In contrast, Davies

(2005) promoted a similar scoring system based on the reverse of Gardner-Medwin and

Gahan (2003) and Gardner-Medwin’s (2006), disproportionally over-rewarding for

correct responses with high confidence, which this research considers to be too lenient.

4.4 Comparison of an Incremental Balanced Scoring Method to

Previous Work

This section describes and justifies the scoring adopted for this research, building on

the previous work as discussed. While the functionality and application is often


different to those outlined above the underlying assumptions and arguments are

fundamentally the same.

This research considers an application of assessment in which confidence

measurement is the determining factor in grading. As Pollard

(1985,1986,1993), Pollard and Clark (1989), Klinger (1997), Paul (1994), Gardner-

Medwin and Gahan (2003) and Gardner-Medwin (2006) have asserted, there is a need

for innovative assessment strategies that tackle the issues of guessing, reward partial

knowledge and encourage honesty regarding the state of a student’s knowledge. In

addition, a key requirement of this research was to design a scoring system that was

easy for the student to comprehend, placing the responsibility of learning into the hands

of the learner. It was felt that many of the previous systems, such as those proposed by

Pollard (1985,1986,1993), Pollard and Clark (1989), Paul (1994) and Klinger (1997)

required complex calculations by the student to ascertain the consequences of their

actions during the assessment. Another influential factor in the identification of an

appropriate scoring regime was to present the student with a relatively non-threatening

environment, in which the student could engage with the assessment exercise without

an over-bearing fear of the consequences of their actions, encouraging exploratory

behaviour with the system as part of the reinforcement of their learning.

Initial examination encouraged this research to investigate a simple linear approach to

the scoring, where the scoring system was analogous to a “betting” game. A student

was encouraged to place a wager (bet) on their answer depending on how confident

they were. The interface evolved to support this notion, presenting itself as a “game”

where the players (students) were presented with a question with four possible

solutions, in some cases with multiple possible correct answers.

The scoring technique preferred by this research is summarised in Table 4-7. For

convenience of discussing the scoring only increments of 10 are considered, although

the system permits the student to register any confidence measurement in an increment

of 1. The design of this application of assessment with confidence was to include a

granular registration of confidence, not as a method to increase the perceived

discrimination between the student’s state of knowledge but to increase the casual use

of the system as part of the appeal to the student, very much in accordance with the role of


intrinsic motivation in Hede’s (2002) Integrated Model of Multimedia Effects on

Learning as discussed in Chapter 3.

It can be seen from Table 4-7 that a student with high confidence in an answer would

gain the most by registering (betting) at a high level. They also quickly determine that

they would be equally penalised if the answer were incorrect. Similarly, they have the

opportunity to collect smaller scores for answers for which they have partial knowledge

and minimise their losses if incorrect. This adopted balanced scoring technique

discourages guessing, a primary objective of this research, while offering some rewards

for demonstrating partial knowledge.

Registered Confidence for each      Score if Correct    Score if Incorrect
option (Increments of 10) [pi]      [si]                [-si]

100%                                 1.0                -1.0
 90%                                 0.9                -0.9
 80%                                 0.8                -0.8
 70%                                 0.7                -0.7
 60%                                 0.6                -0.6
 50%                                 0.5                -0.5
 40%                                 0.4                -0.4
 30%                                 0.3                -0.3
 20%                                 0.2                -0.2
 10%                                 0.1                -0.1
  0%                                 0.0                 0.0

Table 4-7: Balanced Scoring Registered Confidence for Correct and Incorrect

Answers.

Figure 4-4 presents two graphs representing the balanced scoring method for some of

the situations outlined in Table 4-7, furthermore indicating the optimal path as used by

Gardner-Medwin (2006) to support his argument for the CBA scoring.

While Gardner-Medwin (2006) considers this balanced scoring system to be “proper”, he considers it

not to be “motivating”, as the optimal path contains the “no reply” option if unsure of


the answer, encouraging a student with any doubt of the answer (less than the 50/50

chance) not to respond for fear of penalty. This is a concern discussed in Section 4.5 as

part of the evaluation of the suitability of the scoring technique to the field of

application.

Figure 4-4: MCQCM Scoring with Optimal Path. (Diagram 1: Balanced Scoring Method; Diagram 2: Overlay of Optimal Path)

When discussing scoring options it is helpful to investigate the expected values, in this

case the Expected Profit, when considering probability theory in what is essentially a

wagering situation. The Expected (Score) for N trials is based on the following formula:

Expected (Score) = N (pi) ( si) + N (1-pi) (- si)

With pi as the registered confidence for that instance (or probability), i is from 0 to 100

(increments of 10), s is the calculated score where the maximum possible score is s=1.

NB: N (pi) ( si) calculates the winning component while N (1-pi) (- si) calculates the

expected losses.

Hence a student who registers a confidence of 70 per cent could expect to yield the

following expected score for 100 trials.


Expected (Score for p=0.7) = 100(0.7)(0.7) + 100(1-0.7)(-0.7)

= 100(0.49) – 100(0.21)

= 28

OR 0.28 for the average score.
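
The calculation above can be expressed as a short Python sketch. The function names and the n_trials parameter are assumptions introduced for illustration; the only rule taken from the text is that a registered confidence p earns +p if the answer is correct and -p if it is incorrect.

```python
def expected_score(p, s, n_trials=1):
    """Expected total over n_trials when each answer is right with probability p,
    earning +s when right and -s when wrong."""
    return n_trials * p * s + n_trials * (1 - p) * (-s)


def balanced_expected_score(p, n_trials=1):
    """Balanced scoring: the stake s equals the registered confidence p."""
    return expected_score(p, p, n_trials)


if __name__ == "__main__":
    # Approximately 28 for 100 trials at 70 per cent confidence (floating point).
    print(round(balanced_expected_score(0.7, 100), 2))
```

Because the stake equals the confidence, the per-question expectation simplifies to p(2p - 1), which is zero at 50 per cent confidence and negative below it.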

An interesting comparison is the Expected Profit for Gardner-Medwin’s (2006) CBA,

based on the same formula but with the CBA scores divided by 3 so that they fall in the

range of -2 to 1. In this case a student with a confidence level of 70 per cent

would be at the C=2 level (score .66 for correct and -.66 for incorrect). It can be noted

that even though Gardner-Medwin (2006) only uses 3 levels of confidence, C=3, C=2,

C=1 the E (score) calculations will be in increments of 10 per cent, as for the MCQCM

calculations. It is justifiably assumed that a student using a system will record the

appropriate C level of confidence but will have a designated numerical level of

confidence when using the application for any given question. For example a registered

level of C = 3 could have a student’s operational level of 90 per cent or 80 per cent, and

so on.

Expected (Score) = N (pi) ( si) + N (1-pi) (- si)

Hence Expected (Score for p=0.7) = 100 (0.7)(0.66) + 100(1-0.7)(-0.66)

= 100(.462) – 100(.198)

= 26.4

OR 0.264 for the average score.

The Expected values for all probabilities in increments of 10 per cent are shown in

Table 4-8.

The comparison is best demonstrated by the graph shown in Figure 4-5, where the

Expected Score, or E(Gain) for both systems is closely aligned, with the exception of

the E(Score) for the balanced scoring method, which generates negative values for the

lower levels of recorded confidence. This comparable variation is due to Gardner-


Medwin’s (2006) safe zone, where a candidate is not penalised for admitting very little

confidence, attributed to his perceived motivating approach to scoring.

Student Confidence    MCQCM Expected(Score)    Gardner-Medwin’s CBA Expected(Score)

100%                   1.0                      1.0
 90%                   0.72                     0.7
 80%                   0.48                     0.4
 70%                   0.28                     0.27
 60%                   0.12                     0.2
 50%                   0.00                     0.17
 40%                  -0.08                     0.13
 30%                  -0.12                     0.1
 20%                  -0.12                     0.07
 10%                  -0.08                     0.03
  0%                   0                        0

Table 4-8: The Average Expected Scores from MCQCM and CBA.
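
The two columns of Table 4-8 can be regenerated with a few lines of Python. This is a sketch under stated assumptions rather than the procedure used to build the table: the function names are illustrative, the CBA marks of Table 4-6 are divided by 3 to share the MCQCM scale, the exact fraction 2/3 is used in place of the rounded .66, and the student is assumed to register C=3 at 80 per cent or above, C=2 from 67 to 79 per cent and C=1 otherwise.

```python
def mcqcm_expected(p):
    """Balanced scoring: gain p with probability p, lose p otherwise."""
    return p * p - (1 - p) * p


def cba_expected(p):
    """Expected CBA mark rescaled by 1/3, with the level chosen from the stated bands."""
    if p >= 0.8:
        right, wrong = 3, -6   # C=3
    elif p >= 0.67:
        right, wrong = 2, -2   # C=2
    else:
        right, wrong = 1, 0    # C=1, the safe zone
    return (p * right + (1 - p) * wrong) / 3


if __name__ == "__main__":
    for percent in range(100, -1, -10):
        p = percent / 100
        print(f"{percent:>3}%  {mcqcm_expected(p):6.2f}  {cba_expected(p):6.2f}")
```

Rounded to two decimal places, these functions agree with the tabulated values, including the negative MCQCM entries below 50 per cent that the CBA safe zone avoids.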

Figure 4-5: Graph Comparing the MCQCM and CBA Expected Scores.


It is observed that the expected gains from the higher levels of confidence registration

are not noticeably different for the two systems, even though Gardner-Medwin’s

(2006) students receive a severe penalty if their answer is wrong. It is comforting to see

that the MCQCM’s E(Score) sits well in comparison to Gardner-Medwin’s (2006)

CBA score.

Investigating the Expected score or gain when a student guesses during a test is

common practice when considering scoring methods. The MCQCM permits one or

more correct answers depending on the choice of the instructors. During this research it

was observed that instructors tended to produce a mixture of single and multiple correct

answers. In all cases the wording of the questions identified them as being single or

multiple correct answer questions, which assisted the student. Consequently the

MCQCM assessment exercises are clusters of true/false answer questions with one

stem, which generates numerous combinations of possible outcomes. While it is

unrealistic to cover all of them here a few fundamental sample responses will be

considered for comparison to the traditional approach to MCQ scoring.

Firstly, the standard Multiple-choice Question (MCQ) with four options, one correct

answer and no penalties for incorrect answers has the E(Score) calculated by

E(X) = 0.25(1) +0.75(0) =0.25

This means that there is a 1:4 chance of the student picking the correct option in which

they are awarded the score of 1 and a 3:4 chance that they will select an incorrect

answer and receive 0. This value is accepted by many instructors when implementing

MCQ tests. As previously stated, this study does not accept this proposition and this

research’s major objective is to eliminate the noise caused by this activity.

A simple anti-guessing strategy with a correct answer being awarded a score of 1 and

an incorrect answer a score of -1/(n-1), where n is the number of options, has a

decidedly different result:

E(X) = 0.25(1) +0.75(-1/3) = 0

This system is a suitable deterrent to guessing as it adjusts the outcome significantly.

An instructor using this type of scoring option would not encourage their students to

guess during the test unless they were reasonably confident.
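
Both guessing calculations follow from one expression, sketched below in Python. The function name and its parameters are assumptions made for illustration; the marks (1 for a correct answer, and either 0 or -1/(n-1) for an incorrect one) are those described above.

```python
def guess_expected_score(n_options, penalised):
    """Expected score of a blind guess on an MCQ with exactly one correct option."""
    p_right = 1 / n_options
    # Simple anti-guessing rule: a wrong answer costs 1/(n-1); otherwise it costs nothing.
    wrong_mark = -1 / (n_options - 1) if penalised else 0
    return p_right * 1 + (1 - p_right) * wrong_mark


if __name__ == "__main__":
    print(guess_expected_score(4, penalised=False))
    print(round(guess_expected_score(4, penalised=True), 10))
```

With four options the unpenalised guess is worth 0.25 on average, while the penalty of -1/3 brings the expectation to zero (up to floating-point rounding).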


Paul (1994), Pollard (1985,1986,1993), Pollard and Clark (1989), Davies (2005),

Gardner-Medwin and Gahan (2003) and Gardner-Medwin (2006) all accept this

approach to various degrees. While it is not their individually preferred option, it is

acknowledged by all that it at least addresses the issue of guessing, which is preferable

to ignoring the concern. The problem with the simple anti-guessing balanced negative

scoring described above is that it does not encourage the student to express their true

state of knowledge, hence is not motivating.

The balanced scoring method with increments of confidence introduces another

dimension into the arena, in that the concept of wagering adds another layer in the

operation. A student has the option of using an educated guess, tapping in to the partial

knowledge that they have, hopefully minimising the impact of an incorrect choice but

equally important creating the opportunity to secure some marks. The balance of the

reward with the penalty minimises the fear of registering a nominal value to reflect

their confidence in their choice.

The variation to the expected gain for the standard MCQ question as considered above

for a student who has a medium level of confidence, say 60 per cent in an answer, “I

think that it is this one”, for the one option would be

E(X) = 0.25(.60) +0.75(-.60) = -0.30

To add to this they can also use a combination of confidence to offset the negative

component if required as the incremented balanced scoring method is a cluster of

True/False questions that permit the student to register their confidence for each given

option.

A simple example to consider is as follows.

In this case a student is relatively confident (80%) in the first option but also considers

option 2 to have merit (at say 55%), although with less confidence than in option 1.

Option 3 they consider to be incorrect with a high level of confidence (100%) and

option 4 they have a reasonably high level of confidence (70%) of being incorrect but

cannot completely dismiss it.


If option 2 is correct and the others are incorrect then the score would be

Score = -0.8 + 0.55 + 1.0 + 0.7 = 1.45 out of a possible 4 (total of 1 per option)

= .3625
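
The score for this example can be reproduced with the following Python sketch. The data representation is an assumption made for illustration and is not taken from the MCQCM implementation: each option is recorded as a pair (believed_correct, confidence), so that "option 3 is incorrect, 100%" becomes (False, 1.00), and the student gains the confidence when the judgement matches the answer key and loses it otherwise.

```python
def cluster_score(judgements, answer_key):
    """Sum of +confidence for each right judgement and -confidence for each wrong one.

    judgements: list of (believed_correct, confidence) pairs, confidence in 0..1.
    answer_key: list of booleans, True where the option is actually correct.
    """
    total = 0.0
    for (believed_correct, confidence), is_correct in zip(judgements, answer_key):
        total += confidence if believed_correct == is_correct else -confidence
    return total


# The four options from the worked example; only option 2 is correct.
judgements = [(True, 0.80), (True, 0.55), (False, 1.00), (False, 0.70)]
answer_key = [False, True, False, False]

raw = cluster_score(judgements, answer_key)
print(round(raw, 4), round(raw / len(answer_key), 4))
```

The raw total is about 1.45 and the proportion about 0.3625, matching the figures above; the large loss on option 1 is what pulls the result below a pass.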

In comparison, the score for a guess for the standard balanced negative score would

require them to only nominate option 1 as their most preferred answer giving a score of

-1/3.

The expected value for a student with these levels of confidence in their answers would

be:

E(X) = .8(.80) + .2(-.80) + .55(.55) + .45(-.55) + 1(1) + 0(-1) + .7(.7) + .3(-.7)

= .48 + .055 + 1 + .28 = 1.815

The score for the question is out of a total of 4, 1 for each, giving an expected return

score of

E(X) = 1.815/4 = .454

Given that the student was only moderately confident in the correct answer (option 2)

and had various levels of confidence regarding 2 incorrect answers, option 1 and option 4,

the grade should be less than a pass. This is a pleasing outcome as a more

representative result for the student’s level of knowledge.

This simple demonstration shows that the incremental balanced scoring method is a

proper scoring system, as Paul (1994) and Gardner-Medwin (2006) promote, as it

rewards the participant proportionally to their level of stated knowledge, whilst

permitting and encouraging the demonstration of partial knowledge.

4.5 Choice and Justification of Scoring Method for this Research

The decision to use the proportionally incremental balanced scoring technique was

made for the following reasons.

Firstly, it was deemed important that the scoring calculations remain simple and in the

direct control of the student. Their actions should result in a consequential score, which

would not confuse or surprise the student. In this case, the concept of laying down a


bet, as a means of supporting your choice, was both playful and properly rewarding, as

Paul (1994) and Gardner-Medwin (2006) purport. The use of logarithmic functions and

disproportional penalising can confuse and create too much pressure on the candidate

causing adverse negative effects on the final outcome. As Sim, Read and Holifield

(2008) advocate, the student has the most to lose when sitting the test, hence it is only

fair that they have the control. The comparative analysis of the expected values of the

incremental balanced scoring method to other more complicated methods demonstrates

that the expected outcomes are not significantly different. In addition the author of this

research has concerns about the notion that the measurement of knowledge has the

same traits and attributes as the measurement of information. Paul (1994), Gardner-

Medwin (2006) and Klinger (1997) leverage heavily off the work of Brown and

Shufford (1973) to strengthen their argument, which is based on Shannon’s Information

Theory (1948). Gardner-Medwin (2006) considers the compliance of his scoring to

Shannon’s Information Theory comforting but of the least importance when

considering the constraints that it should adhere to. The author of this research feels the

same, as the relationship between knowledge and information is complex.

Secondly, even though the incremental balanced negative scoring is based on a proper

scoring system that addresses the area of guessing and recognises partial knowledge, it

has been criticised for not being truly motivating. This criticism is valid in that the

optimal path analysis (see Figure 4-4) encourages the option of choosing not to answer

for low confidence, yielding no penalty. However, the students also appreciate the fact

that no gain can be made by abstaining from the activity, which is apparent from the

implementation phase (an issue to be further discussed in Chapters 5, 7 and 8); hence

it is not likely that students will choose to refrain from committing to answers.

Finally, the notion of unfair consequences must be considered, as Sim, Read and

Holifield (2008) argue strongly that any CAA package must be seen to be fair in its

application, not providing the student any grounds for appeal. This notion will be

further discussed in Section 6.4.3. The student’s perception of a scoring regime and its

purported fairness is a deciding factor in the choice. The harsh penalty applied by Gardner-

Medwin (2006) for high confidence (C=3) in an incorrect answer, double the

negative of the reward (-6), although legitimately argued and supported by mathematical


modeling and probability theory, is often deemed too severe by both the instructors and

the students. The scoring system by Hassmen & Hunt (1994) referred to in Figure 4-3

might be less threatening. Paul (1994) also produced an alternative method where the

penalty score was a multiple of negative 1.5 of the positive score for a student’s

response demonstrating high confidence in an incorrect answer, but did not promote it

as heavily in his paper. In recent discussions with Paul he explained his position on the

published choice with a detailed synopsis of the justification. He considered the choice

of constants to be arbitrary and felt that many educators are most familiar and comfortable

with 0 to 100 scoring. Likewise, many educators are uncomfortable with awarding

positive scores for ignorance (represented by the center region) so the second example

addresses this concern by establishing a score of 0 for that case and 100 for certainty

(which results in the -150 for "completely misinformed").

This conforms with the incremental balanced scoring approach of scoring 0 for no

knowledge, with additional elements similar to Gardner-Medwin’s (2006) severity of

penalty for “completely misinformed”. Gardner-Medwin (2006) recently stated in a

conversation with the author that we fail as teachers if we mark a lucky guess as if it

was knowledge and we also fail if we mark confident errors as if they were no worse

than acknowledged ignorance.

On the adoption of penalising students with negative scores for high confidence in

incorrect answers Paul (1994) further argues that in limited field trials students seemed

quite robust and were able to use the system effectively regardless of the specific

values, unlike teachers and professors who appear to have the most difficulty adapting

to this type of knowledge assessment in the context of their existing administrative and

logistical environs. This observation is shared by Gardner-Medwin (2006), who

declared during a recent Computer Assisted Assessment post conference focus group

his level of frustration in the slow uptake of innovative scoring systems, as there is a

reticence to apply penalties, which in his opinion is an irresponsible approach in

educating students.

Paul, through an email discussion, concedes that his alternative scoring regimes for

various n and k values have merit; furthermore, he intends to pursue further research

and trials to ascertain the most appropriate method for different applications, both


summative and formative. However, his primary focus was to develop a supportive

self assessment tool that adopted the balanced approach to encourage students to

participate in a non-threatening environment, as is the major objective of this research.

4.6 Summary

This chapter has summarised research demonstrating the benefits of an incremental

balanced scoring mechanism that proportionally rewards and penalises students for

correct and incorrect answers, in order to have a richer understanding of the student’s

knowledge, to the benefit of both the student and the instructor.

The incremental balanced scoring method adopted in this study is based on the

arguments and opinions cited in this chapter. It pays homage to the extensive work of

Pollard (1985,1986,1993), Pollard and Clark (1989), Paul (1994), Klinger (1997), Davies

(2005), Gardner-Medwin and Gahan (2003), Gardner-Medwin (2006) and the

contribution of others. It especially leverages off Paul’s (1994) declared objective of

using a scoring technique that provides a supportive and non-threatening environment,

given that the initial objective of this research was to develop an assessment strategy to

be used at the discretion of the student and the instructor, depending on its application.

This research does not exclude consideration of the more severe scoring mechanism

proposed by Gardner-Medwin (2006), and supports the recent proposition tabled at an

assessment focus group, that it is irresponsible for an educator to positively reinforce

high confidence in an incorrect answer.

Considering the reticence of instructors to adopt the severe negative penalty option, as

Paul refers to previously, this research uses a scoring system that is more palatable than

that of Gardner-Medwin (2006).

Chapter 5 will present the first iteration of an MCQ assessment tool developed for this

research, that incorporates the incremental balanced scoring method as demonstrated

and justified in this chapter. It will describe the first prototype and the initial pilot

programs designed to ascertain the value of the assessment with confidence

measurement as perceived by both the instructors and students.


CHAPTER 5 DEVELOPMENT OF THE MULTIPLE-CHOICE QUESTIONS WITH CONFIDENCE MEASUREMENT (MCQCM) PROTOTYPE AND PILOT PROGRAM

Chapter 4 discussed possible scoring options, nominating the balanced scoring method

based on the incremental levels of confidence measurement for implementation. This

Chapter introduces the tool that was developed to answer the questions posed in

Chapter 2, that is the Multiple-choice Questions with Confidence Measurement

(MCQCM) assessment tool. The initial Visual Basic design of the MCQCM was part of

a previous study for a master’s degree. This chapter documents the tool’s preliminary development stages

through an iterative evolutionary process, starting with the implementation of a

fundamental working prototype to a group of students and instructors, analysing their

responses to the system and their perceptions of its contribution to the learning and

instructional process. It culminates with some functionality and design

recommendations to improve the MCQCM for extended application.


5.1 The MCQCM

It is appropriate at this time to introduce the Multiple-choice Questions with

Confidence Measurement (MCQCM) assessment tool prototype in order to establish a

general understanding of the fundamental assessment principles upon which the

MCQCM is built. The MCQCM version introduced here is the result of an evolutionary

design process of development, as described in the research methodology in Chapter 2,

incorporating the HCI user centered design iterative approach, eventually culminating

in the fully operable Internet based MCQCM to be described in Chapter 6.

The general structure of the MCQCM self-assessment tool is based on the traditional

Multiple-choice Questions format as outlined by Kehoe, Frary, Rodriguez and Tarrant et al.,

consisting of a stem with a number of options (Frary, 1985; Kehoe, 1995; Rodriguez,

2005; Tarrant et al., 2009). It was imperative that the designed system would be easy to

use without placing too much cognitive demand on the user, ensuring that their efforts

were placed on the question rather than the interface. It was also considered important

that the scoring technique was simple, as discussed in Chapter 4, placing the user in

control of the results of their actions. The resulting self-assessment tool would be

required to be developed adhering to good usability design principles (Sharp et al.,

2007), engaging the user while exercising sound navigational properties for delivery

across the Internet, intranet or stand alone. There was the additional requirement that

the system be able to capture and record the scores of the students as they participated

in the exercise for formative and summative assessment.

The MCQCM system is to permit more than one correct answer. This encourages the

student to consider all options separately and not to identify what they consider to be

the single correct answer and ignore the rest or to use a process of elimination. The

system’s feedback to the students is required to be a simple reflection of the student’s

present understanding of the concept being considered in each question. The

intended advantages of this approach over the traditional MCQ format are as

follows:

• To permit the instructor to word the options to closely examine the areas of

study, eliminating the need to use easily recognisable distracters.


• To force the student to consider all options carefully, increasing their exposure

to associated areas within the topic.

• To produce a score that reflects an honest position of the student in their

knowledge of the subject.

• To provide formative feedback to allow both students and instructors to redirect

attention where required during the learning process.

5.2 Design of the Rudimentary MCQCM Prototype

At this early stage of development a rudimentary prototype of the MCQCM was used,

far more simplistic than the Web-based operational version to be described in Chapter

6. The initial version of the MCQCM, see Figure 5-1, was developed to cater for these

initial trials.

Figure 5-1: The MCQCM Prototype Developed to Run the Initial Trials.

The MCQCM initial Visual Basic prototype was designed to reflect the student’s level

of understanding of topics as precisely as possible. As this development was part of a

previous study for a master’s degree, only a synopsis will be covered here. The student is

required to clearly state their level of confidence for each of the answers offered for a

question, knowing that they would be proportionally penalised for an incorrect choice

and proportionally rewarded for a correct choice.


The scoring method was adopted after extensive consideration of previous research as

discussed in Chapter 4. A conclusion of this investigation into previous work was that,

in order to enable a richer understanding of the student’s knowledge, a scoring method

that proportionally rewarded a correct answer and penalised an incorrect answer

was the preferred choice. The main argument for this decision was to develop a fairer

system for the student under their control while offering comparable expected outcomes

to other scoring regimes. It was felt that the MCQCM should offer a self-assessment

service that is honest, informative, and directional whilst still being palatable to the

student. The resulting score for a question depends directly on the

student’s registered level of confidence for each option, using both positive and

negative values. This is briefly explained in Table 5-1 below with some simple

examples.

| Confidence registered for an option | Example of score calculation from a registered confidence |
|---|---|
| A high level of confidence for a correct answer for an option yields a high positive score. | A confidence level of 90 per cent for a correct answer yields a score of positive 9/10, i.e. +9/10 |
| A high level of confidence for an incorrect answer for an option yields a negative score of equal value. | A confidence level of 90 per cent for an incorrect answer yields a score of negative 9/10, i.e. -9/10 |
| A low level of confidence for a correct answer for an option yields a low positive score. | A confidence level of 20 per cent for a correct answer yields a score of positive 2/10, i.e. +2/10 |
| A low level of confidence for an incorrect answer for an option yields a negative score of equal value. | A confidence level of 20 per cent for an incorrect answer yields a score of negative 2/10, i.e. -2/10 |

Table 5-1: Rules and Example of a Score for a Given Scenario.


The individual scores for the four options, each allocated out of 10, are tallied to give a

score for the question out of 40. Each option score is displayed as a value

from 0 to 10, or the negative equivalent. As an example, Table 5-2 demonstrates the

resulting scores for a student’s answer to a question with a single correct answer as

highlighted.

| Option | Instructor’s Choice | Student’s Choice | Correct or Incorrect | Confidence | Score |
|---|---|---|---|---|---|
| A: i-- | False | True | Incorrect | 65 | -6.5 |
| B: i++ | True | True | Correct | 90 | 9 |
| C: i=1 | False | False | Correct | 100 | 10 |
| D: i=i++1 | False | False | Correct | 92 | 9.2 |

Table 5-2: Resulting Score for Options Given the Student’s Choice and Their

Registered Level of Confidence.

The resulting final score for this question is calculated by the addition of the scores for

each option: Option 1 + Option 2 + Option 3 + Option 4 = Total

-6.5 + 9 + 10 + 9.2 = 21.7/40.

In this case the student has incorrectly nominated Option 1 as correct with a 65 per cent

level of confidence. However, they have also correctly identified Option 2 as the

correct answer with 90 per cent level of confidence and further identified correctly the

incorrect answers (Options 3 and 4) with high levels of confidence.

The final test score is calculated by summing each question’s score, as is the normal

practice.
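
The scoring rules of Table 5-1 and the tallying described above can be illustrated with a minimal Python sketch. It is not the thesis’s Visual Basic prototype; the names OptionResponse, option_score, question_score and overall_score are hypothetical and are introduced only for this illustration. It assumes that an option is scored as correct whenever the student’s True/False choice matches the instructor’s, and that confidence is registered as a percentage from 0 to 100.

```python
from dataclasses import dataclass


@dataclass
class OptionResponse:
    """One option of a question (hypothetical structure for illustration)."""
    label: str
    instructor_choice: bool  # True if the instructor marks the option as correct
    student_choice: bool     # True if the student marks the option as correct
    confidence: float        # registered confidence, 0 to 100 per cent


def option_score(response: OptionResponse) -> float:
    """Score one option out of 10: +confidence/10 if the student's choice
    matches the instructor's, otherwise -confidence/10."""
    if not 0 <= response.confidence <= 100:
        raise ValueError("confidence must be between 0 and 100 per cent")
    magnitude = response.confidence / 10
    matches = response.student_choice == response.instructor_choice
    return magnitude if matches else -magnitude


def question_score(responses: list[OptionResponse]) -> float:
    """Score one question by adding its option scores (out of 40 for 4 options)."""
    return sum(option_score(r) for r in responses)


def overall_score(questions: list[list[OptionResponse]]) -> float:
    """Final score for a whole test: the sum of the question scores."""
    return sum(question_score(q) for q in questions)


# The student's response shown in Table 5-2.
table_5_2 = [
    OptionResponse("A: i--", False, True, 65),
    OptionResponse("B: i++", True, True, 90),
    OptionResponse("C: i=1", False, False, 100),
    OptionResponse("D: i=i++1", False, False, 92),
]

print(round(question_score(table_5_2), 1))  # 21.7 (out of 40)
```

Summing the four option scores of Table 5-2 in this way gives -6.5 + 9 + 10 + 9.2 = 21.7 out of 40.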

The MCQCM permits the instructor to nominate one or more correct answers if

desired, significantly increasing the level of difficulty for the student to identify correct

and incorrect answers for every question. An example of a question with multiple

answers might be as demonstrated in Table 5-3 below, where both B and C are correct

answers highlighted in the table.


| Option | Instructor’s Choice | Student’s Choice | Correct or Incorrect | Confidence | Score |
|---|---|---|---|---|---|
| A: i-- | False | True | Incorrect | 60 | -6.0 |
| B: i++ | True | True | Correct | 80 | 8 |
| C: i=i+1 | True | True | Correct | 92 | 9.2 |
| D: i=i++1 | False | False | Correct | 100 | 10 |

Table 5-3: Example of a Question, which has 2 Correct Answers B and C.

In this case the student incorrectly identified A to be True with 60 per cent confidence

and correctly identified B to be True with 80 per cent confidence, C to be True with 92

per cent confidence and D to be False with 100 per cent confidence.

Critical to the success of the student’s MCQCM experience was the requirement to

train them in understanding the scoring method. This was achieved by supplying online

interactive demonstrations using the scoring calculator to be used in class. This device

was specifically created to assist the students in understanding the scoring mechanism

of the MCQCM. It permitted the student to simulate the possible scoring scenarios to

see the resulting changes in the scores.

The possible score for a question where a student correctly identifies 3 of the 4 options

is as shown in the score calculator in Figure 5-2, based on the response from the student

demonstrated in Table 5-3.

Figure 5-2: Scoring Calculator for MCQCM Table 5-3.
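
The kind of what-if exploration that the scoring calculator offers can be approximated with a short Python sketch. This is an illustration only and not the calculator shown in Figure 5-2; the function name score_question and its list-based parameters are assumptions made for this example. It uses the response from Table 5-3 and then varies one input at a time to show how the total changes.

```python
def score_question(key, choices, confidences):
    """Return (per-option scores, total) for one question.

    key          -- instructor's True/False value for each option
    choices      -- student's True/False value for each option
    confidences  -- student's confidence for each option, 0 to 100 per cent
    """
    scores = []
    for correct, chosen, confidence in zip(key, choices, confidences):
        if not 0 <= confidence <= 100:
            raise ValueError("confidence must be between 0 and 100 per cent")
        magnitude = confidence / 10
        # Reward a matching choice, penalise a non-matching one by the same amount.
        scores.append(magnitude if chosen == correct else -magnitude)
    return scores, sum(scores)


# Response from Table 5-3: options A to D, where B and C are correct.
key = [False, True, True, False]
choices = [True, True, True, False]
confidences = [60, 80, 92, 100]

scores, total = score_question(key, choices, confidences)
print(scores, round(total, 1))  # [-6.0, 8.0, 9.2, 10.0] 21.2

# What if the student had been less confident about the wrong choice for A?
_, lower_confidence = score_question(key, choices, [20, 80, 92, 100])

# What if the student had instead marked A as False with the same confidence?
_, corrected_choice = score_question(key, [False, True, True, False], confidences)

print(round(lower_confidence, 1), round(corrected_choice, 1))  # 25.2 33.2
```

Under these assumptions the Table 5-3 response totals 21.2 out of 40. Lowering the confidence on the incorrect choice reduces the penalty, while reversing that choice turns the penalty into a reward of the same size.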


The final score and feedback were given to the student after they completed the test. In

this rudimentary prototype the scores for each question were reproduced on the screen

so the student could see the correct answers with their choices lined up beside them,

similar to the configuration shown in the scoring calculator in Figure 5-2.

5.3 Pilot Studies

Two initial pilot runs are documented here. The first involved 6 students and 3

instructors. The second pilot program conducted was a larger exercise, involving 93

participating students and 8 instructors. These activities produced some interesting and

encouraging qualitative and quantitative data demonstrating that the rudimentary

MCQCM tool had promise as a positive contributor to both the instructor, holding the

primary leadership role, and student as they journey together along the learning path. In

addition these initial findings encouraged further development and studies, as

documented in Chapters 7 and 8.

5.3.1 Aims of Pilot Studies

It was considered beneficial to involve various classifications of stakeholders as part of

the needs analysis and low fidelity user evaluation. The identified participants

consisted of a number of students, instructors, the designer and programmers, as the

designing of the operational interactive system was reliant on them for direction at this

early stage of development.

The activities were designed to elicit answers and discussions on the following

questions to assist in answering research questions 1 and 3 as outlined in Section

2.2.

1. Was the system easy to operate?

2. Did the feedback display produce comprehensible information in order to be

valuable in directing the student along their learning path?

3. Is a scoring system that penalises incorrect choices and rewards correct

choices in linear proportion easy to comprehend?


4. Would the participant actively use the sliding bar to register their level of

confidence freely and would they perceive the system as being either too

complicated or too threatening?

5. Would a self-testing program of this design favour a particular learning style?

6. Would students consider the proposed system might be more favourable to the

extraverted individual and disadvantage the introverted user?

7. Would students consider the proposed system to be gender biased?

5.3.2 First Pilot Study

The main objective of the initial small pilot study was to elicit as much user feedback as

possible from the students’ and instructors’ experience of using the basic MCQCM

prototype.

To optimise the effectiveness of the pilot run, Visual Basic was used to construct the

interface, creating individual data tables for each participant. This permitted the

responses from each of the participants to be captured and later displayed to students

and instructors as a means of reflecting on their progress and experiences.

A small group of 6 students (3 males and 3 females) and 3 instructors (2 males and 1

female) were asked to participate. After completing the test the students were required

to answer a series of questions in the presence of the designer. The subject instructors

were also interviewed after the students completed the tests to ascertain their opinions

(Appendix A).

To ensure the richness of the information collected the participants were invited to

consider the system over a period of days and encouraged to give additional feedback

after ongoing reflection.

The 6 students were required to complete 5 test questions concentrating on a particular

content area, which was to be formally tested within their classes. Hence the students

perceived it as advantageous to their study and were consequently keen to participate.

To assist in the initial exposure to the system the students all participated in an

introductory session that demonstrated the package in a non-threatening manner. The

students were initially required to respond to a general question set in their social


environment where the question addressed a local, major, well-publicized, sporting

event. Immediately following this, the students were required to complete the further 5

questions designed to ascertain their knowledge of the nominated content area.

Student Observations

The initial introductory demonstration session permitted the students to participate in a

social context without threat, a practice encouraged by Paul (1994) and Gardner-Medwin

(2006). This proved to be a successful exercise as they responded to the system in a

relaxed manner. However, it was observed that it did not eliminate their fears or

apprehension completely during the actual assessment exercise.

The students were then asked to complete the more formal part of the pilot program,

which was a series of 5 test questions constructed around the content area of their

studies.

All of the students completed the five test questions without any real operational

concerns. It was observed that during this formal part of the pilot program there was

still initial apprehension in using the system for the first time. The students approached

it with some suspicion and concern. From their verbal protocol during testing, it was

observed that they were not altogether comfortable with the interface, as they were not

familiar with it under test conditions. Students also identified the additional anxiety of

being closely observed during a test, and expressed concerns about being required to

identify not only what they considered to be the correct option but also what they

considered to be the incorrect answer. To add to their initial apprehension they

demonstrated hesitation registering their level of confidence in all of their choices.

In response to the questions posed in Section 5.3.1 in general the students found the

MCQCM easy to use, feeling comfortable with the scoring technique and the

operational aspects. They did however show concern about the sliding bar functionality

of registering both confidence and choice of answer.

All of the students requested more opportunities to use the test as they considered it to

be greatly beneficial in confirming their knowledge in some areas and highlighting their

inadequate knowledge in others.


All of the students registered that they understood the feedback, as it clearly stated their

responses.

The simple scoring system was well received. All of the students claimed to

understand the method of calculating the score and consequently would react depending

on their level of confidence to maximise their result.

It was observed that the students tended to minimise their use of the slide bar for the first

few questions and increased it for the remaining questions as they

became more relaxed.

The pilot program was not broad enough to give any valid feedback pertaining to the

bias towards particular learning styles and gender. The students could neither

demonstrate nor comment on these issues during their short exposure. This is an area

for consideration at a later stage and with a larger cohort of students.

Table 5-4 provides a summary of the observations of both the students and instructors.

Instructor Observations

As the instructors’ exposure to the system at this early stage was only brief, their

contributions were minimal, based on their observations of the prototype being used by

the students and on exposure to a series of representative summary results for their

consideration. Their comments are also included in Table 5-4, which summarises the comments from both the

students and instructors.

It can be observed that the feedback to the MCQCM prototype at this early stage was

promising, being generally well received by the participants in the pilot program. All of

the students appreciated the opportunity to use the testing facility and they all

considered it to be beneficial to their preparation for the upcoming test. They

considered that the method of scoring encouraged risk-taking and might also permit the

students to manipulate the system to their advantage.

At this early stage some students registered a concern that the process of decision-

making could confuse the participant. This was not overly apparent during the pilot

program but became a significant issue during later extended trials.


| Initial Questions | Student Observations and Discussions | Instructor Observations and Discussions |
|---|---|---|
| Was the system easily operable? | Hesitation in using the slide bar, slight confusion in the sliding mechanism to register confidence and choice of answer | Appeared to be easily operable |
| Was the feedback display produced comprehensible in order to be valuable in directing the student along their learning path? | All students felt that the feedback was clear | Appeared to be clear to the student and the proposed reports to the instructor would be beneficial |
| Do you think that a scoring system that penalised for incorrect choices and rewarded for correct choices in a linear proportionality is easily comprehended? | The simplicity of the system was understood; the students did realise that not answering in some cases would be beneficial | Appeared to be understood by the students, some hesitation in offering it as a legitimate scoring mechanism |
| Would the participant actively use the sliding bar to register their level of confidence freely and their general perception of the system being either too complicated or threatening? | Initially little use of the sliding bar, extended use as they progressed through. High level of initial apprehension alleviated as the test progressed | Concerns that it might be too threatening having to identify the correct option and also incorrect ones. Concerns about unfair consequences and possible appeals |
| Would a self-testing program of this design favour a particular learning style? | No real concern registered, no participant felt disadvantaged when using the system | Could be more favourable to the student who prefers to learn by experiential methods, trying out different ways etc. |
| Do you consider the proposed system might be more favourable to the extraverted individual and disadvantage the introverted user? | No real concern registered, no participant felt disadvantaged when using the system | The experience of the instructors was that this would suit some students more than others. They thought that the over-confident would overstate their confidence while the more timid would understate. |
| Would the system have gender bias towards males? Do the instructors in their personal experiences observe that males have a tendency to overstate their ability while females often understate? | Did not register any opinion | Some of the instructors also thought that it might assist female students, as it gives them the opportunity to show levels of knowledge without fear of embarrassment |

Table 5-4: Pilot Program Student and Instructor Observations.


Also apparent are the concerns registered by the instructors, including the possible

favouritism towards the extraverted student and the fear of appeal for perceived unfair

penalties.

As a direct result of these observations and the discussion above, a second, more

extensive pilot program was carried out in an attempt to obtain a deeper understanding

of some of the cognitive processing issues flagged in this initial, smaller pilot study.

5.3.3 Second Pilot Study

The encouraging results of the small pilot study initiated a further, more comprehensive

pilot study. The primary objective of these studies was initially to develop and evaluate

the MCQCM as an innovative formative assessment tool, to determine if it is beneficial

to the student and the instructor, being the key identified stakeholders. These studies

were designed to ensure that the stakeholders were given the opportunity to interact

with the system at an operational level, producing both qualitative and quantitative data

for analysis and interpretation. This part of the research was approached in two closely

related stages: Stage 1, student evaluation, and Stage 2, instructor evaluation.

Stage 1, the initial and major section of the experiment, was based on a series of trials

with two individual cohorts of students using the system as part of their learning

experience. All data was recorded either directly to a database or indirectly via the

subject review questionnaire. It was considered advantageous to collect the data by the

two means, as the encompassing technology is ideal for collecting the raw data and the

hand written questionnaire format permitted the students to respond away from the

computer environment, giving the opportunity for reflection and further thought.

Stage 2 of the experiment investigated the instructors’ evaluation of the system,

attempting to gauge their perceived value of the MCQCM as both a formative and

summative assessment tool. This second, instructor-focused experiment was directly dependent

on the students’ experience, as the recorded data was analysed and presented

to the instructors for their opinions.


5.3.3.1 Stage 1: Method and Results for Student Focused Experiment

Two groups of students, referred to as Cohort 1 and Cohort 2, participated in the

experiment. The initial cohort of 50 students (Cohort 1) consisted of undergraduates

enrolled in the Technical and Further Education (TAFE) Computer Science

course. The second cohort of 43 postgraduate students (Cohort 2) was enrolled in the

Higher Education (HE) Graduate Diploma of Information Technology. The modules

being tested were core subjects of both courses being the TAFE Introduction to C++

and the HE Database 1 (Entity Relationship Modeling Design and Structured Query

Language). It was considered that testing the MCQCM at various levels of the

educational spectrum would enable richer data in relation to usability and perceived

usefulness of the system.

5.3.3.2 Outline for Cohort 1

The undergraduate TAFE students participated in the self-assessment exercise as part of

their preparation for a scheduled summative assessment task. The students were

encouraged to complete the assessment without peer consultation and were informed

that the test results would be anonymous. Each student was assigned a unique number

that referenced their scores, and responses to the post session questionnaire, giving

them complete anonymity.

The MCQ test consisted of 5 questions addressing the fundamental concepts of

programming in C++. In general the format was a stem that referred to a particular

desired programming outcome, with the options containing the program segments to

successfully achieve that desired outcome. Of the 10 questions 5 of these provided

more than one correct answer in the four options given. Of the 50 participating students

all responded to the post-test survey as it was completed in class as a part of the standard

testing review process.

5.3.3.3 Outline for Cohort 2

The second cohort of 43 HE postgraduate students used a slightly modified Web-based

version as part of their normal revision program in preparation for the final exam.

Unlike the undergraduate students, these students accessed the self-assessment test

from their preferred study environment, either in their homes, in the laboratories, at


work, or in the library. The structure of the questions was the same, with a stem and 4

options from which to choose, with greater than 50 per cent of the questions containing

more than one correct answer. Of the 43 participating students 20 responded to the

optional post-test survey, which was presented on the screen at the final stage of the test

as part of the MCQCM or was available to them via the Internet at a later time.

5.3.3.4 The Post Test Questionnaire

The student questionnaire consisted of 3 general background questions regarding the

age, sex and computer experience of the participants. The questionnaire (Appendix A)

contained a further 9 questions relating directly to the MCQCM self assessment tool

addressing the following issues:

• Do the students accept the system as both a summative and formative

assessment tool?

• To what degree would they use the MCQCM via the Internet?

• Does the resulting feedback from the MCQCM have a direct influence on their

learning path?

• Do they feel well informed about their level of understanding of the subject and

the areas in need of revision after using the MCQCM?

• What is their opinion of the benefits and perceived problems with the MCQCM

system?

These questions were to contribute to the research questions 1 and 3 as outlined in

Section 2.2.

5.3.3.5 Data Collection for Stage 1: Student Focus

There were two distinct components of data collected by the MCQCM system from the

students. The first collected set of data was the actual score of the participants recorded

during the test and referenced by the unique identification number assigned to each

student. At the end of the test the recorded data was regenerated, presenting on the

screen a graphical display of the student’s scores for each question and a total score for

the test.


The second collection of data was both the quantitative and qualitative data from the

questionnaires completed by each student.

5.3.3.6 Analysis of Collected Data from Stage 1: Student Focus

This section of the analysis considers the demographics of the participants gathered

from the general background questions.

Cohort 1, of 50 students tested, consisted of 44 males and 6 females, and Cohort 2, of

43 students, consisted of 31 males and 12 females, as demonstrated in Figure 5-3. This

gender imbalance can be attributed to the Computer Science course presently attracting

a substantially greater number of male participants.

Figure 5-3: Age and Gender Distributions for Both Cohorts of Students.

It is observed from the graph above that the greater proportion of Cohort 1

undergraduate students were in the age range of 18-25. This can be attributed to the fact

that the main feeder for this course is the secondary education sector. In contrast,

the greater proportion of Cohort 2 postgraduate students were in the age group of 30 and

above, as it is a requirement for a student to have completed an undergraduate

qualification to be accepted into this particular course.

The level of computer experience was recorded as either being none, casual or

proficient, with 64 per cent of the students classifying themselves to be proficient and

the remaining 36 per cent classified themselves as being casual. There were no students

that registered their experience as being “None”. It would be expected that students of

Computer Science would classify themselves as being proficient, however it is

understandable that their perception will differ from student to student.

[Figure 5-3 panels: Gender Distribution of Cohorts (males: 44 in Cohort 1, 31 in Cohort 2; females: 6 in Cohort 1, 12 in Cohort 2); Cohort’s Age Distribution (percentage of students, Cohort 1 and Cohort 2).]


5.3.3.7 Summary of Student’s Questions About MCQCM

In answer to the questions outlined in Section 5.3.3.4 all of the undergraduate Cohort 1

students registered an appreciation of the system at various levels as a valuable part of

their learning process, with 54 per cent registering the system to be approaching

extremely helpful. Similarly, 95 per cent of Cohort 2, the postgraduates, considered

the self-assessment tool as valuable with 20 per cent registering it as extremely helpful.

Furthermore, a pleasing 100 per cent of Cohort 1, the undergraduate students and 98

per cent, i.e. all but one of Cohort 2 students, stated that they would use the system to

various degrees during their studies if it were available.

The students from both cohorts registered a desire for the system to be delivered via the

Internet, stating that provision of instant, private feedback and support in a self-paced

learning environment was of benefit. The students also stated that the Internet delivery

option created a freedom with flexible delivery, permitting the utilisation of community

houses, libraries and other educational facilitators in the community.

It was observed that 96 per cent of students from both cohorts registered that the

feedback provided them with more information about their understanding of the area

being tested, stating that the system appeared to honestly demonstrate their acquired

knowledge at any time during the learning path.

A high 90 per cent of students from both cohorts registered that they knew which areas

they should be revising after completing the test, feeling that the system assisted in

identifying what they needed to learn and revise.

Ninety five per cent of the students from both cohorts considered the system would

influence their path of learning during their studies if it were available and 48 per cent

of the undergraduate and 25 per cent of the postgraduate students stated the feedback

from the system would have a substantial to significant influence on their learning path.

Additionally, 95 per cent of the students from both cohorts recorded that they

considered the feedback to be better than that of the traditional Multiple-choice format, but still

considered the traditional format to be a valuable tool for self-testing.

Some supporting documented benefits stated by the students were as follows.


• This system could “increase the students’ level of confidence showing how much

they are right or wrong and giving more specific information on their level of

understanding of particular concepts”.

• This system could “assist in the elimination of guessing”.

• This system caters for the “maybe” option where the student is not too sure of the

correct choice.

5.3.3.8 Student Observations According to Age Groups

Further analysis of the data with respect to the age group produced the following

interesting observations. These age classifications are irrespective of Cohorts 1 and 2.

18-25 Yrs Age Group:

It was observed that all of the 18 to 25 year old students rated the program highly and

showed a strong trend towards using the system regularly. While the issue of whether

the system would influence their direction of study is not of great significance, the

students indicated that they tended to favour this system above the traditional MCQ

method.

Additionally these students felt that the system identified what they would need to learn

in a self-paced, quick and easy format, providing proficient revision of the subject area.

Some of the concerns expressed by this age group were:

• The system would benefit from an explanation area in the feedback to fully explain

the reason for the correct answer.

• The elimination of guessing was still not complete.

They did not register concern with issues such as access to computers, partly because

computers are readily available to them. They also requested this style of self-

assessment tests to be part of their daily study routine. They appreciated the quick

response of instantaneous feedback, however they would like explanations with their

feedback with references to resources.

26-30 Yrs Age Group:

There are some interesting trends observed from the data generated specific to this sub

group of students. All of these students appeared to appreciate the system and show an


enthusiasm towards using it on a regular basis. There is strong evidence that the

students felt that the system would influence the direction of their study but not to a

great degree.

This group also strongly registered the benefit of being able to pursue the “maybe”

option as part of their learning strategy as well as acknowledging the convenience of

having a self assessment available to them via the Web as part of their home revision

strategy.

However, this group of students expressed some concerns regarding access to

computers, as unlike the younger students computer accessibility is not so readily

available.

The students also voiced some concerns about the requirement of using the slide bar as

a means of registering confidence.

Most of the students in this age group are generally employed while completing their

course. Consequently any system that permits them to evaluate their understanding of a

topic at a time and place that is convenient to them is considered to be of benefit.

30+ Age Group:

It is very difficult to draw any conclusions or trends from these students with such a

small population, but some information and trends are worth noting. All of these

students rated the system highly and acknowledged that they would use it regularly.

Some students in this group considered it to influence their direction of learning greatly

and preferred it to the traditional MCQ format.

Consistent with the observations of the previous age groups they also declared that they

appreciated the system being readily available via the Web providing instant feedback,

however they too have concerns regarding the access to computers.

These older students often seek employment during their study and stated that

Internet-based self-assessment systems generally permit them to evaluate their understanding

of a topic at a time that is convenient to them. Similar to the younger students, they also

requested more explanation of the answers, with additional guidance on their direction

of study.

5.3.3.9 Analysis of Recorded Scores for Students Observations

The MCQCM prototype had the ability to record the students’ overall scores for each

question, ranging between –40 and 40.

The resulting graph of the frequency of the scores for the Cohort 1 undergraduate and

Cohort 2 postgraduate students is shown in Figure 5-4 and Figure 5-5 respectively.

Figure 5-4: Frequency of the Undergraduates, Cohort 1, Scores for Each

Question.

Figure 5-5: Frequency of the Postgraduates, Cohort 2, Scores for Each Question.

[Chart: percentage of students (vertical axis, 0 to 45) against score band (horizontal axis: -40 to -31, -30 to -21, -20 to -11, -10 to -1, 0, 1 to 10, 11 to 20, 21 to 30, 31 to 40), with one series for each of Questions 1 to 5.]
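The score bands on the horizontal axis of Figures 5-4 and 5-5 can be reproduced with a short script. The following is a minimal sketch, assuming integer scores between -40 and 40 as recorded by the prototype; the function names and any sample scores are illustrative and are not taken from the MCQCM implementation.

```python
from collections import Counter

# Score bands as labelled on the horizontal axis of Figures 5-4 and 5-5.
BANDS = [
    "-40 to -31", "-30 to -21", "-20 to -11", "-10 to -1", "0",
    "1 to 10", "11 to 20", "21 to 30", "31 to 40",
]


def band_for(score: int) -> str:
    """Return the band label for a single question score (assumed integer)."""
    if not -40 <= score <= 40:
        raise ValueError(f"score {score} is outside the range -40 to 40")
    if score == 0:
        return "0"
    if score > 0:
        # 1..10 -> index 5, 11..20 -> index 6, and so on.
        return BANDS[5 + (score - 1) // 10]
    # -1..-10 -> index 3, -11..-20 -> index 2, and so on.
    return BANDS[3 - (-score - 1) // 10]


def percentage_by_band(scores: list[int]) -> dict[str, float]:
    """Percentage of students falling in each band for one question."""
    if not scores:
        return {band: 0.0 for band in BANDS}
    counts = Counter(band_for(s) for s in scores)
    return {band: 100.0 * counts[band] / len(scores) for band in BANDS}
```

Calling `percentage_by_band` once per question, with that question's list of student scores, yields the values plotted as one series in the charts.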

Figure 5-4 and Figure 5-5 indicate that the scores tended to be skewed towards the

higher marks, with the greater percentage of students receiving a positive cumulative

grade for each question. This result could be interpreted as being very

supportive of the student and a valuable method of building their confidence, while also

identifying their weaknesses. Some of the students stated that “the good thing about the

system was that it permitted an allocation of marks to areas that they were not too sure

about” and consequently boosted the total score accordingly.

It was also observed that the Cohort 1 undergraduate students used the slide bar 41 per

cent of the time. Although this is not a high percentage, some students mentioned that

the use of the slide bar was foreign to them and felt that they would use the option more

with extended use. In contrast, the Cohort 2 postgraduate students used the sliding bar

only 12 per cent of the time, preferring to set the confidence at 100 per cent. The

tendency for the younger undergraduate students to use the sliding bar more often than

the postgraduates could be attributed to their long-term exposure to technology and

being less inhibited when confronted with the device. However, both cohorts also stated

that they would use the slide bar more after prolonged exposure. In addition, it was

observed that many of the students registered concern with using the slide bar as a

means of registering both their choice and their confidence in that choice. This issue

greatly influenced the redesign of the MCQCM, as discussed in further detail in

Chapter 6.
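The slide bar usage figures quoted above are simple proportions. As a minimal sketch, assuming that "using" the slide bar means recording a confidence other than 100 per cent (an interpretation adopted here for illustration, not a documented rule of the prototype), the percentage could be computed as follows. The function name and the `default` parameter are hypothetical.

```python
def slide_bar_usage_pct(confidences: list[int], default: int = 100) -> float:
    """Percentage of responses whose confidence differs from the default.

    Assumption: a response left at the default confidence (100 per cent)
    counts as the slide bar not having been used.
    """
    if not confidences:
        return 0.0
    moved = sum(1 for c in confidences if c != default)
    return 100.0 * moved / len(confidences)
```

For example, a cohort whose recorded confidences were 100, 100, 60 and 100 would have used the slide bar for one response in four.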

5.3.3.10 Stage 2: Method and Results for Instructor Focused Experiment

It was considered extremely advantageous to gather information regarding the

instructors’ perceptions and opinions of the system at this early stage of development.

The instructors directly involved with the Cohort 1 undergraduate students were

invited to respond to the system.

5.3.3.11 Outline of Method for Instructor Focused Experiment

A combined total of seven instructors were interviewed for the first and second cohorts

of students. Three of the instructors were strongly associated with the Information

Technology area while the remaining four were from the Electronics, Physics and

Mathematics areas. All participants hold formal qualifications in their teaching

discipline areas as well as a formal qualification in Education, with at least three years’

teaching experience.

Initially the instructors familiarised themselves with the MCQCM tool. During this

introductory exercise they were encouraged to ask questions for clarification to ensure

that they understood the operating processes involved and the scoring system. After the

demonstration they were shown a series of graphs generated from the recorded data

from the student experiment, as previously displayed in Figures 5-4 and 5-5, the first

displaying the frequency of the overall scores for each question.

The instructors then viewed a series of graphs showing distribution of the individual

scores for each of the questions and the graphical presentation feedback screen the

students received at the test conclusion.

At the completion of the MCQCM demonstration and results display the participating

staff members were asked to complete a questionnaire (Appendix A) that addressed the

following questions to contribute to answering the research questions 1, 2 and 3 as

outlined in Section 2.2.

• Do they consider this to be a useful tool?

• Would they use this tool for the duration of the subject?

• Would they construct the answers differently for this type of MCQ format to

enhance the feedback?

• Would the resulting feedback influence their instructional path?

• Would they use this tool for summative assessment?

• What concerns do they have about using the system for self-assessment and/or

grading a student?

• Could using this type of scoring mechanism offer a more refined set of results,

permitting the instructor to differentiate between the grades of students?

5.3.3.12 Analysis of Collected Data from Instructors

The instructors were required to complete a questionnaire after viewing a

demonstration of the system in operation with a summary of the captured data. The

series of questions enquired about the perceived advantages and disadvantages of using

the tool for both summative and formative assessment.

All of the instructors responded that they considered the MCQCM’s feedback would

influence the direction of their teaching to the benefit of the students as it clearly

identifies the areas of concern.

In addition, they all confirmed that using the MCQCM would influence their question

construction to maximise the benefits, permitting the students to display knowledge,

possibly producing more granular feedback. Additionally, they recognised the value

of varying the question format to increase the effectiveness of the tool. However, all of

the instructors voiced concern regarding the additional workload required to construct

questions in this format.

The instructors considered the MCQCM worth pursuing in general, but some had

reservations about using it for summative assessment. Those with reservations considered

that this type of system would only be beneficial if the students had a clear understanding

of the scoring mechanism, and that students would require sufficient training for it to be

fully effective and influential.

The MCQCM was considered to be advantageous in assisting to produce a more

discerning grading, supporting the decisional process in the dilemma of borderline

grades where the scores are positioned on the boundaries of pass to fail or higher levels.

The MCQCM was particularly appreciated for the application in the area of vocational

training (TAFE), where the objective is to award the student with a competent or not

quite competent grade (Training Packages for Competency Based Assessment).

There were severe concerns about the system’s tendency to favour the more self-confident,

extroverted individual and disadvantage the less confident, introverted individual.

The staff acknowledged that they would consider using this tool both on student

demand and at the instructor’s discretion, with a preference to using it as a class

activity.

All of the instructors considered the tool to be of greatest value as a formative

self-assessment as part of the student’s revision. They did, however, express concerns about

a full implementation, suggesting that further trials occur as the tool is refined to ensure

that it does what it purports to do.

5.4 Discussion

In general the MCQCM prototype application was well received by both the students

and instructors who participated in the pilot program. The students appreciated the

opportunity to use the self-assessment MCQCM and considered it beneficial in their

preparation for the oncoming assessment task.

The students considered the MCQCM to be easy to operate and the method of scoring

appeared to encourage risk-taking, resulting in the students manipulating the MCQCM

to their advantage. With more exposure, the students increasingly used the sliding bar

to register their confidence, and confirmed that the feedback and consequential score

were both comprehensible and helpful for further study direction.

The initial introductory session required the participants to use the tool in a

non-threatening environment, which proved to be a successful exercise as they responded to

the MCQCM in a relaxed manner. It was important to the success of further work that

this introductory exercise be used as it assisted in establishing the MCQCM as an

acceptable assessment tool.

This research identified a number of concerns to be addressed during the ongoing

MCQCM development. It was apparent from their verbal protocol during testing that

the students were not comfortable with the interface, due to the lack of familiarity and,

most importantly, the failure of the prototype to adhere to the HCI design principles of

good navigation, consistency of visual presentation and error management. The

students also expressed concerns about being required to identify not only what they

considered to be the correct options but also what they considered to be incorrect. Some

students demonstrated hesitation registering their level of confidence in all of the

options and tended to only do so for the correct options.

On a positive note, the simple scoring system was well received by both the

instructors and the students. During the interview process all of the students claimed to

understand the method of calculating the score and stated that they would consequently

adjust their responses according to their level of confidence to maximise their result. Furthermore, all of the students

stated during the interview that they understood the feedback, as it clearly reflected

their responses. They also agreed that the feedback would assist them in deciding

their study path to improve their understanding of the topic.

It was observed, and confirmed during later interviews, that the students tended to

minimise the use of the slide bar for the first few questions. However, as the students

became more relaxed with using the MCQCM, the use of varying confidence levels

selected on the slide bar increased.

All of the students requested more opportunities to use the MCQCM as they considered

it to be greatly beneficial in confirming their knowledge and highlighting their

weaknesses in the topic covered.

A major concern identified by the students was the use of the sliding bar to register

both their confidence and choice of the correct answer in one action (see Figure 5-1).

This operational component of the MCQCM needed to be addressed and is further

discussed in Chapter 6 which focuses on the redesigning of the MCQCM.

The MCQCM demonstrated the attributes of a good formative assessment tool which

should encourage students to evaluate their understanding of a topic before it is too late.

In addition, it must be considered by the students as a non-threatening,

non-discriminatory support to their learning, with the resulting feedback benefiting both the

student and the instructor.

This preliminary part of the research using the MCQCM prototype produced some

encouraging evidence to support the use of the MCQCM and the utilisation of

confidence measurement as a formative assessment strategy. Throughout the pilot study

the students and instructors demonstrated an appreciation of the MCQCM as a means

of revision, especially with the advantage of it being readily available via the Internet.

The results appear to support the use of the MCQCM as an effective formative

assessment tool for the student on a regular basis, permitting them to independently

self-evaluate their state of knowledge at any stage of the educational program. The shift

of control of the learning program to the student was well received by both the students

and the instructors. However, concerns about the possible favouring of the more

confident, extroverted student are worth noting and warrant further investigation.

5.5 Further Development of the MCQCM

The success of this initial exercise encouraged further investigative work in this field to

develop and refine the MCQCM into a user-friendly, HCI compliant,

Web-based format capable of delivery via the Internet. This new version would then be

available to a broader range of students and instructors within the faculty, encouraging

the integration of the MCQCM as an extension of the learning program over the full

duration of the semester.

At this stage of the research it was decided that the mode of application of the MCQCM

should also be investigated, having some groups using the system at the instructor’s

discretion while others would have it readily available to them on demand. As

anecdotal evidence suggests that traditional MCQs may not be a reliable tool for

measuring students’ knowledge, with questions about the validity of using them as a

summative assessment tool, it would be beneficial to investigate the suitability of the

MCQCM for summative assessment, observing whether the method of scoring

produces a more granular set of results than that of the more traditional assessment

strategies.

The ongoing concern, voiced by some of the teaching profession, of disadvantaging

certain groups in our community due to our choice of assessment could be evaluated by

investigating the application of the MCQCM to accommodate various learning styles

and personality traits, such as the extroverted versus the introverted.

The aforementioned concern of using the sliding bar as a means of registering both

confidence and choice was deemed to be a critical flaw in the design of the MCQCM

requiring immediate attention.

5.6 Summary

This chapter has reported on the findings of two independent pilot runs that were

designed to elicit both the students’ and instructors’ perceptions of using the MCQCM.

This chapter further identified some critical areas of concern as a result of the pilot

trials with the MCQCM. While there was a general satisfaction with the use of the

system, many of the students felt that there were problems with its general usability,

particularly the use of the sliding bar as the primary method for registering both

confidence and choice.

The resulting findings have been thoroughly analysed in preparation for the next stage

of this research, the redesigning of the MCQCM. Chapter 6 documents the redesign of

the MCQCM in keeping with Human Computer Interaction design principles and best

practices.

This chapter answered the research question formulated to evaluate the instructors’ and

students’ perceptions of assessment with confidence measurement for formative

assessment: Does Assessment with Confidence Measurement produce more meaningful

feedback when used for formative assessment?

In particular it addressed the research sub questions:

Q1A: What are the student’s and instructor’s attitudes and perceptions of assessment

with confidence when used for formative assessment?

Q1C: Does the use of assessment with confidence measurement provide additional

valuable feedback to the instructor when used for formative assessment?

In addition it also contributes to addressing the third research question:

Research Question 3:

What are the design requirements for developing an interactive assessment with

confidence measurement to ensure that instructors and students are able to achieve

maximum benefit from the system?

At this early stage the qualitative data identified usability areas of concern to be

addressed before broader implementation. One of these was the cognitive overload

caused by the single action of sliding the bar both to mark an answer as either

correct or incorrect and to register the level of confidence. Additionally,

other areas of concern identified were the navigational component of the system, the

error prevention and error recovery strategies, and the method of displaying graphics with

limited screen space. These concerns and others are addressed in Chapter 6.

CHAPTER 6 DESIGNING AND REFINING THE MCQCM FOR DELIVERY VIA THE WEB

In Chapter 5 the initial pilot studies of a rudimentary standalone prototype were

discussed. These pilot studies were designed to ascertain if the assessment with

confidence measurement strategy was worth further investigation. The positive

response from these initial trials was encouraging, leading to further activities

requiring the development and refining of a more sophisticated version of the

MCQCM for implementation across a number of subjects. This chapter addresses the

issue of the confusion in registering confidence with choice by introducing Bandura’s

(1977) work on self-efficacy and its recent applications in the same domain (Moos &

Azevedo, 2009).

A contributing factor to the design of the MCQCM is game design and this chapter

identifies those components that have an important input into the interactive

educational environment, aligning the attributes of the MCQCM to them. In particular

it considers the design and usability elements that contribute to the game play

experience for educational interactive systems.

This chapter concentrates on the design, development and refinement process to

produce the Web-based version of the MCQCM tool at an acceptable operational level.

The chapter includes the documentation of the evaluation of the revamped MCQCM

against a set of customised usability heuristics (Sim, Read, & Cockton, 2009) designed

to gauge the usability of interactive assessment tools that is critical to the success of

educational interactive system implementation. This activity culminates in the extension

of these computer assessment heuristics applicable to the development of interactive

assessment with confidence measurement systems. Finally, this chapter addresses the

challenge of displaying large areas of information on

limited workspace (Leung, 1995), as diagrams and programming script often

require large display areas.

6.1 Games Taxonomy

The strong association between computer games and educational interactive systems is

not a mere coincidence; it is the result of careful design.

The student of today is surrounded by technology (Prensky, 2003) and, understandably,

educational interactive tools often leverage the games phenomenon, borrowing

many of their themes and functionalities from it (Adams & Rollings, 2007; Baird &

Fisher, 2006).

The uptake of multimedia applications in the educational arena was swift and extensive

due to many educators quickly identifying the benefits to be gained by using this

medium. Many innovative approaches and applications are founded on the evolving

interactive games paradigm, which engulfed the world soon after the introduction of

desktop computers. It is appropriate at this time to discuss some of the more relevant

components of fundamental game theory as this research has a reliance on the

interactive games topology, using the gaming betting metaphor, with its contribution to

the intrinsic motivation as mentioned in Hede’s (2002) model (Section 3.11), pivotal to

the success of the assessment with confidence measurement experience.

As discussed in Section 1.4, assessment strategies are required to meet a set of criteria

to be considered to add value to the learning experience. Similarly interactive

assessment tools also need to meet a set of game play criteria to be beneficial to the

student and the instructor. These criteria include the ability to challenge the participant

to achieve a set of predefined goals adhering to a set of rules within an environment

that encourages risk-taking with a perceived sense of fairness. In order for this to be

understood there is a need to investigate the relationship between games and interactive

educational tools and fundamental games topology. The following section considers the

relevance of game theory to education, then identifies the fundamental set of criteria to

determine if an interactive educational game can be considered of sound design and

practice.

6.2 Game Theory Relevance to Educational Games

Prensky (2003) states that this new generation of students is exposed to multimedia

imagery thousands of times a day, and is in fact saturated in digital media. He

argues, in line with the observation of Malcolm Gladwell (cited by Prensky 2003), that we can

educate children if we can hold their attention. It is for this reason that many

work-related interfaces now emulate game-type interfaces to encourage engagement

with these individuals who prefer play.

Amory (2007) promotes the Game Object Model (GOM), which marries Educational

Theory with Game Design in order to facilitate the production of advanced learning

environments, supporting the relationship between learning, playing and story.

Constructivist Educational Theory (Kaufman, 2003) relies on development and

deep understanding that is actively built up by the learner through their learning

experiences. The critical attributes of constructivist education are the ability to explore,

have social discourse and to play (Amory & Seagram, 2003). Quinn (2005) and Rieber

(1996) state that game play is a strategic part of learning and performs important roles

in psychological, social and intellectual development, being a voluntary activity that is

intrinsically motivating.

The process of figuring out the rules of a dynamic representation is known as inductive

discovery (Prensky, 2003). Today’s students see computer skills as a second language

or, even more strongly, as their native tongue (Baird & Fisher, 2006; Prensky, 2003). Part of

their vernacular is based on the phenomenon of being prone to be active rather than

passive in their educational approach when given the opportunity. The student

interacting with a multimedia application is fearless, as they assume that “software is

supposed to teach you how to use it” (Prensky, 2003). Students often approach

problem solving as they do games, rapidly and in an exploratory manner to achieve

positive outcomes. An important component of good game-play is the requirement that

during the early cognitive stage, a good instructor will call attention to the cues, giving

diagnostic knowledge of results and shaping the behaviour of the participant by

affirming positive results with appropriate feedback (Bradshaw, 2007).

6.2.1 Fundamental Game Theory Criteria

Adams and Rollings (2007) define a game as having distinctive elements as part of

its structure. These elements distinguish a game from a toy or puzzle. “Play” is the

act of self-entertainment usually connected with toys, puzzles and games. It is the

inclusion of rules and goals that determines the type of play in which we engage. A

game without rules and goals is a casual experience to be completely interpreted by the

participant. Adams and Rollings (2007) define game play in terms of the challenges and

the actions underpinning the experience.

The inclusion of both rules and goals increases the formal structure of the experience

and distinguishes the activity as a game. Adams and Rollings (2007) further define

game play as a combination of two concepts:

• the challenges that a player must face to arrive at the object of the game

• the actions that the player is permitted to take to address those challenges.

They consider challenges and actions to lie at the heart of games design, as the

challenges and actions are created and combined together to enhance the experience.

For a game to be successfully designed one cannot exist without the other, in that you

cannot set challenges without appropriate action to surmount them, and you cannot

have actions without relevant challenges for them to address.

6.2.2 The Goals and Rules of a Game

As previously stated, a game must have a goal or a number of goals. Goalless play does

not comply with the definition of game play as even the less demanding games have a

goal. Salen and Zimmerman (2003) require a game to have a quantitative outcome

against which success can be measured. No matter what the goal is, it must not

be trivial, as the challenge laid before the participant is reliant on the defined goal

(Salen & Zimmerman, 2003). This is particularly important to the interactive

assessment with confidence, as games of chance are dependent on the learning and

understanding of odds to optimise the scoring benefit. The reliance on odds alone

(tossing of a coin) does not necessarily constitute a game as there needs to be

participation of the player as part of the challenge. The termination point of a game

occurs when the goal has been addressed to the best of the ability of the player. In many

cases this is usually when the victory condition has been met, that is when the challenge

is over. It is at this time that the game experience crosses from the pretend

environment into the real world, as the results can be of material benefit and meritorious

achievement; in the case of education this can be attaining formal grades for an

assessment task.

The game rules are the instructions, restrictions and definitions that make up the agreed

conditions of play. Some rules are explicit, being clearly stated up front, while others

are implicit, unwritten and taken for granted. The rules establish a contextual

framework by which the game is played out giving permission for various actions and

denial of others.

6.2.3 Game Fairness

There is a general expectation that all games are fair. The interpretation of fairness is

greatly influenced by society, the individual and other contextual factors. The concept

of fairness is external to the game as the players sit within their cultural settings

defining the rules of their existence while interacting with a mutually exclusive

imaginary environment, not necessarily governed by these external rules. Rules can be

categorised as either Mutable (changeable) or Immutable (non changeable). The greater

the proportion of immutable rules the fairer the game must be. Interactive assessment

with confidence has the vast majority of its rules as immutable and is highly reliant on

being perceived as being fair. Symmetric games are those that have the same rules for

all players. This is a rudimentary requirement for a game to be perceived as being fair.

For the majority of educational assessment there is an obligation for the structure to be

symmetric to be perceived as fair, as any non-conformity would instantaneously deem

it as unfair.

6.2.4 Games Risk and Rewards

Risks and rewards have long been a source of entertainment, with their roots in the age-old

practice of gambling. While assessment with confidence measurement does not openly

encourage or condone the activity to a high level, it does align itself with this form of

soft entertainment. Gambling is often thought of as risking money to possibly gain money. It is

this risk and reward that underlies most competitive games, including games that pit

the player against the system. It does not require money to be a part of it, as any game

where the participant risks losing the chance to gain rewards as offered has a gambling

aspect. Risk is directly proportional to uncertainty, and risk increases as the uncertainty

increases. Adams and Rollings (2007) observed that players have varying attitudes

towards risk-taking, as some take the aggressive stand, the inherently risky approach of

overstating their confidence to maximise gain while others prefer the more defensive

approach, understating their confidence to minimise the risk of losing marks. They

further mandate that, in game design, risk must always be accompanied by rewards: the

greater the risk the greater the reward, otherwise there is no incentive to take the risk.
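The trade-off between the aggressive and defensive approaches can be illustrated with a small calculation. The scoring method actually adopted for the MCQCM is described in Section 4.3 and is not reproduced here. The sketch below instead assumes a hypothetical symmetric rule, in which a student who stakes a confidence of c per cent on an option gains c per cent of the available marks if the choice is correct and loses the same amount if it is incorrect. The value of 40 marks is borrowed from the score range reported in Chapter 5 purely for illustration, and all function names are hypothetical.

```python
def possible_outcomes(confidence_pct: float, max_marks: float = 40.0) -> tuple[float, float]:
    """Return (gain if correct, loss if incorrect) under the ASSUMED symmetric rule."""
    stake = max_marks * confidence_pct / 100.0
    return stake, -stake


def expected_score(confidence_pct: float, p_correct: float, max_marks: float = 40.0) -> float:
    """Expected marks when the student's true chance of being correct is p_correct."""
    gain, loss = possible_outcomes(confidence_pct, max_marks)
    return p_correct * gain + (1.0 - p_correct) * loss
```

Under this assumed rule, raising the stated confidence widens both the possible gain and the possible loss, so a greater risk is matched by a greater potential reward. A student who is only guessing between two outcomes (a chance of 0.5) has an expected score of zero regardless of the confidence stated, so only genuine knowledge makes the risk worthwhile.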

6.2.5 Learning the Game Play

Learning in this context is not necessarily the learning and understanding of the

educational material but the learning of the game and how it should be used to its

greatest benefit. Game players do learn ways of maximising the benefits, understanding

and predicting the sequence of events to rise to the highest levels. Learning how to play

the game to maximise the outcomes relies on two contributing factors, enjoyment and

mastery, and it has been observed that participants like learning when at least one of the

two is met (Adams and Rollings, 2007).

6.2.6 The Influence of Skill, Stress and Absolute Difficulty on Games

To understand the absolute difficulty of a game, one must consider the skill and stress

factors. The intrinsic skill is the level of skill of the participant. Stress is the emotional

state of the participant during the experience, often brought on by external factors,

such as the fear of failure. Some challenges have an intrinsic stress level incorporated,

such as reactionary games, while others require some constraints (e.g. of time) to

achieve a significant level of stress. In addition, a consciousness of the consequences of

the outcomes, such as the formal grading of the performance, also has an inherent stress

level often accentuated by the application of a time constraint. The absolute difficulty is

the combination of the intrinsic skill and the stressfulness experienced during the

activities. When designing a game, consideration must be given to the absolute

difficulty, getting the balance right between the stress level and the intrinsic skill required. If

one dominates the other, adjustment might be required to bring the combination back to

an acceptable level of absolute difficulty.

6.3 MCQCM Adherence to Game Play Topology

The desire to emulate games in education greatly influences the design and

functionality of the CAA applications and has done so for the MCQCM. The MCQCM

relies heavily on the metaphor of placing a bet to optimise the gain, referred to in

Section 4.3, where there is detailed discussion about the adopted scoring method, based

on the probability theory of gaming. This underlying theory is constructed

around probability and associated wagering. The user interacts with the MCQCM in

a game environment where they are challenged to achieve the best possible score while

risking a loss for incorrect answers. The MCQCM becomes implicitly motivational through

its close resemblance to a challenging game.

It is appropriate that the fundamental game structure and hierarchy, as discussed above,

be referenced here to understand the construction and application of the MCQCM in

this domain, recognising its adherence to some of the aforementioned game play

topology.

6.3.1 MCQCM Adherence to Playability Guidelines and Heuristics

Bradshaw (2007, p. 128) produced a series of playability principles. One of particular

importance to this research is “the need for a visual or tactile response to their actions

to be able to compare how well they are doing in relation to their desired outcome

….how their actions have progressed them in the attainment of their goal”. The

MCQCM was designed in line with this important guideline, as it permits the student to

interact with the system freely when sliding the bar in a game-type environment. This

system of direct manipulation instantaneously and numerically displays the possible gain

or loss that would result as a direct consequence of their actions. This reinforces the

student’s progress towards their final goal, to achieve the best score

possible with their state of knowledge, while also acquiring contributing marks for

partial knowledge and limiting the penalties for answers where they have no real

knowledge.

Desurvire, Caplan and Toth’s (2004) set of game play heuristics influenced the design of the MCQCM for the following reasons. As part of their work to evaluate game playing, they identified the need for the participant to be quickly involved through the use of tutorials and lower level experiences (Desurvire, Caplan, & Toth, 2004). The MCQCM does this effectively by offering a series of training activities based on general knowledge questions, rather than subject-specific content, in which the students use the system in an open

forum as an entertainment activity. The resulting scores are then displayed and students

are encouraged to participate in further non-threatening games based on general

knowledge.

Furthermore, Desurvire, Caplan and Toth (2004) recommend that the participant should not be continually penalised for the same failure, giving them the opportunity to eventually attain a positive outcome. In fact, they propose that the first experience should be easy and return immediate positive feedback, which is a major requirement of question setting for the MCQCM. It is important that the game applies pressure while not frustrating the player, with a variation of the level of difficulty to further engage them. They stress the need for the player to always be able to identify their score and for the system to provide a consistent, mapped and learnable response. The

controls, in this case the sliding bar, should be intuitive and mapped in a natural,

obvious manner. The MCQCM’s interactive device, the sliding bar, permits the

participant to engage with it at a comfortably proportioned mapping, as recommended.

6.3.2 MCQCM’s Hierarchy of Challenges and Actions

As discussed, games require a hierarchy of challenges for the student to progress through (Adams and Rollings, 2007), ranging from the simple to the more extreme. The MCQCM conforms to this in that the final submission occurs only after the student has addressed all of the questions and, in turn, all of the optional answers for

each question. The lowest level of challenges, the consideration of the options for each

question,

As previously stated, actions are not restricted to the challenges; that is, not all actions are a direct result of a challenge. This can be seen in the sliding bar of the MCQCM, which can be slid freely and endlessly at the whim of the student without any ramifications or repercussions until the final test submission. Similarly, the

student can freely navigate forward and back, jumping from question to question if they

please, changing the levels of confidence as many times as they deem necessary. This

action does not necessarily have any bearing on the final result. It is the process of

finishing and submission that the rules dictate is the final action before evaluation.

6.3.3 MCQCM Learnability

Learnability, as recommended for games, is a high priority in the design of the MCQCM. The process of direct manipulation described above allows the system to be learned through extended use and practice. The resulting change to the displayed score, if correct, reinforces the actions of the student as they develop the skills to interact with the MCQCM.

6.3.4 Fairness of the MCQCM

The notion of fairness must be prominent in the designing of an interactive assessment,

often reflected in the choice of scoring. The balanced scoring adopted by the MCQCM

generally satisfies this requirement with the fundamental principle of proportional

rewards and penalties for the level of knowledge. A lack of such proportionality is one of the criticisms levelled at the scoring mechanisms of Paul (1994) and Gardner-Medwin (2006), as their strategic approach is dominated by the promotion of choosing high levels of confidence if certain and low if not, simply sticking to a predefined recipe. Balanced scoring of equally positive and negative grades promotes moving across the scoring zone, where the loss or gain is proportional to the registration of confidence, as discussed in Chapter 4, which concentrates on the validity of scoring with penalty.
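
The proportionality described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that confidence is registered as a percentage from 0 to 100 and that the mark for an option equals the registered confidence, gained if the committed answer is correct and lost if it is incorrect. The scoring method actually adopted is detailed in Section 4.3 and may differ; the function name `balanced_mark` is hypothetical.

```python
def balanced_mark(confidence: int, is_correct: bool) -> int:
    """Return a symmetric mark for one option.

    Illustrative assumption only: the gain for a correct answer and the
    loss for an incorrect one are both equal to the registered confidence.
    """
    if not 0 <= confidence <= 100:
        raise ValueError("confidence must be between 0 and 100")
    return confidence if is_correct else -confidence


# Show how the possible gain and loss grow together with confidence.
for confidence in (0, 50, 100):
    gain = balanced_mark(confidence, True)
    loss = balanced_mark(confidence, False)
    print(f"confidence {confidence:3d}: gain {gain:+d}, loss {loss:+d}")
```

Under this assumed rule there is no confidence level at which a wrong answer is free of penalty while a right answer is still rewarded, which is the property the criticised schemes are said to lack.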

During focus groups held at the Computer Assisted Assessment Conference, criticism was levelled at disproportionate penalty marking mechanisms that provide non-penalty areas for low confidence, such as the scoring strategies of Gardner-Medwin (2006) and Paul (1994), because participants become well trained in using the tool to maximise gain and minimise loss. Such systems are accused of effectively training the student in methods of maximising grades rather than honestly appraising their level of knowledge.

6.3.5 MCQCM Stress Levels and Overall Level of Difficulty

The control of the MCQCM operational stress level was an important part of the design and implementation of the MCQCM. Early iterations of the MCQCM incorporated a time clock in the top right-hand corner to increase the pressure on the student as part of the assessment strategy. It also kept the test time in check to ensure that the students completed the exercise within the given time. This was abandoned for two reasons. Firstly, the additional stress was found to be unacceptable: the students found the change in the level of interactivity, being immersed in the lower level challenges, demanding enough on its own without the extra stress caused by the timer. The absolute difficulty was out of balance and needed to be reset, relieving the stress level by eliminating the time constraint. Secondly, the primary objective for the use of the

MCQCM is as a formative assessment tool available to the user at a time convenient to

them under their rules of engagement. The imposed time restriction in this case was in

complete contradiction to the primary objective.

While the designers of the MCQCM can minimise the stress levels from external sources, they cannot eliminate them all, as individuals will bring with them their own operational stress levels.

6.3.6 Summary of MCQCM Adherence to Game Play Topology

The above discussion formalises aspects of the MCQCM in the game theory area as the

fundamental design strategies incorporated are founded on the game topology. The

MCQCM conforms to many of the requirements as it has game elements designed to

entertain whilst promoting the learning and self-assessment.

The MCQCM is defined by game theory as it offers both challenges and actions. The

goal is defined by optimising the scoring benefit and recognition of knowledge, partial

knowledge and incorrect knowledge. The rules of the MCQCM are immutable and

symmetric offering a fair game for all players. The MCQCM offers the risk of

displaying lack of knowledge and achieving negative grades, which offers the longer-term reward of enabling a directed study path to improve knowledge. The ability to manipulate the tools in the MCQCM enables affordance of the outcome in assigned marks without penalty, thus encouraging learnability before commitment. The balance of stress and difficulty has been trialled and moderated to ensure the student is focused on the question and not the timing of the system.

These factors all demonstrate the synergies of the MCQCM with games theory and how the MCQCM has met the best practices of games design to encourage students to

engage with the system to demonstrate their level of knowledge, be it complete, partial

or incorrect.

6.4 Addressing Design and Usability Issues of MCQCM

Sharp, Rogers and Preece (2007) define interaction design as the designing of

interactive systems to support communication and interaction of people in their

everyday and working lives. They emphasise the need for systems to be developed

from the user’s viewpoint, stating that many developed systems that work from an

engineering perspective do so at the expense of how the system will be used in the real world. The MCQCM is no exception, as the role it plays in the student’s world could be critical.

As outlined in previous discussion in Section 2.4.1 addressing the problem solving and

research frameworks, the Web-based MCQCM system was designed and developed

adhering to the HCI guidelines for interactive systems (Sharp, Rogers & Preece, 2007).

A major contributing factor to successful interactive system design is the mindfulness

of the cognitive load that the system imposes upon the user. Shneiderman and Plaisant (2005) consider understanding the cognitive and perceptual abilities of the users to be a vital foundation underpinning interactive system design. Consequently, any identified components of functionality of an interactive system that unjustifiably increase the cognitive load should be addressed immediately to alleviate undue stress or confusion, clarifying the functionality. In this case the identified major flaw of the MCQCM was the reliance on the confidence-sliding bar as the only mechanism to identify whether an answer was correct or incorrect as well as the student’s confidence in that answer. This poorly designed functionality of the MCQCM in some cases produced cognitive overload in the users, resulting in confusion and inferior achievement.

Improvement in the design and the consequential usability of an interactive system is

dependent on good HCI practices. In most cases it is reliant on the designer going well beyond the vague notion of “user friendly”, by having a more complete and thorough understanding of the broader community (Shneiderman & Plaisant, 2005). To achieve this, Shneiderman and Plaisant (2005) identify goals for good design: Standardisation, Consistency and Portability of data. Standardisation, the need for common user interface components across various platforms, and Portability, the ability to convert data to be shared across the various display options, had a primary influence on the design of the MCQCM. The Consistency of the action sequence, layout, terms, units, colours and so on must be considered for the duration of the design process. It is in this area of consistency that extensive work occurred in the redesign of the MCQCM, as the uncluttered layout of the interactive screens, consistent positioning of the icons and the clarity of the feedback displays are critical to the usability of the MCQCM.

Sound navigation is at the heart of good usability. Shneiderman and Plaisant (2005) identify the need to have knowledge of the overview with the ability to clearly pursue details as required. The interaction of goal-seeking behaviour can be summarised with the following four elements of navigation:

• Knowing where you are.

• Knowing what you can do.

• Knowing where you are going - or what will happen.

• Knowing where you have been – or what you have done.

Awareness of these navigational elements (Dix, Finlay, Abowd, & Beale, 2004) will directly assist in the designing of interactive systems that leave the user in no doubt of the present, previous and proposed positioning. The progress status of a student doing a computer based assessment exercise is of the utmost importance.

Error prevention, error messages and assistance in handling errors play an important role in good usability design. Users are reliant on clear direction when faced with error messages, as the absence of such direction can lead to fatal errors in operation. Error messages often have a tendency to overwhelm the participant in a harsh, sometimes threatening manner (Sharp, Rogers & Preece, 2007) that can have an adverse effect on the user’s experience; hence they have to be well thought out to minimise the negatives and maximise their effectiveness. Sharp, Rogers and Preece (2007) further claim that a poorly designed interface can often leave the user feeling inadequate, insulted and stupid. Permitting the user to rectify an error is a critical component of a well-designed interactive system.

Of equal importance is the method by which graphics and scripts are displayed in interactive systems. The issue of limited screen space places a serious constraint on visual communication (Leung, 1995), often resulting in a gap between what was meant to be conveyed and what actually is conveyed. This limitation often leads to a requirement

to navigate around the presented information space or the simultaneous viewing of

information in the same workspace. The main concern is locating the desired

information in the workspace without getting lost. To achieve this there is a need to

have a global as well as a local view of the information space for task switching.

Bannon, Cypher, Greenspan, and Monty (1983) suggest that there are a number of areas to consider in interface design with regard to task switching; one is the reduction of the user’s cognitive load. Leung (1995) implemented an innovative solution to address the issue of visual display constraints by adopting a bi-focal approach, in which the user can view targeted information without losing sight of the broader information space.

Interactive systems are often designed by leveraging existing artifacts of the real world with which users have previous experience and whose operation is familiar to them. An object offering high affordance permits the user to quickly assimilate with the new environment. The MCQCM does so with game-play topology, as the experience is closely related to the gambling phenomenon. Interactive games are heavily reliant on presentation elements contributing to the playfulness of the experience. Accordingly they must be designed to conform to the game play guidelines as stipulated by Adams and Rollings (2006). Game Theory offers high affordance for the MCQCM as previously discussed.

6.4.1 Addressing the Cognitive Load of the MCQCM

The first identified MCQCM operational area of concern that needs to be addressed is

the cognitive process of decision-making, in particular questioning if the nomination of

an answer as being either “True” or “False” is the dominant factor in the participant’s

mind, or if the expression of the level of confidence is the dictating action.

The concern can be put as a question: is the choice of the option as either ‘True’ or ‘False’ dominant in the participant’s mind? Having the confidence-sliding bar as the primary means of identifying whether the option is ‘True’ or ‘False’ could be confusing to the student. Normally the student would prefer first to identify whether the option is ‘True’ or ‘False’ before registering his/her level of confidence. As the confidence-sliding bar is used to perform two specific functions, selecting the answer and registering the level of confidence associated with it, the sliding bar may be misinterpreted. For example, is stating that you are 80 per cent sure the option is ‘False’ the same as stating that you are 20 per cent sure the option is ‘True’? (See Figure 6-1.) This question needed to be investigated in order to develop a tool of maximum benefit to the students.

(a) 80% sure the option is ‘false’

(b) 20% sure the option is ‘true’

Figure 6-1: Slide Rule to Register Confidence.
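
The ambiguity shown in Figure 6-1 can be sketched in code. The sketch assumes, hypothetically, a single slider whose position runs from 0 (certain the option is ‘False’) to 100 (certain the option is ‘True’); this scale and the function names are illustrative assumptions, not the MCQCM implementation. One slider position supports both readings, which is the source of the possible confusion.

```python
def read_as_false(position: int) -> int:
    """Percentage sure the option is 'False' for a slider position 0..100."""
    return 100 - position


def read_as_true(position: int) -> int:
    """Percentage sure the option is 'True' for the same slider position."""
    return position


# A single slider position yields two different statements of confidence.
position = 20
print(f"{read_as_false(position)}% sure 'False'")
print(f"{read_as_true(position)}% sure 'True'")
```

Nothing in the position itself tells the system which of the two readings the student intended.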

6.4.1.1 The Problem of Using Confidence Measurement to Identify Correct Answers

In order to address this problem we must refer to one of the most influential concepts formulated in modern psychology, Albert Bandura’s (1983) notion of Self-efficacy Expectations. Bandura’s (1983) work focuses on the belief in our capabilities to successfully perform a given task or behaviour, which in turn influences behavioural choices, performance and persistence. A key component of Bandura’s (1983) self-efficacy concept is that self-efficacy can be increased through performance accomplishments. Bandura (1983) considers this to be the major influence on behaviours and behavioural change, stating that low self-efficacy expectations within a domain can lead to avoidance, while an increase in self-efficacy will result in an increase in the frequency of approach. He also postulates that intervention can increase self-efficacy expectations and specifies four sources by which self-efficacy expectations can be modified. Two of these are of particular interest in this area of study. The first is that experiences of performing successfully will be beneficial. The second is that awareness of physiological arousal, such as anxiety about the behaviour or task, is seen as a co-effect of self-efficacy expectations, where an increase in self-efficacy should result in a decrease in anxiety. Importantly for this study, the reverse also applies: a decrease in self-efficacy leads to an increase in anxiety.

Betz and Hackett (1981) made extensive use of the concept of self-efficacy expectations by applying it to career psychology and counselling. In their study they implemented a questionnaire format that retains Bandura’s (1983) original notion of the level (“yes/no”) together with the strength (confidence) of self-efficacy. The technique they developed required the participant to commit to an answer first. Once committed, the individual is required to clearly state their degree of confidence in that answer. Fullarton (1993) used the same method of testing when investigating gender effects on confidence in mathematics. Her technique was to ask the student to identify the correct answer and then to register their level of confidence in the choice. This is the very crux of the situation under investigation. It was considered that adopting the above technique, asking the student first to commit to an answer before stating their level of confidence, would eliminate the possible confusion in the process. As Bandura (1983) identified, the idea of stating a “level” (answer) followed by a “strength” (confidence) of self-efficacy gives a clear, unconfused picture of the student’s response.

6.4.1.2 The Design Solution to the Problem of Using Confidence Measurement to Identify Correct Answers

The problem outlined in Section 6.4, which required the student to register both choice and confidence in the one activity, demanded immediate action.

The resulting modified MCQCM design is still based on the traditional MCQ format.

The initial questions are presented with the stem and the options displayed with a True

or False button at the end of each option only. The student is still required to consider

all of the options as there could be one or more correct answers, which requires the

student to identify not only what they consider to be correct options but also what

options they deem as incorrect. The new design of the MCQCM question screen is

shown in Figure 6-2.

Figure 6-2: First Fundamental Version of the Web-based MCQCM.

The student is required to commit to an answer or, as Bandura (1983) refers to it, a “level”. In this case it is either True or False.

Figure 6-3: The Appearance of the Confidence Sliding Bar.

The sliding bar for registering the degree of confidence only appears for each option

after the student has committed to either True or False (See Figure 6-3). This controlled

environment ensures that the student is led through the testing procedure with the

minimum of confusion, decreasing the cognitive load.
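
The two-step interaction can be sketched as a small state holder in which a confidence value is only accepted once a level has been committed. The class and method names are hypothetical and the 0 to 100 confidence range is an assumption made for illustration; the sketch does not reproduce the Web-based MCQCM code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OptionResponse:
    """Response to one option: a committed level, then a strength."""

    level: Optional[bool] = None      # True or False once committed
    confidence: Optional[int] = None  # strength, assumed range 0..100

    def commit_level(self, level: bool) -> None:
        """Step 1: the student commits to True or False."""
        self.level = level

    def slider_visible(self) -> bool:
        """The confidence slider appears only after a level is committed."""
        return self.level is not None

    def set_confidence(self, confidence: int) -> None:
        """Step 2: register confidence in the committed answer."""
        if not self.slider_visible():
            raise RuntimeError("commit to True or False before registering confidence")
        if not 0 <= confidence <= 100:
            raise ValueError("confidence must be between 0 and 100")
        self.confidence = confidence


response = OptionResponse()
response.commit_level(False)   # the student decides the option is False
response.set_confidence(80)    # then states 80 per cent confidence in that choice
print(response)
```

Separating the two steps means a stored response always carries an explicit level, so the reading of the confidence value is never left to interpretation.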

Once the major concern and cause of confusion had been addressed, namely the use of one action to serve the two requirements of choosing the correct answer (Level) and registering confidence in it (Strength), it was necessary to evaluate the usability of the system.

6.4.2 HCI Evaluation of the MCQCM

HCI uses various methods of usability evaluation, two of these being User Testing, where users are directly involved in the testing (Sharp, Rogers & Preece, 2007), and Evaluation by Inspection, usually by experts in the field evaluating the system against a list of industry standard heuristics (Te'eni, Carey, & Zhang, 2007). Both methods have

their own strengths and weaknesses. It is difficult to involve users in real life

summative assessment situations in the classrooms due to the complexity of the

environment, as the nature of usability testing often requires the participant to actively

communicate during the process via verbal protocol (speaking out loud), placing extra

stress upon them and in direct conflict with the rules of individual assessment. Any

additional stress could affect the concentration of the participant, influencing both their

final result and their perception of the experience. For these reasons it is often a

preference to use the Inspection method for the evaluation of a computer aided

assessment (CAA) system, where no interference with the student during testing

occurs.

6.4.3 Heuristics Testing for Computer Aided Assessment (CAA)

Sim, Read and Holifield (2008) have produced a series of heuristics specifically designed to assist in the usability evaluation of CAA tools. The works of Nielsen (1994a, 1994b) in developing a general set of heuristics have been heralded as a major contribution to the HCI field and are extensively employed by HCI practitioners (Nielsen, 1994a, 1994b; Nielsen & Molich, 1990). As outlined by Nielsen (1994a, 1994b), a heuristic evaluation (HE) consists of a number of experts (3 to 6) evaluating an interface against a list of heuristics, producing a report with severity ratings given. These severity ratings are designed first to identify whether a problem might exist and then to gauge its potential impact. The five severity ratings devised by Nielsen (1994a, 1994b), which depend on the frequency with which the problem occurs, the impact of the problem if it occurs and the persistence of the problem, are as follows:

0 = I don't agree that this is a usability problem at all

1 = Cosmetic problem only: need not be fixed unless extra time is available on project

2 = Minor usability problem: fixing this should be given low priority

3 = Major usability problem: important to fix, so should be given high priority

4 = Usability catastrophe: imperative to fix this before product can be released

In recent years domain-specific heuristics have been developed, as demonstrated by Paddison and Englefield (2004) in the formulation of accessibility heuristics and by Desurvire, Caplan and Toth (2004) in their heuristics for game playing.

Likewise, earlier work by Sim, Read and Holifield (2006) highlights the concern that Nielsen’s (1994a, 1994b) severity ratings are too generic for CAA applications, as they cannot distinguish between what constitutes a Major Usability Problem and a Usability Catastrophe. They identified a need for CAA domain-specific severity ratings that deal with unacceptable consequences when the user interacts with a CAA application. Sim, Read and Holifield (2008) suggest the following variation of severity ratings suitable for CAA application evaluation:

0 = I don’t think that this is a usability problem

1 = Possible effect, could cause some users to perform less well than would have performed otherwise

2 = Minor effect, would probably affect one or more questions in the test for most users

3 = Major effect, would probably affect many questions in the test for most users

4 = Catastrophe: all work lost

Sim, Read and Holifield (2008) consider the role of the user as a major consideration in

the designing of an interactive assessment system, stating that ultimately the students

have the most to lose. They emphasise that the user experience, level of comfort and

feeling of control when engaging with an interactive assessment tool can greatly

influence their performance. They also recognise the need to understand what is of

importance to the stakeholders. Further, they believe the traditional usability goals of efficiency, effectiveness and satisfaction are not extensive enough, as the goals pertaining to computer based assessment differ from those of the casual user of a generic interactive system.

Sim, Read and Holifield (2008) also identify the legal obligation an educational

institution has to their students to supply assessment regimes that are deemed to be fair

to all and without discrimination or bias. Assessment tools of poor usability design

could place the institution in a vulnerable position if needed to defend an assessment

appeal as a result of a poor test score being attributed to substandard usability of an

interactive assessment tool. Such appeals could be attributed to loss of test time through

ineffective navigation, the inability to deselect an answer after further reflection and

other negative experiences. It is for these reasons that Sim, Read and Cockton (2006) embarked on a series of experiments culminating in a corpus of usability problems directly associated with the CAA environment and, consequently, in the development of a set of heuristics specifically designed to evaluate the usability of a CAA application (Sim,

Horton, & Strong, 2004; Sim et al., 2009; Sim et al., 2006, 2008). Their constructed

heuristics are listed in Table 6-1.

Heuristic Number    Description
1                   Use clear language and grammar within questions and ensure the score is clearly displayed
2                   Ensure progress through the test is visible and understandable
3                   Answering questions should be intuitive
4                   Easy reversal of actions
5                   Inform users of any unanswered questions before finishing
6                   Ensure appropriate interface design characteristics
7                   Visual layout - adequate spacing and visibility of questions
8                   Ensure appropriate feedback
9                   Moving between questions and terminating the exam should be intuitive
10                  Minimise time delays
11                  Minimise external influences to the user

Table 6-1: List of Sim et al. (2006) heuristics for CAA.

Although the heuristics for the CAA environment listed above were not yet available in their final form, as they were still in development, the ‘work in progress’ versions of them were used as a means of expert evaluation during the MCQCM design process.

6.5 MCQCM Heuristic Evaluation Method

The MCQCM was evaluated against an early version of the CAA heuristics developed by Sim, Read and Cockton (2006) by two expert evaluators. The evaluation was undertaken in a usability laboratory to minimise distractions and isolate the variables. The evaluators registered their concerns with the MCQCM together with severity ratings, which enabled the MCQCM to be redesigned in line with the CAA heuristic guidelines, as described below.

6.5.1 MCQCM Redesign Resulting from Usability Heuristics

The main interactive components of the MCQCM application will be presented here

with supporting discussion addressing some of the identified concerns with the

MCQCM that registered higher levels of severity. Not all of the functionality of the

MCQCM can be displayed here, as the resulting final design of the MCQCM for

implementation is quite extensive. For this reason the full operational extent of the

MCQCM is demonstrated in the appendices, but it is necessary to display some of the

screen displays here highlighting the main features of the MCQCM, in particular the

screens primarily designed to interact with the student.

To facilitate this section of the research a series of screen displays with explanations

about their design in reference to the Sim, Read and Holifield (2008) HCI Heuristics

for CAA applications are presented in the following section.

6.5.2 Grid Layout of Question Screen

Figure 6-4 is a typical example of the question screen that the student is faced with for

the duration of the test. In this case the student demonstrates that even though they are

quite confident that the answer is option 2, TCP/IP, they think that it might also be the

first option HTTP, although in this choice they are not as confident.

Figure 6-4: Grid Layout of MCQCM.

The evaluators observed that it was difficult to distinguish the working areas of the screen, which prevented clear delineation of questions, responses and navigation. This led to the adoption of a grid formation for the display.

In Figure 6-4 it can be observed that the screen offers a clear, balanced layout using distinctive areas in a grid formation, in keeping with Nielsen’s (1994) layout guidelines and with several of the Sim, Read and Holifield (2008) heuristics: heuristic 7, Visual layout - adequate spacing and visibility of questions, an interpretation of Nielsen’s (1994) heuristics; heuristic 1, Use clear language and grammar within questions and ensure the score is clearly displayed; heuristic 3, Answering questions should be intuitive; and heuristic 11, Minimise external influences to the user. Furthermore, it can be seen in Figure 6-5 that the main question page is divided into its functional areas as identified by:

1 Header with question number being attempted and the total number of questions

2 Stem of the question and the button to register the completion of the test

3 Answer options

4 Bandura’s (1983) “Level” of either True or False through tick boxes

5 Bandura’s (1983) “Strength of commitment” through slide bars

6 Numerical level of confidence to be submitted

7 Navigation feature including the progress bar, list of all questions and the one highlighted being attempted, acceptance button to register attempt on question

Figure 6-5: Grid Layout of the Functional Areas.

6.5.3 Visibility of Student Progress During the MCQCM Test

Sim, Read and Holifield (2008) heuristic 2 states: Ensure progress through the test is visible and understandable. This issue was noted during the pilot studies, as the students asked to know at any time which question they were presently attempting and, importantly, the total number of questions in the test.

Figure 6-6 shows the supportive elements that address these navigational concerns.

Figure 6-6: Question Display Showing 3 Navigational Supports.

It can be seen in Figure 6-6 that the redesigned MCQCM achieves good visibility of the

student’s test progress status and navigation by supplying a visual display of their

progress with a progress bar that increases in size as the student completes questions, as

well as a clear statement at the top of the display showing the question number they are

attempting and the total number of questions in the test. In addition, it can be seen that above the progress bar there is a list of all of the questions, with the completed questions shown in red while all other, as yet unattempted, questions are in blue. This hyperlinked navigational component also permits the students to move back and forward between questions as they please. All previous versions of the MCQCM restricted the student to a linear approach, where they could only attempt the question once and move to the next question in the row. The additional feature of being able to move to any question at the discretion of the student, for re-answering or reassessing, is in keeping with Sim, Read and Holifield (2008) heuristic 4, Easy reversal of actions; heuristic 9, Moving between questions is intuitive; and heuristic 10, Minimise time delays.

6.5.4 Minimisation of Errors and Error Prevention

A critical component of a well designed Web-based interactive system is the need to

support the user by minimising the number of errors (error prevention), as stated by

Nielsen (1994) and later reiterated by Sim, Read and Holifield (2008) heuristic 5:

Informing students of unanswered questions; heuristic 4: Easy reversal of actions and

heuristic 10: Minimise time delays.

The pilot tests and the heuristic evaluation of the MCQCM highlighted the need for the

students to be able to move freely from question to question, and most importantly have

the option of changing their response before test submission. In addition, during the pilot studies the students requested the flexibility to submit each individual question response formally using the “accept” button, while retaining the right to change that submitted answer before the final test submission and consequential assessment. These functionalities are part of the operation of the MCQCM and are demonstrated in Figure 6-7.

(A): (B)

Figure 6-7: Support for User to Minimise Errors.

Area A in Figure 6-7 demonstrates the ability of the student to “accept” a question as work in progress, while still having permission to revisit it before the final submission. Hence, at any time a student can have a number of questions completed and ready for final submission or held as work in progress. However, this built-in flexibility can often leave the student confused about which questions they have answered and which have been overlooked. To address this, an error prevention dialogue box has been used (area B in Figure 6-7), where the student is informed of any questions that they have not attempted before final test submission. At this time they can either return to the quiz environment to complete any missed questions or proceed to the grading and solutions display.
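The error prevention check described above can be sketched in a few lines. This is only an illustrative sketch, not the MCQCM's actual code: the function names, the dictionary of responses keyed by question number and the wording of the warning are all assumptions made for the example.

```python
def unanswered_questions(total_questions, responses):
    """Return the 1-based numbers of questions with no recorded response.

    `responses` is a hypothetical dict mapping question number -> answer.
    """
    return [n for n in range(1, total_questions + 1) if n not in responses]


def submission_warning(total_questions, responses):
    """Build the text for a pre-submission dialogue, or None if nothing is missing."""
    missing = unanswered_questions(total_questions, responses)
    if not missing:
        return None
    listed = ", ".join(str(n) for n in missing)
    # The student may still choose to return to the test or proceed to grading.
    return f"You have not attempted question(s): {listed}."


if __name__ == "__main__":
    print(submission_warning(5, {1: "T", 3: "F"}))
```

Returning `None` when every question has a response lets the caller skip the warning and go straight to the final confirmation step.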

When all questions are completed to the student’s satisfaction they confirm their test

completion by pressing the “OK” button as demonstrated in Figure 6-8.

Figure 6-8: Final Dialogue Box to Support the User in Error

Prevention.

6.5.5 Clear and Informative Feedback

The expert evaluation raised a concern with Sim, Read and Holifield’s (2008) heuristic 8: Ensure appropriate feedback, as being of utmost importance and in keeping with the games design element, heuristic 6: Ensure appropriate interface design characteristics. The consequential solution, shown in Figure 6-9, answers the concern by informing the student of the following:

• The correct answer for each option

• Their answer with their registered confidence for each option

• The consequential score calculated for each option in the question

• Their overall score for that question

(A)

(B)

Figure 6-9: Feedback Screens: (A) Display for all Questions with Hyperlink to (B)

Display of Individual Questions.

Figure 6-9 (A) is the first screen that the student sees. It summarises the results for the complete test, again using colours that offer high affordance: green to signify questions where the student has demonstrated good knowledge (score between 20 and 40), blue for questions that need some attention (score between 0 and 19) and red for questions where the student has shown inappropriate levels of confidence for incorrect answers (score from -1 to -40). To assist the learner, each question is hyperlinked to its individual display, as shown in Figure 6-9 (B).
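The colour bands of the summary screen amount to a simple mapping from a question score to a colour. The sketch below assumes whole-number scores in the range -40 to 40, as implied by the bands quoted above; the function name and the error handling are hypothetical and are not taken from the MCQCM itself.

```python
def feedback_colour(score):
    """Map a per-question score to the summary-screen colour band.

    Assumes integer scores in [-40, 40], following the bands in the text.
    """
    if 20 <= score <= 40:
        return "green"  # good knowledge demonstrated
    if 0 <= score <= 19:
        return "blue"   # needs some attention
    if -40 <= score <= -1:
        return "red"    # confident but incorrect answers
    raise ValueError(f"score {score} is outside the expected range -40..40")


if __name__ == "__main__":
    for s in (35, 10, -15):
        print(s, feedback_colour(s))
```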

The feedback screen shown in Figure 6-9 (B) uses familiar icons (ticks and stars) and again offers high affordance to increase the clarity of the status of the student’s knowledge: red for incorrect, green for correct and blue for the overall grade, expressing a comfortable, but not excellent, level of achievement. Further examples of the feedback screen are provided in Appendix C.

6.5.6 Summary of the Redesigning of the MCQCM Adhering to HCI Guidelines

The refinement of the MCQCM tool, adhering closely to the HCI and CAA design guidelines, improved its usability, functionality and effectiveness by eliminating confusing elements.

Controlling the method of the participant’s responses by forcing them to first register their choice of true or false, and only then permitting them to declare their degree of confidence, has ensured that operating the tool imposes minimal cognitive load. This permits the student to concentrate on the task at hand, and not be preoccupied with the indecisions and hesitations that could inhibit their interaction with the system.

As can be seen from the discussion contained here, the final operational version of the Web-based MCQCM has been refined and constructed adhering to the fundamental design guidelines outlined above, incorporating a consistent, clear screen layout, sound navigation, error prevention and error handling to minimise diversions and optimise the outcomes.

6.5.7 Heuristics for MCQ with Confidence Measurement

The previous discussion demonstrated the adherence of the MCQCM to Sim, Read and Holifield’s (2008) reworking of Nielsen’s (1994) heuristics for interactive systems. Further to these guidelines for CAA, it is proposed here that an interactive MCQ with confidence measurement system requires refinement of Sim, Read and Holifield’s (2008) heuristics, given the need to ensure learnability. It is also recognised here that there is a need for immediacy in response to the user’s activity, for management of the visual impediments imposed by limited screen real estate, and for motivation through entertainment to encourage interaction.

Table 6-2 presents Sim, Read and Holifield’s (2008) customised heuristics for computer aided assessment systems alongside a set of guidelines for MCQ assessment with confidence measurement.

| Sim et al. Heuristics for Computer Aided Assessment | Guidelines for MCQ’s with Confidence Measurement | Problems to Overcome |
| --- | --- | --- |
| 1: Use clear language and grammar within questions and ensure the score is clearly displayed. | Use clear language and grammar within questions and immediately display the registered level of confidence and consequential score. | Learnability; Easy reversal of actions |
| 2: Ensure progress through the test is visible and understandable. | Ensure progress through the test by providing progress bars with total number of questions answered and yet to be answered. | Navigation; Time allocation where required |
| 3: Answering questions should be intuitive. | Answering questions should be intuitive, with possible identification of the number of possible correct answers supplied. | Multiple responses allowed, this is not always the case with other methods of MCQ and must be made clear. |
| 4: Easy reversal of actions. | Easy reversal of actions by permitting the student to return to any question for re-answering before final submission. | Learnability |
| 5: Inform users of any unanswered questions before finishing. | Inform users of unanswered questions before finishing by providing alert messages identifying those not answered. | Error Prevention |
| 6: Ensure appropriate interface design characteristics. | Ensure appropriate interface design characteristics using suitable game playing metaphors with appropriate challenges and fairness | Satisfaction; Motivation to “play” with the interface |
| 7: Visual layout - adequate spacing and visibility of questions | Visual layout - adequate spacing and visibility of questions using bifocal display techniques for display of information in restricted space. | Restricted Screen real estate available to large or graphical questions |
| 8: Ensure appropriate feedback | Feedback to be graphically pleasing, clearly identifying incorrect choices with registered levels of confidence. | Learnability; Affordance |
| 9: Moving between questions and terminating the exam should be intuitive | Ability to move to questions in a non-linear manner and clear action for final submission. | Navigation; Error prevention |
| 10: Minimise time delays | Immediate process of score calculation provided by Web-based solution. | Affordance, able to see the immediate consequence of actions |
| 11: Minimise external influences to the user | Develop presentation screens that use visual or audio stimuli only if critical to the question. | Reduce cognitive load |

Table 6-2: List of Sim et al. (2006) Heuristics with Elaborated Heuristics for MCQ with Confidence Measurement and Problems Addressed by Revised Heuristics.

6.6 MCQCM’s Method of Handling Graphical Components

Often the content area being tested is reliant on the interpretation of graphics and scripts, particularly in the IT discipline area where this research was undertaken. It is usual testing practice for the student to be shown a diagram, such as an entity relationship model demonstrating a particular scenario, or a series of programming segments, such as Structured Query Language (SQL) scripts. From these diagrams or scripts a set of questions is asked, where the student is required to identify various components of the diagram, express the relationship between the entities, recognise an error, or identify the correct script when given the output. The issue with many of the computer based MCQ testing tools is the way they handle these graphics and script requirements where screen real estate is at a premium. In the traditional MCQ Web-based assessment package format it is usual practice for the graphics, or script, to be revealed at the end of the question for reference. This often produces a question greater in length than the screen, causing a number of usability issues.

6.6.1 Previous Investigative Work on the Graphics Component of Interactive

Assessment

Previous work (Farrell & Leung, 2004a) investigated a practice involving the use of the

Blackboard computer aided assessment package for a large group of students, which

used the “more than one page” display method for the questions containing graphics

and SQL scripts. The analysis of the results of that exercise was influential in designing the MCQCM graphics capability component. A brief explanation of the comparison of preferences for questions with graphics delivered through Blackboard or on paper is given here, followed by a summary of the findings. The results of this evaluation assisted in further refining the MCQCM’s handling of graphics where the available screen real estate is limited.

A total of 465 students, consisting of 404 Introductory Database (DB1) and 61 Data Communication (DC) students, were surveyed as part of the subject review process to give comments about their assessment experiences. Both cohorts of students used the

Blackboard MCQ assessment package as a summative assessment activity contributing

to their final grade. The Introductory DB1 subject test relied heavily on SQL script

oriented questions whereas the DC subject did not, having all of the questions in the

traditional MCQ format.

The students were asked to give their opinion in comparing Blackboard for MCQs and

the traditional paper-based equivalent.

A majority of the DC students, 64 per cent, preferred the use of the Blackboard MCQ to a paper-based equivalent, as they felt that they were able to complete the exercises without real concerns. This was in complete contrast to the results of the DB1 students, where 74 per cent voiced concerns about using an online test to compare SQL scripts. A non-parametric statistical analysis was applied to the data, using the Chi-squared test for a significant difference between the two groups on the question relating to their satisfaction with using Blackboard versus paper-based MCQs. The responses proved to be significantly different for the two cohorts (χ²(7) ≈ 41.465, p < .001).
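As a check on the reported significance level, the upper-tail probability of a chi-squared statistic can be computed directly from the statistic and its degrees of freedom. The sketch below uses the standard closed-form series for integer degrees of freedom and only the Python standard library; the function name is an assumption for illustration, and the underlying contingency table is not reproduced here because only the statistic and degrees of freedom are quoted above.

```python
import math


def chi2_upper_tail(x, df):
    """Return P(X >= x) for a chi-squared variable with positive integer df."""
    if x < 0 or df < 1 or int(df) != df:
        raise ValueError("x must be >= 0 and df a positive integer")
    if x == 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2 - 1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return math.exp(-x / 2) * total
    # Odd df: normal tail plus density times sum of chi^(2r-1) / (1*3*...*(2r-1))
    chi = math.sqrt(x)
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term, total = 0.0, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        term = chi if r == 1 else term * x / (2 * r - 1)
        total += term
    return math.erfc(chi / math.sqrt(2)) + 2 * density * total


if __name__ == "__main__":
    p = chi2_upper_tail(41.465, 7)
    print(f"p = {p:.2e}; below .001: {p < 0.001}")
```

With the quoted statistic of 41.465 on 7 degrees of freedom, the resulting probability falls well below the .001 threshold, which agrees with the significance level reported in the text.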

On further investigation it was observed that the main reason for the discontentment was the requirement in the DB1 test to scroll up and down the screen to observe and compare the scripts and diagrams when answering the questions. The inability to view both the questions and graphics on the same screen appeared to interfere severely with the students’ concentration. Correspondingly, many of the students complained about their grades, claiming that the testing mode was not appropriate for the type of questions being asked. Farrell and Leung (2004a) concurred with this, producing the following evidence supporting the students’ claims.

Many of the DB1 students complained about eyestrain and high anxiety. They also felt

that their ability to concentrate on the questions was compromised by the continual

scrolling, having a negative effect on the final outcome. In addition, the scrolling made

it extremely difficult to review answered questions before submission, again being

detrimental to the final grade.

Interestingly, it was observed that 90 per cent of both cohorts of students registered that they appreciated the speed and automation of an online test, even though many of the introductory DB1 students were dissatisfied with this particular test. They acknowledged the value of automated online tests and their contribution to the learning experience.

Students also felt that an index to the questions should be provided so that any particular question is easily accessible. This is in agreement with previous sections, where the MCQCM design is influenced by Sim, Read and Holifield’s (2008) heuristics on navigation.

The DB1 students demonstrated a significant difference to the DC students in

preference to paper-based questions for the following issues:

• Need for better maneuverability between questions.

• Need to check all answers before submission.

• Ability to concentrate on a single question at a time.

The issue of the requirement to scroll when comparing SQL scripts in DB1 was by far

the most concerning. This observation is of particular concern to this research as

systems that do not cater for the graphics in an appropriate way have a detrimental

effect on the final outcome and are limited in their application. This highlights the need

for a match between the content being tested and the CAA chosen. In this case it was

apparent that the DC content fitted well into the constraints of the CAA whilst the DB1

test did not.

It was concluded that while CAAs offer great opportunities, it is important that the

content being tested should be well matched with the CAA of choice. This is evident

for the DB1 script comparison exercise, where the students needed to view the

alternative scripts and would have benefited from being able to highlight components

for closer scrutiny. In addition, Farrell and Leung (2002a, 2002b) identified the need for early exposure to the CAA, perhaps as a formative assessment task, as the potential for CAA assessments can only be maximised with good planning and implementation.

6.6.2 MCQCM’s Graphics Solution

In light of the discussion above it is proposed that inappropriate choice of CAA for

graphic reliant tests can create great concern. With this in mind there was a need for the

MCQCM to manage the graphics and script components in a way that does not interfere

with the progress of the students, to ensure the use of the MCQCM is not limited by its

use of screen real estate.

6.6.2.1 The Reliance on the Visual Communication Channel

Leung (1995, p. 158) in his work on the application of bifocal displays considers the

visual channel in computer interactivity a “far more effective means of communication,

as the high bandwidth nature of this channel facilitates speedy information retrieval and

comprehension”. He further acknowledges that visual communication is the main output

channel used, as effective human-computer interaction is reliant on the presentation of

information enabling the eye and brain to work together to comprehend what the

presenter wants them to see. Leung (1995) considers the early development of infant

hand-eye coordination in their play and interaction with their environment has prepared

them well to engage in increasingly complex activities, with the designers of interactive

computer systems exploiting such skills, particularly in games. Shneiderman (1982)

introduced the concept of “Direct Manipulation” of objects and actions of interest in the

visible interface (Hartmann, Abdulla, Mittal, & Klemmer, 2007; B. Shneiderman,

1982) providing rapid reversible incremental actions, replacing the need for complex

command language syntax. McCormick (1988) states that an estimated 50 per cent of the brain’s neurons are involved with vision; hence visualisation in computer interaction puts that neurological mechanism to work, consequently overloading cognition and reducing the capacity for mental processing of other more pertinent issues such as the question at hand. Marcus (1984) identifies three

“faces” of the computer: the Outerface, the final commutated display; the Interface, the frames of command and control for the user to interact with the system; and the Innerface, the frames of command and control for the computer experts to interact. He argues that computer graphics should be used appropriately in all of these faces (Marcus, 1984).

Leung (1995) describes the concerns that arise when humans interact with large amounts of data on a small screen: users often need to switch tasks to achieve higher-level goals and are often limited by the screen’s size. Additionally, when the user interacts with a large information space there are often difficulties locating and comprehending

the data. Leung (1995, p. 125) states that “visual techniques have an important role to

play to overcome the presentation and navigation problems associated with the human

interaction of large information spaces”.

6.6.2.2 Bifocal Display Methods for Large Information Spaces

Spence and Apperley (1982) first proposed the bifocal display, with Leung (1995) further refining and implementing it, as an effective means of presenting large amounts of information on a standard screen in response to the need for a method of handling accessible information. Their bifocal display technique is the concurrent presentation of localised detail while still preserving the global context. In application, it permits the entire space to be seen with a portion shown in full detail, although the surrounding non-detailed areas are “demagnified”.
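The idea of a detailed focus region flanked by demagnified context can be illustrated with a one-dimensional, piecewise-linear mapping from the information space to the display. This is a simplified sketch under stated assumptions (positions normalised to the range 0 to 1, a single focus region, equal demagnification on both sides); the function and parameter names are hypothetical and it is not the implementation used in the MCQCM.

```python
def bifocal_position(x, focus_start, focus_end, focus_share):
    """Map a position x in [0, 1] of the information space to the display.

    The focus region [focus_start, focus_end] is stretched to occupy
    `focus_share` of the display; the context on either side is compressed
    ("demagnified") into the remaining display space.
    """
    if not 0 <= focus_start < focus_end <= 1:
        raise ValueError("focus region must satisfy 0 <= start < end <= 1")
    if not 0 < focus_share < 1:
        raise ValueError("focus_share must be strictly between 0 and 1")
    if not 0 <= x <= 1:
        raise ValueError("x must lie in [0, 1]")

    context_width = focus_start + (1 - focus_end)
    if context_width == 0:
        return x  # the whole space is in focus, so no distortion is needed

    context_scale = (1 - focus_share) / context_width
    focus_scale = focus_share / (focus_end - focus_start)
    left_display = focus_start * context_scale

    if x < focus_start:
        return x * context_scale
    if x <= focus_end:
        return left_display + (x - focus_start) * focus_scale
    return left_display + focus_share + (x - focus_end) * context_scale


if __name__ == "__main__":
    for x in (0.0, 0.2, 0.4, 0.5, 0.6, 0.8, 1.0):
        print(x, round(bifocal_position(x, 0.4, 0.6, 0.6), 3))
```

Because the mapping is continuous and increasing, every part of the information space remains visible on the display, which is the property that distinguishes this approach from scrolling or paging.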

This is in contrast to the non-distortional presentation techniques (Leung, 1995), which rely on scrolling, paging and the split screen approach; these are the three non-distortion techniques commonly used. Scrolling permits the detailed viewing of sections of the graphical display while hiding the rest from view. Paging displays a section in detail in a new window or area, again hiding the remaining, surrounding graphics. Both of these techniques are identified in the work of Farrell and Leung (2005) as unacceptable when used in isolation in a CAA application.

6.6.2.3 The MCQCM Visual Display Technique

The MCQCM has combined the technique of bifocal display with the split screen, as

well as incorporating paging options.

In light of the investigative work outlined above in Section 6.6.2, handling of the

graphics component of the MCQCM needed special consideration. As a result the

display technique of Spence and Apperley (1982) and the later bifocal method of Leung (1995) were adopted, with some additional modifications and variations.

When constructing an assessment system with extended application in the Information

Technology field it is necessary to cater for script comparison, model interpretation and

other various questions reliant on graphical presentation. Hence, it was decided that the

MCQCM would incorporate a graphical presentation method that will minimise the

issues of single screen presentation.

To achieve this, the MCQCM presents its graphics in a dynamic, unique way. This is difficult to demonstrate in a static presentation; however, a series of screen shots with appropriate explanations is provided. The MCQCM presents each question fully on one screen irrespective of the content. For a question with script or graphics it divides the screen into two, with the top half holding a more compact version of the question and the lower half holding the script or graphic for the student’s consideration, as can be seen in Figure 6-10.

Figure 6-10: MCQCM Dual Screen Display.

As can be seen, in this example the student is required to view a diagram directly related to the question.

The configuration shown in Figure 6-10 permits the student to view the diagram while

still being able to view the question. The text of the question is small but in most cases

legible. The student then has two options to view the diagram in more detail.

The first is by clicking on the “Maximise” button on the top left corner of the graphics

area. As a result of this action the graphics area expands to fill the screen, as shown in

Figure 6-11.

It should be noted that the graphic, in this case a database Network Diagram, has the question repeated underneath it in text. Consequently the student can view both the question and the diagram together, even though the original question for answering, as shown in Figure 6-10, is not on the screen. Once finished viewing, the student can return to the shared MCQCM question and diagram split screen display by clicking on “Minimise” in the top left hand corner of the screen.

Figure 6-11: MCQCM Diagram as a Full Screen.

Alternatively, the student can choose to increase the viewing area for the diagram by placing the cursor on the line dividing the two displays and dragging it upwards towards the question area. This decreases the question display area and increases the diagram display area. The response is immediate as the cursor moves up and down. Hence the action permits the student to move quickly from question view to diagram view without any interruption. The screen shots in Figure 6-12 demonstrate this technique, sliding the bar from a midway position to a larger graphic display.

Figure 6-12 depicts two screen shots demonstrating the instantaneous sliding movement

of the MCQCM permitting the student to view various size images of either the

question or the diagram immediately. The diagram on the left is a result of sliding the

dividing bar upwards towards the question.
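The behaviour of the sliding divider can be expressed as a small calculation: the divider's vertical position determines how the screen height is shared between the question area above it and the diagram area below it. The following is a minimal sketch under the assumption that positions are measured in pixels from the top of the display; the function name and the clamping behaviour are illustrative choices, not details taken from the MCQCM.

```python
def split_heights(screen_height, divider_y):
    """Return (question_height, diagram_height) for a given divider position.

    `divider_y` is the divider's distance from the top of the display.
    It is clamped so that the divider cannot leave the screen.
    """
    divider = max(0, min(screen_height, divider_y))
    return divider, screen_height - divider


if __name__ == "__main__":
    print(split_heights(600, 300))  # midway position
    print(split_heights(600, 150))  # divider dragged upwards
```

Dragging the divider upwards lowers `divider_y`, which shrinks the question area and enlarges the diagram area by the same amount, matching the behaviour described for Figure 6-12.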

Figure 6-12: Demonstration of MCQCM Display of Varied Screen Sizes.

The student would systematically answer the question by either alternating between the

graphics and answer screen or by sequentially sliding the bar up for graphics viewing

and down for registering their answer. This simple, but effective, approach received

high praise from the students during the implementation, as it seemed to eradicate the

issues of scrolling and removing the question from vision, as presented in the previous

study. Students appreciated that it permitted quick navigation at ease without any

interruption during the test. It also added to the effect of placing the control of the

system into the student’s hands, a necessary feature discussed in previous chapters.

6.7 Summary

The evolutionary design of the MCQCM has been presented in this chapter, taking the

MCQCM from a rudimentary prototype to a fully functional Web-based solution for

implementation in the classroom. The heuristics of Sim, Read and Holifield (2008)

have been refined and extended to suit the requirements of MCQ with confidence

assessment interactive design. In doing so, the MCQCM has been refined in accordance with the HCI guidelines as outlined, adhering to the customised computer aided assessment heuristics of Sim, Read and Holifield (2008). The challenge of dealing with complex diagrams and scripts has also been addressed by incorporating aspects of Leung’s (1995) bifocal display options for large workspaces. The chapter extensively discussed game play topology and its influence on the MCQCM and this research, and leveraged Bandura’s (1983) theory of self-efficacy in order to decrease the cognitive load.

The research question addressed in this chapter is

Research Question 3:



maximum benefit from the system?

This is achieved by the application of sound usability heuristic evaluation techniques as

developed by Sim, Read and Holifield (2008) reworking Nielsen’s (1994) heuristics for

interactive systems.

Chapter 7 reports on the field studies where the MCQCM was used for formative

assessment exercises as part of the delivery program, supporting the students in their

self-assessment and reflection.

CHAPTER 7 COMPARISON OF THE MCQCM TO A TRADITIONAL CAA PACKAGE FOR FORMATIVE ASSESSMENT

Chapter 5 discussed the results of the two pilot studies developed and initiated to evaluate the functionality and usability of the MCQCM, gauging the students’ perception of using it in a formative assessment task and the design issues to be addressed.

Chapter 6 then applied the recommended changes to the design of the MCQCM.

Comparison of the MCQCM as a formative assessment tool to a traditional MCQ

format tool was required at this stage of the research. This chapter initially reports on

a small simulation exercise to ascertain if the redesigned MCQCM broadly represents

the level of knowledge of the individual before extending it to a large cohort of

students. It then reports on the findings of an investigative study in which a

comparative analysis is undertaken from the responses of a cohort of students using

both the Blackboard Multiple-choice Computer Aided Assessment (CAA) package and

the redesigned MCQCM as a tool for revision.

7.1 Trial

During the post-pilot program discussions with students and instructors reported in Chapter 5, the question arose about the ability of the system to truly represent the state of knowledge of the individual who partakes in the exercise. In particular the instructors expressed a concern that releasing the redesigned MCQCM to a large group of students as part of their learning experience might be premature, as it was not field tested, and suggested that small trials occur for the duration of the MCQCM development.

It was thought that the best method to establish whether the MCQCM results were representative of the students’ level of knowledge would be to run a simulation, where a small number of students with already recognised levels of achievement were asked to use the MCQCM as a formative assessment tool. To accommodate the simulation exercise, six students of various levels of achievement were invited to participate at the end of the semester, before the exam. Their abilities to date varied across the spectrum.

The students were given access to the system for a period of one week and encouraged

to complete any number of the given tests as many times as they wanted. All of the

results were recorded automatically and analysed at the end of the exercise.

It was pleasing to observe that in most cases the MCQCM results were consistent with those achieved by the individual using other traditional assessment. The expectation that the spread of the MCQCM grades would be equivalent to that of the final grades previously achieved seemed to be supported. The high distinction students all achieved high MCQCM scores (90%+), while the middle range distinction and credit students secured the equivalent for their results (74% to 63%) (Appendix B).

There was one set of results that required further investigation, as a high achieving

student’s MCQCM results were extremely poor (Appendix B). This outcome was

completely unexpected and was received with concern as it reflected poorly on the

MCQCM, immediately prompting a series of questions. Was the student confused

using the system? What happened for the student to do so badly? Does the scoring

system not truly reflect the level of knowledge? On further investigation it was revealed

that the student’s first attempt delivered the results that were expected of him: it was the

later attempts that were inconsistent with the expected knowledge.

After initial discussion it was decided that the best way to ascertain why this result was

recorded was to contact the student to see what occurred to produce a result in direct

contrast to the student’s proven ability. When approached the student explained the

reason for the discrepancy was that he enjoyed the interaction of the system and

deliberately played with the operation to see what the results would be. He emulated

different levels of knowledge to see how the system would react, enjoying the

opportunity to interact with it and “push it to the boundaries”.

Even though the discrepancy initially rang alarm bells, it ended up being a positive result, as it reinforced the idea that the MCQCM, when used as a non-threatening, formative assessment tool, had encouraged inquisitive, exploratory behaviour, engaging and entertaining the student for a period of time.

7.2 Comparison of the MCQCM to a Traditional Computer Based

Formative Assessment Package

On the successful completion of the simulation the MCQCM was deemed appropriate

to be used as a formative assessment tool for a larger group of students. The following

activity was initiated, as outlined below.

7.2.1 Method

A cohort of 74 students was offered both the MCQCM and Blackboard MCQ systems

as part of their revision program during the semester. The two subjects that this report

focuses on are Database 1 (DB1) and Advanced Web Technologies (AWT). There were

41 DB1 and 33 AWT students.

The Blackboard test was the simple Web-based Multiple-choice Question (MCQ) format of a stem followed by four simple text options. It did not use penalties for incorrect answers. In contrast, the MCQCM used confidence measurement and penalties. Both cohorts of students were offered these self-assessment tests online, permitting them to complete the tests at their convenience either in the labs, at home or at any other location of their choice where they had Internet access.

At the end of the semester the students were asked to complete a questionnaire on various aspects of the subject as part of the standard subject review process. Included was a series of questions that focused specifically on the students’ perception of the Blackboard CAA and MCQCM revision tests that they completed. The data were collected and analysed. The analysis produced some encouraging observations.

7.2.2 Results Analysis for Students

The MCQCM and MCQ results for the formative assessment exercise were recorded for analysis to ascertain if there was general consistency between the scores. Figure 7-1 shows the AWT students’ scores ordered by MCQ score, and Figure 7-2 shows the DB1 students’ scores ordered in the same way. The MCQ scores are plotted in ascending order alongside each student’s respective MCQCM score.

Figure 7-1: Graph of MCQ and MCQCM Scores for Cohort 1.

[Chart: Comparative Student Scores in MCQ Ascending Order for AWT; vertical axis in % (-50.0 to 150.0); series: MCQCM, MCQ]

Figure 7-2: Graph of MCQ and MCQCM Scores for Cohort 2.

[Chart: Comparative Student Scores in MCQ Ascending Order for DB1; vertical axis in % (-50.0 to 150.0); series: MCQCM, MCQ]

The graphs in Figures 7-1 and 7-2 demonstrate that, for both groups, a student who achieves a good score for the MCQ generally achieves a similar score for the MCQCM. Likewise, a student who does not do well with the MCQ generally does not score well with the MCQCM. When comparing the individual scores, the proportion of students whose MCQCM score is higher than their MCQ score is close to the proportion whose MCQCM score is lower, indicating a general consistency of scores.

The subject evaluation survey contained 8 questions, 5 specifically designed to gauge

the usefulness and effectiveness of the tests, and 3 to compare the two testing methods.

In addition the participants were also asked to comment on both the positive and

negative aspects of the tool. The age demographics are presented in Table 7-1 showing

the proportion of undergraduates and postgraduates who were older than 25 years of

age.

Demographics    Postgrads    PGs >25 yrs    Undergrads    UGs >25 yrs
Students        16%          75%            84%           52%

Table 7-1: Proportion of Postgraduate and Undergraduate Students and Proportion of Each >25 Years of Age.

It was observed that there were no apparent differences between the responses of the

two cohorts, as well as no detectable difference when comparing the response from the

postgraduates and the undergraduates. This was the same for the two age groups. The

preferences were consistent across all cohorts and subgroups.

Analysis of all student responses.

The first five questions refer specifically to the students’ perception of the MCQCM

tool. The remaining three questions are specifically designed to compare the students’

perception of Blackboard CAA to the MCQCM.

To assist the reader the responses have been grouped together in Table 7-2 beside each

of the survey questions.

Question, with the proportion of responses at each point of its scale

Q: 1 How would you rate the MCQCM testing method as part of your learning process?
     No Value 9%      Some Value 75%      Extremely Valuable 16%

Q: 2 How often would you use MCQCM if available at any time?
     Never 5%         Sometimes 50%       Regularly 45%

Q: 3 To what level would the MCQCM influence your direction and path of your learning?
     None 13%         Some 66%            Substantially 21%

Q: 4 When viewing the MCQCM results display how clear were the scores?
     Unclear 13%      Clear 60%           Extremely Clear 27%

Q: 5 When looking at the MCQCM display how clearly could you identify the problem areas?
     Unclear 19%      Clear 52%           Extremely Clear 29%

Table 7-2: Responses to the Questions of Student’s Perception of the MCQCM.

It can be observed that a significant number of the students considered the MCQCM as

a good self-assessment tool, with 95 per cent acknowledging that they would use it if

available and 87 per cent declaring that it would influence their learning path.

Importantly, 87 per cent considered the feedback display clear to extremely clear and 81

per cent felt that it identified the areas of concern. In addition, some of the students

asked that it be made available on a weekly basis linked in with the lectures. Students

commented on the ability of the system to display complex diagrams beyond the scope

of many traditional MCQs.
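The combined percentages quoted above can be checked against Table 7-2 by adding the two favourable response categories for each question. The short sketch below does that arithmetic. The figures are copied from Table 7-2, while the dictionary layout and the names `table_7_2` and `favourable` are assumptions made purely for illustration.

```python
# Response percentages copied from Table 7-2, ordered from the lowest
# to the highest point of each question's three-point scale.
table_7_2 = {
    "Q1": (9, 75, 16),   # No Value / Some Value / Extremely Valuable
    "Q2": (5, 50, 45),   # Never / Sometimes / Regularly
    "Q3": (13, 66, 21),  # None / Some / Substantially
    "Q4": (13, 60, 27),  # Unclear / Clear / Extremely Clear
    "Q5": (19, 52, 29),  # Unclear / Clear / Extremely Clear
}


def favourable(responses):
    """Add the middle and top categories, i.e. everything above the lowest."""
    _lowest, middle, top = responses
    return middle + top


# Hypothetical helper structure: favourable share per question.
favourable_share = {q: favourable(r) for q, r in table_7_2.items()}

for question, share in favourable_share.items():
    print(f"{question}: {share}% responded above the lowest category")
```

Under this reading, Q2 to Q5 give the 95, 87, 87 and 81 per cent figures cited in the text.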

Of those students who declared that they would use the MCQCM tool during their

studies, some made further requests for it to be available for other subjects in

their studies. The more interesting observations occurred when comparing the two

methods of self-assessment made available. The responses to the questions

pertaining to the comparison of both systems are summarised in Table 7-3.

Question, with the proportion of responses for each option

Q1 How would you rate the MCQCM feedback to the BB feedback?
     Same 17%      Better 63%       Much Better 20%

Q2 Which of the two, BB or MCQCM, was the best in directing you with your revision?
     BB 25%        Neither 25%      MCQCM 50%

Q3 Which of the two, BB or MCQCM, better informed you of your understanding of the topics?
     BB 33%        Neither 33%      MCQCM 33%

Table 7-3: Responses of Student’s Perception of the MCQCM vs BB.

It is observed that the great majority of students (83%) rated the MCQCM

feedback as better than the Blackboard MCQ feedback. Preferences for Blackboard, the

MCQCM or neither were equally distributed at 33 per cent each when students were asked

which one informed them better of their understanding of the topics. More students (50%)

preferred the MCQCM than preferred Blackboard (25%) or neither (25%) with

regard to directing their revision. It was encouraging to observe that

the MCQCM rated well against the long-standing, established standard MCQ format. It

should be noted that the students have had previous exposure to the Blackboard CAA,

which could add to their comfort level when using the new MCQCM. Alternatively, no

familiarity with the MCQCM may increase the novelty factor.

7.2.3 Instructor’s Focus Group for Formative Assessment

To understand the viewpoint of the instructors in relation to the use of the MCQCM a

focus group was formed where the three instructors for the two student cohorts met

with the developers. As in previous studies the instructors were shown the MCQCM

displays of the students’ grades (cumulative and individual) along with displays of the

question screens and the student feedback. The instructors were asked to give their

feedback and opinion for each tool as well as encouraged to contribute additional

opinions and perceptions of the MCQCM.

The following observations and recommendations were recorded.

The instructors all considered the Blackboard and MCQCM formative assessment

results to be closely aligned, commenting on the more fine-grained set of grades for the

MCQCM. The instructors registered concern that the increased distribution of results

might not necessarily demonstrate a more discerning set of grades but in fact only

really represent the students’ propensity towards stating high or low confidence. They

elaborated on this by stating that they observe some of their students often overstating

their levels of confidence while others equally understate them. The instructors extended this

point by highlighting that gender and cultural backgrounds may influence the students’

willingness to register high levels of confidence.

When shown the MCQCM graphs of the student population scores that identified the

areas of overall poor performance the instructors immediately acknowledged the areas

of apparent misunderstanding and confirmed that they would be readdressing those

areas during the revision session; furthermore, they stated they would adjust the

curriculum for the benefit of the next enrolled cohort of students.

With regard to the testing interface, the instructors perceived that the analogy to betting on

an answer for proportional rewards or losses would be engaging and challenging to the

students. They appreciated the visual metaphor of a game with the direct manipulation

interactivity, identifying the bifocal/split screen interaction for the graphics in particular

as an entertaining element. However, they did register concerns that some of the

students could find the betting metaphor inappropriate based on their religious or

cultural beliefs and that an alternative might be necessary. They all thought that the

ability to navigate through the questions in a non-linear manner was advantageous to

the student, as well as the progress bar highlighting the questions yet to be answered.

When the instructors were shown the MCQCM student feedback presentation screens

they were pleased with the use of the familiar games icons, such as the green stars for

correct answers and the red ones for the incorrect answers. They also appreciated the

initial feedback presentation screen where the overall results for all of the questions are

displayed, demonstrating the questions where the student did very well or adequately, or

requires immediate attention. In addition the instructors thought that the hyperlinks from

this overall test results display to the individual question results display were well

designed, offering personal feedback to the students.

Finally the instructors stated that the availability of both the MCQCM and the

Blackboard MCQ via the Internet is greatly beneficial to the conscientious student

wanting to improve their understanding of the subject material and that the opportunity

for self-assessment is taken advantage of by most of their students at least once during

the semester. However, they all stated that the demand for constructing MCQCM

questions is greater than required for MCQ, as more consideration has to be given to all

of the answer options as the strength of the MCQCM is reliant on providing multiple

correct options with no obvious distracters.

7.3 Concluding Observations of Comparison of MCQCM to Traditional Computer Assessment

In general students appeared to appreciate the MCQCM tool as a valuable self-assessment

exercise. In particular, most of the students confirmed that the

feedback was better than that of the traditional MCQ format and that the MCQCM rated

at least as well as the MCQ in directing the students in their learning and informing

them of their understanding of the content. Given the time that students have been

exposed to Blackboard’s MCQs it is quite feasible that long-term exposure to the

MCQCM could result in an ongoing acceptance of it as a tool for revision.

Some students requested that we use the MCQCM for summative assessment, as they

considered it could be beneficial to be exposed to it during the semester as a formative

assessment tool in preparation for it to be part of the final exam.

The pleasing results of the first investigative study of using the MCQCM as a self-

assessment tool during the semester encouraged further studies in the field. The next

section of this research, Chapter 8, reports on a series of applications of the MCQCM

as a summative assessment tool, contributing to the final overall grade for the subjects.

7.4 Summary

This chapter has reported on the findings of implementing the MCQCM as a formative

assessment tool during the subject delivery as a means of student self-assessment and

instructor reflection. It has achieved this by allowing the students to use the MCQCM

with other CAA strategies for comparison of scores and students’ and instructors’

perceptions of the MCQCM. In doing so this chapter answers the research question

formulated in Chapter 2 pertaining to the application of assessment with confidence

measurement for formative assessment, being:

Research Question 1.

Does Assessment with Confidence Measurement produce more meaningful feedback

when used for formative assessment?

It achieves this by answering the following sub questions:


with confidence when used for formative assessment?

Q1C: Does the use of assessment with confidence measurement provide additional

valuable feedback to the instructor when used for formative assessment?

The analysis of the simulation scores (quantitative data) in comparison to the

previously achieved grades for a select number of students answers the research

sub-question:

Q1B: How do the students’ results compare to the results of a standard Multiple-choice

Question (MCQ) test when using assessment with confidence measurement for

formative assessment?

CHAPTER 8 USING THE WEB-BASED MCQCM FOR SUMMATIVE ASSESSMENT

Chapter 7 presented the results of using the MCQCM as a formative assessment tool.

This trial yielded encouraging outcomes. The pilot programs described in Chapter 5

and the consequential design changes in Chapter 6 greatly assisted in the construction

of a satisfactory Web-based self-assessment tool offering full flexibility, immediate

feedback and a seemingly more honest appraisal of the state of knowledge of the

participant. It was apparent that the MCQCM offered a non-threatening environment

that both the students and instructors considered to be beneficial. The question at this

time is whether the MCQCM is an acceptable assessment strategy for summative

assessment and whether it could possibly offer a more discerning set of results.

Importantly, it must first be ascertained if the MCQCM is a valid, legitimate

assessment option, offering a level of reliability of equivalence to that of the more

traditional methods of assessment. This chapter considers the observations and results

of a series of exercises initiated to ascertain if the MCQCM could be used as a

summative assessment tool, both from the students’ and the instructors’ perspective.

8.1 Initial Trials using MCQCM as a Summative Assessment Tool

To facilitate this trial the MCQCM was used as a primary revision tool for a group of

students throughout a semester followed by an MCQCM class test. The test was graded

using the traditional method of one mark for a correct answer in comparison to a

grading depending on the user’s registered confidence.

This study initially considers the validity of the MCQCM testing method, where

validity refers to “whether the question actually tests what it is purported to test”

(Schuwirth & Van Der Vleuten, 2006), achieved by comparing the correlations between

two methods of testing that are supposed to measure the same construct (Bacon, 2003),

in this case the MCQCM results against the traditional MCQ test results. The reliability

of any testing method is defined as the accuracy with which a score on a test is

determined, or more precisely, a score that a student obtains should indicate the score

that this student would obtain in any other given, equally difficult test, in the same field

(Schuwirth & Van Der Vleuten, 2006).

8.1.1 Setting

A cohort of 52 Data Communication students, a mixture of undergraduate and

postgraduate students doing various programs, were required to sit a test that

contributed to their final grade during the semester. The test consisted of 10 MCQs

testing the students on the fundamentals of network design. The author of the test was

mindful of Bloom’s (1956) taxonomy of educational objectives when constructing the

questions to facilitate the assessment of various levels, in particular testing at the

application level. The students sat the test under supervision during the tutorials.

They were instructed that they would be graded in two ways: firstly using the

MCQCM technique, where the registered confidence for each response would be

included in the mark, and secondly using the traditional method. They were instructed

that the grade allocated to them for this assessment task would be the greater of the two.

This was done to alleviate the stress experienced by the students using a new grading

system and to give richer data when asking the students questions about their

perception of the testing style.

8.1.2 Results

The proportion of undergraduate and postgraduate students was not noted but estimated

by the lecturer as being approximately half. The gender balance was not even as the

area is traditionally more popular with males, being in this case 84 per cent males and

16 per cent females.

This results section has been divided into two areas for clarification purposes. The first

considers the data generated from the scores. This data was gathered then statistically

analysed for correlation, validity and reliability and appropriate conclusions are drawn.

The second considers the data gathered to gauge the students’ and instructors’ perception

of using the MCQCM as a summative assessment tool.

Results for Section 1: Grade Comparisons for Correlation and Validity

This section considers the grades for each of the marking systems followed by a

comparison of the results evaluating the convergence of validity, the correlation and

reliability.

The average results for the MCQCM and the traditional MCQ are summarised in Table

8-1.

Test type and difference
between the two            Average Grade    Standard Deviation
MCQCM                      67.60%           24.80%
MCQ                        60.58%           22.73%
MCQCM - MCQ                7.02%            20.23%

Table 8-1: Average, Standard Deviation and Difference for Both Marking Schemes.
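The quantities in Table 8-1 are the mean and standard deviation of each marking scheme, together with the mean and standard deviation of the per-student difference (MCQCM minus MCQ). A minimal sketch of that calculation follows. The five score pairs are hypothetical rather than thesis data, and the use of the sample (n-1) standard deviation is an assumption, as the text does not state which form was used.

```python
from statistics import mean, stdev

# Hypothetical paired scores (per cent) for five students; not thesis data.
mcqcm = [80.0, 55.0, 90.0, 40.0, 70.0]
mcq = [70.0, 60.0, 90.0, 30.0, 60.0]


def summarise(scores):
    """Return (mean, sample standard deviation) for a list of scores."""
    return mean(scores), stdev(scores)


# Per-student difference between the two marking schemes.
differences = [a - b for a, b in zip(mcqcm, mcq)]

for label, scores in (("MCQCM", mcqcm), ("MCQ", mcq), ("MCQCM - MCQ", differences)):
    average, deviation = summarise(scores)
    print(f"{label:12s} mean = {average:6.2f}%  sd = {deviation:6.2f}%")
```

The mean of the differences always equals the difference of the two means, which agrees with Table 8-1 (67.60 - 60.58 = 7.02). The standard deviation of the differences, by contrast, depends on how the paired scores vary together and cannot be derived from the two separate standard deviations alone.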

On examination of the analysis presented in Table 8-1 it is noted that the average

grades and the standard deviation for the two marking schemes are reasonably close. It

is observed that the MCQCM has the greater Average Grade and Standard Deviation.

Instructors would be quite pleased with these outcomes at this stage, as the results

appear to be acceptably convergent.

When looking at the grades in more detail it is noticed that the difference for the

individual’s test score in some cases is quite extensive. Figure 8-1 graphs the two

grades for each individual clustered by the grades for the MCQ. It demonstrates the

spread of the results, showing the grade for the MCQCM marking scheme in some

cases being quite different from that of the MCQ scheme.

Figure 8-1: MCQ and MCQCM Scores for Each Student with the MCQ

Clustered.

It is further observed that there is a relatively even distribution of those who benefited

from the MCQCM marking scheme (42%) and those who benefited from the MCQ

marking scheme (39%), while the remaining 19 per cent achieved the same mark. In

fact, further investigation found that of those who obtained a high score in the

traditional MCQ marking scheme (>65) only 32 per cent scored higher for the

MCQCM marking scheme with 52 per cent scoring lower and 16 per cent scoring the

same. Further, of the 10 students who achieved 90 per cent or more for the MCQ

grading scheme 6 of them scored less, 0 scored higher and 4 scored the same for the

MCQCM. This is an important observation, as the higher achieving students do not

necessarily score better using MCQCM. This suggests that the MCQCM might be a

better indicator of knowledge, in particular for those students who achieve higher

grades, but is in no way conclusive. However, it could also represent the level of

confidence they are prepared to register.
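The split between students who gained, lost or were unaffected under the MCQCM marking scheme is a simple tally over paired scores, optionally restricted to the higher achievers. The sketch below illustrates the idea. The score pairs are hypothetical and the function name `tally_outcomes` is an assumption for illustration only.

```python
from collections import Counter

# Hypothetical (MCQ, MCQCM) score pairs in per cent; not thesis data.
pairs = [(90, 80), (70, 85), (60, 60), (50, 65), (100, 100), (40, 30)]


def tally_outcomes(score_pairs, mcq_threshold=None):
    """Count students whose MCQCM score is higher, lower or the same as MCQ.

    If mcq_threshold is given, only students whose MCQ score is strictly
    above the threshold are counted (for example, the higher achievers).
    """
    counts = Counter()
    for mcq_score, mcqcm_score in score_pairs:
        if mcq_threshold is not None and mcq_score <= mcq_threshold:
            continue
        if mcqcm_score > mcq_score:
            counts["higher"] += 1
        elif mcqcm_score < mcq_score:
            counts["lower"] += 1
        else:
            counts["same"] += 1
    return counts


everyone = tally_outcomes(pairs)
high_achievers = tally_outcomes(pairs, mcq_threshold=65)
print(dict(everyone))
print(dict(high_achievers))
```

Dividing each count by the number of students considered gives proportions of the kind reported above.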

The result of the convergent validity test, the correlation between the MCQCM and

MCQ scores, supports the hypothesis that there exists a correlation between the grade

for the MCQCM and the grade for the MCQ. It is apparent that there is a relatively

strong convergence of correlation of the two marking schemes as shown in Table 8-2.

Correlations

                                 MCQCM    MCQ
MCQCM    Pearson Correlation     1        .629(**)
         Sig. (2-tailed)                  .000
         N                       52       52

** Correlation is significant at the 0.01 level (2-tailed).

Table 8-2: The Correlation for the Two Marking Schemes

This analysis confirms that there is convergence of validity for the MCQCM and MCQ,

with the correlation of .629 (p<.01). This result gains strength when considering the

calculated value of Cronbach’s Alpha reliability coefficient (.722, above the

recommended minimum of .70) for this set of results, demonstrating internal

consistency.
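For readers unfamiliar with the two statistics, the following is a minimal sketch of how a Pearson correlation and a Cronbach's Alpha coefficient can be computed from paired scores. The scores are hypothetical and are not intended to reproduce the values reported above. Treating the two marking schemes as the two "items" for Cronbach's Alpha is an assumption, since the text does not state which items were used, and the function names are illustrative.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical paired scores (per cent) for six students; not thesis data.
mcqcm = [85.0, 60.0, 95.0, 35.0, 70.0, 50.0]
mcq = [80.0, 50.0, 90.0, 40.0, 60.0, 70.0]


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mean_x, mean_y = mean(x), mean(y)
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / sqrt(spread_x * spread_y)


def cronbach_alpha(*items):
    """Cronbach's Alpha, treating each list as one item scored by every student."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_variance = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance / variance(totals))


print(f"Pearson r        = {pearson_r(mcqcm, mcq):.3f}")
print(f"Cronbach's Alpha = {cronbach_alpha(mcqcm, mcq):.3f}")
```

In practice a statistics package would normally be used, as it also reports the significance level alongside the coefficient.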

While the correlation supports convergence of both validity and reliability, offering validation of the

usage of MCQCM as an alternative assessment task, it is by no means conclusive, as it

requires extensive further research to truly validate the hypothesis. The observed

possibility of the MCQCM offering a more discerning grading system also warrants

further investigation.

Results Section 2: Students’ Perception of using MCQCM for Summative

Assessment

The second component of the results concentrates on the students’ perception of

MCQCM as a summative assessment tool. This section deals with the responses of the

students during the support post-test surveys in the tutorials in an attempt to ascertain

how enthusiastic the students were towards the MCQCM (Appendix A). Most

importantly it attempts to evaluate their perception of how much control they felt with

regard to using the slide bar, with the direct consequence of their actions being a

possible change in scores. It was considered important at this stage of the research that

there be an understanding of the level of the students’ interactivity with the system. The

amount of use of the slide bar and the reason for using it needed to be

ascertained to ensure that the cognitive process behind the decision-making activity

was representative of the student’s current state of knowledge.

It was observed that 70 per cent of students declared that they gained from being able to

use the slide bar to show their confidence while the remaining 30 per cent did not

consider it to offer any advantages. This was a pleasing result, with a large majority of

those who acknowledged the gain supporting their choice by stating that the MCQCM

permitted the attainment of marks for partial knowledge, identified problem areas to

both the marker and the student for further study and offered a good comfort zone if

they were unsure of the answer. The 30 per cent of students who did not feel that they

gained from the use of the slide bar commented that they considered that the gain was

not worth the effort, and that the slide bar was confusing and too difficult to use.

Further discussion found that 40 per cent of the group used the slide bar only as a

means of identifying the answer as being true or false while 60 per cent used it to

identify the answer as being true or false and register their confidence. This is an

interesting observation, as the proportion of students who acknowledged using the slide

bar only to register their True/False choice (40 per cent) is greater than the proportion

who did not feel that they gained from the slide bar (30 per cent). Considering the

supportive student dialogue it is apparent

that some of the students felt that even though they appreciated the option for using the

slide bar they did not use it under test conditions, as it was too taxing in the situation of

summative assessment as opposed to formative assessment.

Apart from a small group (6%) who used the slide bar the same for both the practice

formative assessment and the formal summative assessment, 47 per cent used the slide

bar less, with an equal proportion using it more. The students who used it less justified this

by saying that it was too difficult to use it extensively under test conditions due to the

extra cognitive load and that it increased the stress level. They considered it to be a

distraction, not considering the gain worth the effort. In one case a student stated that they do not

like to gamble and would rather just choose an option outright. The students who used

it more during the summative assessment than for formative assessment stated that they

wanted to maximise their grade by minimising the loss of marks for answers they were

unsure of or alternatively increase their grade when sure of the answers.

Previous work of Farrell and Leung (2003) discusses the different learning approaches

of the individual and the advantages of offering assessment variations that consider the

personality traits of the users, in particular the introverted and extroverted users of the

system. A strong 72 per cent of the cohort registered that they are comfortable

registering 100 per cent for an answer if they are certain that it is correct, while

26 per cent find it difficult to claim 100 per cent confidence even when they

are sure of the answer. This suggests that in a selected cohort of students it would be

expected that there would be some of them who would prefer to register a level of

confidence less than 100 per cent in a choice even if they are absolutely certain of the

answer.

A large 75 per cent of the group confirmed that they appreciated the opportunity to gain

some marks for partial knowledge. They further agreed that the system forces them to

think more carefully about their options. The remaining 26 per cent felt that it is

often a case of either “know it or you don’t” and that registering a confidence is just not committing.

8.1.3 Discussions and Conclusions

In conclusion, the results of the grade comparison component of this study have

identified a convergence of validity between the two types of grading schemes being

investigated, Multiple-choice Questions with Confidence Measurement (MCQCM) and

the traditional Multiple-choice Question (MCQ) format, for the subject of Information

Technology. Consequently the MCQCM appears to be an acceptable option to be

included in the suite of assessment tools available to the instructor. Previous work

(Farrell and Leung, 2002) has demonstrated that the MCQCM delivers a richer

feedback and guidance to the students when used as a formative assessment tool. In

addition they documented the perceived advantages of using it in preparation for exams

from the students’ point of view.

It is pleasing to observe that the grades do correlate and there appears to be an

interesting interaction with the upper achievers where the difference in the grades for

the MCQCM and the MCQ alternative could offer a richer grading system. Although

the evidence is not overwhelming, it is an interesting observation that the higher

achievers in the group do not score as well for the MCQCM. This could either be because the

MCQCM forces the students to “show their hand”, giving a true indication of their

knowledge, or because it is really acting as a statement of their own personal confidence in

their choices. In light of this, it was recommended that ongoing application of the

MCQCM be adopted to increase the data gathered, to investigate whether these observed

results occur again and, if so, to attempt to ascertain the reason.

The second set of results from the student survey revealed some interesting

observations. There was overall support from the majority of the students, which was

pleasing to the developer and instructor. The majority of students acknowledged the

benefits of the system, stating that they appreciated the opportunity to demonstrate

partial knowledge and optimise their grade by lessening the impact of an incorrect

choice and increasing the grade for a correct one. The confirmation that a proportion of

the cohort, 26 per cent in this case, do not have the confidence to register 100 per cent

for any answer, even when they know it is absolutely correct, should always be

considered in the analysis of future observations, as it indicates the possibility that a

particular group of students will never be able to maximise their grade by using this

system. Another important observation is that 47 per cent of the students used the

slide bar less during summative assessment than when using it for

formative assessment during the semester, as they considered it to be a distraction or

perceived it not to offer enough return for their effort.

These encouraging results promoted the continuing utilisation of MCQCM as part of

the assessment tasks for the semester. The overall positive response from the students

towards MCQCM as both a formative and summative assessment tool increased the

enthusiasm of the designers and instructors who were keen to pursue its usage in the

classroom.

From the instructor’s position, an identified advantage of using the MCQCM as a

summative assessment tool was that it required the students to use it during the

semester for revision in preparation for the test. This apparent by-product of

introducing a new assessment strategy forces the students to actively revise the course

material as part of becoming familiar with the assessment strategy. Placing a

requirement on the students to know the MCQCM operation in most cases

greatly increases the likelihood of their success in the subject.

8.2 Comparative Analysis of using the MCQCM as a Summative Assessment tool to the Traditional Short Answer, MCQ and Long Answer Assessment

The question facing educators today is what methods of assessment should they be

using and what would be the appropriate mix to maximise the feedback and evaluation

process? Schuwirth and Van Der Vleuten (2006) consider that a well-designed assessment

strategy will incorporate various types of questions that are appropriate for the content

being assessed. The options presently available

to the instructors include multiple-choice questions (MCQ), short answer questions

(SA), longer problem solving questions (PS), case study reports, presentations and

other equally effective and proven choices. In the majority of cases the final grade is

calculated by combining each separate mark from assessment tasks completed during

the subject. The utilisation of multiple assessment methods recognises the need to allow

students to demonstrate their knowledge in various ways throughout their learning

experience.

As previously identified MCQs are highly regarded by instructors (Bacon, 2003) and

consequently used extensively, with worldwide experience in their construction

(Schuwirth & Van Der Vleuten, 2006). The Short Answer assessment format is as

popular as the MCQ alternative. Short answer assessment strategies can offer more

flexibility, with greater ability to test creativity and higher levels of Bloom’s (1956)

taxonomy of educational objectives, as outlined previously. However, short answers are

resource intensive when grading and are subject to poor reliability due to subjective

marking (Bacon, 2003).

The longer Problem Solving questions are often included in the final exam as they

permit the instructor to assess the highest of Bloom’s levels of taxonomy. The format

of these questions usually presents the student with a scenario which requires

the student to call upon many aspects of the subject material to analyse, synthesise and

evaluate, offering alternatives in some situations. These are clearly more difficult to

grade consistently as there is often not a prescribed correct solution but a number of

equally valid alternatives.

The encouraging results of the previous study initiated further investigative work,

where it was recommended that the grades of students

using MCQCM be compared with the grades from more traditional modes of

assessment: Multiple-choice Questions, Short Answers and Problem Solving (Scenario)

Questions.

8.2.1 Method of Comparative Study

Including MCQCM questions in the final end of semester exam facilitated this part of

the research. The exam also contained MCQs, Short Answer questions and

Longer Scenario questions. A total of 43 students sat the final exam, which

consisted of a section of 8 Multiple-choice Questions (MCQs)

followed by 8 MCQCMs, 8 Short Answer questions and 2 Longer Problem Solving

questions, producing some interesting results. The MCQ and MCQCM sections

carried 20 per cent each of the final exam

grade, the short answers section carried 33 per cent while the longer problem solving

section carried the remaining 47 per cent. The exam questions were constructed with an

awareness of Bloom’s (1956) taxonomy of educational objectives facilitating the

assessment of various levels from recall to application. On the completion of the exam

the results were collated with each question’s mark carefully recorded for analysis.

8.2.2 Results

The averages and standard deviations are displayed in Table 8-3.

Section            Average Grade    Standard Deviation
MCQ                73%              17.7%
MCQCM              67%              21.0%
Short Answers      85%              9.8%
Problem Solving    75%              14.5%

Table 8-3: Means and Standard Deviations for Each of the Section of the Exam.

It can be observed in Table 8-3 that the average class grades for the various sections of

the paper are close, as too are most of the standard deviations. The short answers

section has the greatest average grade with the smallest Standard Deviation. Instructors

would be quite pleased with these outcomes at this stage.

On further examination and analysis of the data it was found that in most cases there

appears to be a good relationship between the grades allocated for each of the sections

for the individual student. Again this is very pleasing for the instructor, as there appears

to be a good convergence for each of the assessment areas under consideration.

Educators rely on a reasonable convergence of the grades for each of the sections, as

any deviation from this is an area of concern. Failure to achieve this might indicate

poor question construction in a particular section. In this case there does not appear to

be any one area of concern.

At this stage analysis was applied to identify the statistical relationship between these

results. The correlation for the scores for each of the sections was used to test the

convergent validity, using Spearman’s Rank Order correlation test.

The comparative results are displayed in Table 8-4.

Correlations (Spearman's rho)

                                    MCQ         PS          MCQCM
PS       Correlation Coefficient    .235
         N                          43
MCQCM    Correlation Coefficient    .436(**)    .302(*)
         Sig. (2-tailed)            .003        .049
         N                          43          43
SA       Correlation Coefficient    .447(**)    .442(**)    .544(**)
         Sig. (2-tailed)            .003        .003        .000
         N                          43          43          43

** Correlation is significant at the 0.01 level (2-tailed).

* Correlation is significant at the 0.05 level (2-tailed).

Table 8-4: Correlation Table for the Sections of the Exam.

The following observations can now be discussed.

Firstly, let us consider the correlation between the MCQCM and the other sections of

the exam paper.


There is a reasonably strong correlation between the MCQCM and the SA section

(r=.544, n=43, p<.01).

MCQCM also has a medium correlation with MCQ and PS (r=.436, n=43, p<.01 and

r=.302, n=43, p<.05, respectively).

These statistics confirm that there is a convergence of validity for the MCQCM and all

of the other sections of the exam. Additionally, these correlations gain strength when

considering the Cronbach’s Alpha reliability coefficient for the results, demonstrating

the internal consistency of .7, equal to the recommended minimum.
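For readers unfamiliar with the coefficient, a minimal sketch of Cronbach's Alpha follows. The thesis does not state which items entered the calculation, so treating each exam section as one item is an assumption, and the scores are hypothetical. The formula used is alpha = k/(k-1) x (1 - sum of item variances / variance of the total scores), where k is the number of items.

```python
from statistics import variance


def cronbach_alpha(items):
    """items: k lists of scores, one list per item, one entry per student."""
    k = len(items)
    # Total score of each student across all items.
    totals = [sum(student_scores) for student_scores in zip(*items)]
    sum_item_variances = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - sum_item_variances / variance(totals))


# Hypothetical marks for four students on three items.
item_a = [2, 4, 4, 5]
item_b = [3, 3, 5, 5]
item_c = [1, 4, 3, 5]
alpha = cronbach_alpha([item_a, item_b, item_c])
```

A value near 1 indicates that the items rise and fall together across students; the .7 threshold quoted in the text is the conventional minimum.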

Further, it is interesting to see that the grades for the MCQ section demonstrate a

medium correlation to SA (r=.447, n=43, p<.01) and a small correlation to PS (r=.235,

n=43, not flagged as significant at the .05 level in Table 8-4).

However, SA and PS have a stronger correlation (r=.442, n=43, p<.01).
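The significance flags in Table 8-4 can be sanity-checked with the common t approximation for a correlation coefficient, t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. The sketch below is not the procedure reported in the thesis; the function name is hypothetical, and the critical value of about 2.020 for 41 degrees of freedom (two-tailed, 5 per cent) is taken from standard t tables.

```python
from math import sqrt


def correlation_t(r, n):
    """t statistic for testing a correlation r from n pairs against zero."""
    return r * sqrt((n - 2) / (1 - r * r))


# Approximate two-tailed 5% critical value of t for 41 degrees of freedom.
T_CRITICAL_05_DF41 = 2.020

# Coefficients reported in Table 8-4, all with n = 43.
reported = {"MCQ-PS": 0.235, "MCQCM-PS": 0.302, "MCQCM-MCQ": 0.436}
exceeds_critical = {
    pair: correlation_t(r, 43) > T_CRITICAL_05_DF41
    for pair, r in reported.items()
}
```

Under this approximation the MCQ to PS coefficient falls short of the critical value while the MCQCM to PS coefficient only just exceeds it, which agrees with the asterisks shown in the table.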


In conclusion, this study has identified a convergence of validity between MCQCM and

all of the other sections of the exam paper, with the strongest correlation being between

MCQCM and short answers. This observation is very encouraging as the MCQCM was

primarily designed as a formative assessment tool to support the learner along the

learning path (Farrell & Leung, 2002b).

Interestingly, the traditional MCQ section of the paper has medium correlation with the

short answers and smaller correlation to the problem solving section. Hence, whilst

there is convergence of validity between MCQ and short answers there is no significant

convergence of validity between the MCQ section and the problem solving section.

This means that a good performance in either the MCQ or problem solving section

would not necessarily predict a good performance in the other.

As a result of these initial observations MCQCM appears to be a valid assessment

option, producing grades that are as reliable as those of the more traditional methods of

assessment. However, MCQCM does not appear to offer any great advantage over the

rest of the methods of summative assessment. The question must then be asked: is the

MCQCM a worthwhile strategy for summative assessment?


This study encouraged the utilisation of the MCQCM as a summative testing option for

future investigation. It was proposed that the tool continue to be used both as a

formative assessment method for the duration of the semester and for summative

assessment, to be included as part of the final exam, permitting further investigation to

ascertain the students’ acceptance or rejection of MCQCM as a standard method for

summative assessment.

8.3 Comparative Analysis of using the MCQCM and Traditional

MCQ as a Summative Assessment Tool

As yet unpublished findings of Farrell and Leung provide more evidence of the merit of

the MCQCM as a summative assessment tool in another field study. In this instance a

cohort of 86 students were required to complete a mid-semester test which incorporated

both MCQCM and MCQ questions.

8.3.1 Method

One of the main challenges of having students use the MCQCM in a controlled

environment is that you cannot ethically advantage or disadvantage a particular group

of students during their studies by setting up a control group for students from the same

cohort. In order to do a comparative study it is recommended that this be emulated by

providing variations in the structure of the test. In this case a cohort of 85 students was

evenly split into two groups. The first group’s test consisted of questions 1 to 10 in the

MCQCM format and questions 11 to 20 in the traditional MCQ format. Conversely,

the second group’s test reversed this arrangement, with questions 1 to 10

being MCQs and questions 11 to 20 being MCQCMs.
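The counterbalanced arrangement described in this section can be sketched as follows. The function names are hypothetical, and alternating students between the two groups is an assumption made for illustration, since the text does not say how students were allocated.

```python
def question_formats(group, n_questions=20):
    """Map each question number to its format for group 1 or group 2."""
    half = n_questions // 2
    if group == 1:
        first_half, second_half = "MCQCM", "MCQ"
    else:
        first_half, second_half = "MCQ", "MCQCM"
    return {
        q: first_half if q <= half else second_half
        for q in range(1, n_questions + 1)
    }


def split_cohort(student_ids):
    """Assumed allocation: alternate students between the two groups."""
    return student_ids[0::2], student_ids[1::2]


group_one, group_two = split_cohort(list(range(1, 86)))  # 85 students
```

Because every question is answered in both formats by roughly half the cohort, a difference between formats cannot be attributed simply to one half of the paper being harder than the other.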

8.3.2 Results

This combination of splitting the cohort of 85 students into two similarly sized groups

provided an interesting set of data for analysis, both for comparison of each group of

students and then across the groups. Figure 8-2 demonstrates the spread of the final


results for the students with the students being clustered around their MCQ result from

highest down to lowest. The corresponding individual’s MCQCM result is then plotted

against them for direct comparison.

Figure 8-2: The Student’s MCQ (clustered ascending order) and MCQCM Scores.

An interesting observation is the variation of the results for the upper and lower

achievers. It is noted that 89 per cent of the students whose MCQ result lies in the upper

region (> 80%) achieved less for their corresponding MCQCM score, while 11 per cent

scored higher. In direct contrast, only 35 per cent of the students whose total MCQ

result was in the lower quartile (< 50%) achieved less for their corresponding MCQCM

score and 65 per cent achieved a higher score. This is a telling observation as it

demonstrates a trend worth investigating.
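The kind of band comparison reported above can be expressed as a short sketch. The function name and the seven score pairs are hypothetical and do not reproduce the study data; the band limits (above 80 per cent and below 50 per cent) are those quoted in the text.

```python
def share_scoring_lower(pairs, in_band):
    """Fraction of students in a band whose MCQCM score is below their MCQ score.

    pairs: (mcq_percent, mcqcm_percent) per student.
    in_band: predicate applied to the MCQ result to select the band.
    """
    band = [(mcq, mcqcm) for mcq, mcqcm in pairs if in_band(mcq)]
    if not band:
        return None  # no students fall in this band
    return sum(1 for mcq, mcqcm in band if mcqcm < mcq) / len(band)


# Hypothetical (MCQ, MCQCM) results for seven students.
results = [(90, 82), (85, 88), (95, 80), (45, 50), (40, 38), (30, 41), (70, 70)]
upper_share = share_scoring_lower(results, lambda mcq: mcq > 80)
lower_share = share_scoring_lower(results, lambda mcq: mcq < 50)
```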

The statistical analysis of the data, as seen in Table 8-5, further demonstrates that the

MCQCM results have a convergence to the traditional MCQ results, which validates

the use of it for summative assessment.

As in previous field trials, the convergence of validity between the results for the MCQ

and MCQCM sections of the tests is acceptable (r=.761, n=85, p<.01), reinforcing the

validation of the MCQCM as being a reliable testing method. Again, this correlation is

stronger when considering Cronbach’s Alpha reliability coefficient, demonstrating

internal consistency, of .856, comfortably above the recommended minimum of .7.


Correlations

                                MCQ       MCQCM
MCQ      Pearson Correlation    1         .761**
         N                      85        85
MCQCM    Pearson Correlation    .761**    1
         N                      85        85

**. Correlation is significant at the 0.01 level (2-tailed).

Table 8-5: Correlation of MCQ with MCQCM.

This statistical analysis across the groups confirmed the previous observations,

confirming the MCQCM as an acceptable testing option that is statistically as reliable

as the traditional MCQ option.

Further, it is noted that the Chi-Squared test, shown in Table 8-6, confirms these results

for the same groups of questions to be of the same population, again reinforcing the

legitimacy of the application.

Test Statistics

               MCQ        MCQCM
Chi-Square     33.518a    54.412b
df             45         9
Asymp. Sig.    .896       .000

a. 46 cells (100.0%) have expected frequencies less than 5. The minimum expected cell frequency is 1.8.
b. 0 cells (.0%) have expected frequencies less than 5. The minimum expected cell frequency is 8.5.

Table 8-6: Chi-Square MCQ to MCQCM.
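The footnotes to Table 8-6 are consistent with a one-sample Chi-Square statistic in which every observed score value is expected equally often: 85 students over 46 distinct values gives an expected frequency of about 1.8, and 85 over 10 gives 8.5. The sketch below computes such a statistic under that reading, which is an inference from the footnotes rather than a stated method; the function name and the data are hypothetical.

```python
from collections import Counter


def chi_square_equal_expected(scores):
    """One-sample chi-square with equal expected counts per observed value.

    Returns (statistic, degrees_of_freedom, expected_count_per_category).
    """
    counts = Counter(scores)
    categories = len(counts)
    expected = len(scores) / categories
    statistic = sum((obs - expected) ** 2 / expected for obs in counts.values())
    return statistic, categories - 1, expected


# Hypothetical scores: three distinct values observed 10, 20 and 30 times.
sample_scores = [1] * 10 + [2] * 20 + [3] * 30
statistic, df, expected = chi_square_equal_expected(sample_scores)
```

A statistic computed this way measures how unevenly the cohort is spread over the score values that occur; converting it to a significance level additionally requires the chi-square distribution for the stated degrees of freedom.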



The observations and analysis contained in Table 8-5 and Table 8-6 confirm the

MCQCM to have equivalent merit as the traditional MCQ assessment option, offering

good correlation and convergence. The results identify the MCQCM as an alternative

assessment strategy offering reliability, validity and convergence to standard

assessment policies.

8.4 Instructors’ Focus Group for Summative Assessment

As for formative assessment, a focus group was facilitated to ascertain the instructors’

attitudes and perceptions of the MCQCM for summative assessment. The four

instructors involved in the above MCQCM implementations were invited to give their

feedback to the designers, yielding the following observations.

All of the instructors complimented the visual presentation of both the MCQCM

question format and the student feedback screens. Consistent with the instructors’

feedback for the formative assessment applications the instructors appreciated the

analogy to the games environment and the clarity of the feedback produced for the

students. They also complimented the MCQCM in its method of handling graphics and

the graphs produced showing the areas of comprehension and misunderstandings.

In particular the instructors appreciated the increased distribution of the students’ scores

as they felt that the MCQCM could offer greater opportunity to produce a more

discerning set of results. However, they voiced concern that the results might only serve

to reflect the propensity of the student towards over or understating their confidence

rather than truly identify the level of knowledge.

The main concern of the instructors was the implementation of assessment that

penalises the students with negative scores. While they all acknowledged that the

approach was reasonable and that there is a need to highlight areas of knowledge where

a student registers high confidence in incorrect answers, they feared that the students

could perceive the practice as being unfair, resulting in students questioning the scoring

and appealing against assessment strategies they regard as unjust.


All of the instructors felt that the implementation of an innovative assessment tool has

the additional benefit of forcing the students to revise before the final assessment,

where they are required to practise with the MCQCM in preparation. As a result,

students who would not normally prepare may do so, improving their understanding

and hopefully catching those students who would normally slip through.

8.5 Discussion

The first trial discussed in this chapter was initiated to validate the MCQCM as a

summative assessment tool, ascertaining the acceptance of the MCQCM by the students

and the instructor as part of the assessment regime. As the results of this exercise

demonstrate from the survey (Appendix A), the majority of students considered the

MCQCM to offer greater opportunity to optimise their score, by lessening the impact of

an incorrect answer and increasing the reward for a correct one. Additionally, they

appreciated the chance to demonstrate some partial knowledge in areas, a feature that is

not possible in many traditional testing methods. The survey results also demonstrated

that for the majority the use of confidence measurement in assessment is a means of

identifying gaps in their learning and furthermore offered a comfort zone if unsure of

an answer. The statistical analysis of the results shows a good correlation, which should

be expected from an exercise where dual marking has been applied. The interesting

observation is in the upper quartile of student results where the higher achievers do not

score as well when the MCQCM scoring system is applied possibly offering a more

discerning set of results than just being clustered around the 90 to 100 per cent area.

The decrease in the use of the slide bar for summative assessment tasks demonstrates

the change in behaviour when there is more to lose or gain. This issue highlights the

need for the MCQCM to be extensively used throughout the semester so that the

students can familiarise themselves with the system.

The second trial considered in this chapter is the comparison of the MCQCM results to

a number of traditional testing methods as part of the subject’s final exam. The

statistical analysis of the results of the MCQCM section of the exam against those

attained for each individual’s achievement for other traditional methods of assessment


returned some positive observations. From this study it is apparent that the MCQCM is

as reliable a testing method as the similarly constructed traditional MCQ method. The

convergence of the MCQCM and MCQ results is strong, offering an acceptable

reliability. The main observation is that the MCQCM showed similar convergence to

all of the assessment methods tested, equivalent to the levels of convergence that each of

the traditional methods of MCQ, Short Answers and Long Answers showed to each

other. This validates the MCQCM as an equivalent predictor of a student’s knowledge.

Importantly, the feedback from the students endorsed the benefit of using the MCQCM

as a revision tool in preparation for the final assessment, exposing the students to

critical self-assessment of their knowledge, which they acknowledged as greatly

assisting them in preparation for the final exam.

The third trial outlined in this chapter, in which the MCQCM was used as an

assessment for half of the test and the traditional MCQ being used for the remainder,

again demonstrated a convergence of results and acceptable reliability. It was pleasing

to see that the results statistically came from the same population, as confirmed by the

Chi-Squared test. However, the main contribution to this research can be

observed in the results of the upper and lower quartiles, where the MCQCM scores for

the higher MCQ scores are in the majority lower than their corresponding MCQ scores,

and equally important, the MCQCM scores of the lower MCQ scores are in the

majority higher.

8.6 Summary

This chapter has reported on the findings of implementing the MCQCM as a

summative assessment tool during the subject delivery as a means of contributing to the

grading of the student, formally recognising their level of achievement for the given

subject(s). It has achieved this by including the MCQCM into the summative

assessment suite along with other traditional assessment strategies. This permitted the

comparison of the results to ascertain if the MCQCM scores offered the reliability and

validity of the other assessment strategies used. In doing so this chapter answers the


research question formulated in Chapter 2 pertaining to the application of assessment

with confidence measurement for summative assessment purposes, being:

Research Question 2.

Does Assessment with Confidence Measurement offer equivalent Validity and

Reliability compared to traditional assessment strategies when used for

Summative assessment?

It achieves this by answering the following sub question designed to ascertain the

instructors’ and students’ perception of assessment with confidence measurement:

Q2A: What are the student’s and instructor’s attitudes and perceptions of assessment

with confidence measurement when used for summative assessment?

It further addresses the sub questions regarding the validity and reliability of

assessment with confidence measurement:

Q2B: How do the results compare in validity and reliability to the results of the

standard MCQ test when using assessment with confidence measurement for

summative assessment?

Q2C: How do the results when using assessment with confidence measurement for

summative assessment compare in Validity and Reliability to other traditional

methods of summative assessment?

The final chapter, Chapter 9, discusses at length the findings of this thesis. It also

recapitulates the work contained in previous chapters, including the literature review,

the research methodology framework, the design and redesigning of the MCQCM, the

association of the MCQCM to the game play topology and the mathematical foundation

on which the MCQCM scoring method was based.


CHAPTER 9

SUMMARY,

CONCLUSION AND

FUTURE WORK

The findings of this research are summarised in this chapter, together with their

significance and contribution to the field. The chapter initially recapitulates the work

contained in the previous chapters, followed by discussion of the findings of the trials in

the field. It also addresses the research questions posed in Chapter 2. The chapter

closes with the conclusions drawn, the limitations identified and possible further

directions for investigation.


9.1 Summary of the Research

This research was initiated to address the identified area of concern voiced by many

educators, that assessment which encourages guessing, and in most cases rewards a

student for it, falls short of its primary objective of reflecting the student’s present level

of knowledge and providing meaningful feedback encouraging self-reflection and

consequential adjustment to the learning path. In addressing this fundamental concern it

also tackles the additional identified concern of reinforcing incorrect knowledge in the

student’s mind and failing to recognise partial knowledge. As a means of achieving a

solution to these areas of concern this research investigates and evaluates the option of

incorporating confidence measurement into the assessment strategy to improve the

grading of the students’ knowledge and feedback. This research then proposes the

Multiple-choice Questions with Confidence Measurement (MCQCM) interactive Web-

based assessment tool as an alternative to the more traditional MCQ method of

assessment. It then further evaluates the utilisation of the MCQCM to ascertain if it is

beneficial to both the students and the instructor as an assessment strategy for

incorporation into the classroom activities. Initially it focuses on the MCQCM

application for formative assessment, then, as a result of promising feedback, extends

its application to summative assessment, addressing the formative and summative

applications separately.

This research contributes to the educational community with its evaluation and

endorsement of assessment with confidence measurement, in turn assisting educators

who wish to pursue it for inclusion in their assessment strategies. The discussion of the

alternative scoring techniques developed and refined by others who have implemented

similar approaches to assessment offers options to those instructors wishing to pursue

innovative assessment strategies. These documented scoring methods differ

significantly depending on the various cohorts of students, recognising the students’

limitations or taking full advantage of their advanced academic abilities, depending on

the cohort’s different socioeconomic, cultural and intellectual backgrounds. It is the


responsibility of the educators to provide meaningful assessment that identifies the

student’s level of achievement as precisely as possible. Some educators consider

assessment that fails to truly indicate a student’s true level of knowledge as bordering

on negligence on the instructor’s behalf. Others vehemently condemn the use of

negative assessment in any shape or form, asserting that the practice can produce

adverse effects on the individual, detrimental to their progress along the learning path.

This research demonstrates a broad acceptance of assessment with confidence

measurement by both students and instructors.

This research paves the way for further studies in the area of innovative assessment

with confidence measurement. It is envisaged that future incarnations of the MCQCM

will include provision for variable scoring regimes, the same that have been practised

by others, offering flexibility in its applications. These scoring alternatives could either

permit the setting of the scoring choice for the duration of the subject’s delivery or be

adjusted by the instructor in an attempt to increase the intensity, applying more severe

penalties to students who demonstrate high levels of confidence for incorrect answers

towards the end of the delivery schedule. The gaming metaphor adopted for the

interactive application produces intrinsic motivational activity, encouraging practices

and rehearsal through casual use, supporting the transference of knowledge from short-

term to long-term memory, which is fundamental to student learning. In keeping with

the games phenomenon, future development will continue to preserve the features of

game play, promoting fairness, posing challenges with appropriate rewards,

encouraging risk-taking with explorative activity and controlling the level of difficulty

and the corresponding stress. Whether used for formative assessment alone or

implemented for summative purposes that require formative assessment activity to

support it, this research identifies the advantages of assessment with confidence

measurement, exposing the student to a self-critical process that increases the

likelihood of correct indication of their level of knowledge, to the benefit of both the

instructor and the students.

This research culminates in a series of recommendations to be considered if embarking

on the use of a computer aided assessment strategy incorporating confidence


measurement and importantly presents alternative methods of scoring that can be

considered.

9.2 Recapitulating on Previous Chapters

In Chapter 1 of this thesis, the research problem statement was formulated, based on the

discussion surrounding the identified concern that existing traditional assessment

methods, such as Multiple-choice Questions, often do not comply with the criteria of

good assessment practice by failing to indicate the true level of knowledge of the

participant. Furthermore, by the nature of their construction, many traditional

assessment strategies encourage the individual to guess, rewarding them for their

efforts. Chapter 1 also identified another major concern being that our traditional MCQ

assessment options often do not encourage the participant to demonstrate their various

levels of knowledge, as they require the student to identify the answer as being either

correct or incorrect (black or white), not permitting the student to demonstrate partial

knowledge, the “shades of grey” or “fuzzier” areas (Diamond & Forrester, 1983).

The choice of an appropriate Research Methodology plays a vital role in any research

direction and structure, as outlined in Chapter 2, where the research framework was

defined. It achieved this by firstly recognising the contributions of traditional research

paradigms often used by researchers, being Positivism, Interpretivism and Critical

Theory, focusing this research to achieve its objectives. In Chapter 2 the most

appropriate approach to address the research questions was then formulated, being a

blend of the aforementioned traditional research methodologies in order to use both

quantitative and qualitative analysis of the generated data. At this time the problem

statement of Chapter 1 was used as a basis to construct the research questions. Chapter

2 then considered each of these research questions in light of the research paradigms

and identified the research framework for the research questions to be addressed.

Chapter 2 then closed with a discussion about the approach to problem solving in the

real world, adopting the Human Computer Interaction (HCI) iterative approach to

problem solving (Sharp et al., 2007).


The literature review in Chapter 3 discussed variations of non-conventional MCQ

assessment strategies that have been used in the past and at present. Initially this

Chapter discussed the various Learning Theories and Learning Styles, identifying the

importance of feedback in the learning process. It was then recognised that the

embracing technology provided the ideal environment for educators to pursue their

educational interests, developing and refining innovative assessment practices to

greatly enhance the learning, encouraged by the perceived commercial opportunities

spurring on the rapid development in the field. Furthermore it was acknowledged that

the instructors today have many assessment options available to them and that it is the

responsibility of instructors to choose a combination of assessment tools and that it is

imperative that formative assessment activities are made available for the duration of

the learning experience. The chapter then discussed the utilisation of the MCQ

assessment method and its suitability to the new technology, often pushing its

application beyond the scope of its original design. The work of others in their attempts

to eliminate guessing and reward students for partial knowledge was discussed, and the

MCQCM relies on the underlying core arguments provided in the cited previous

work. Chapter 3 continued by discussing the benefits of providing the opportunity for

learners to reflect on their present state of knowledge and the important role that

computer aided assessment learning tools have on the learning process, as outlined by

Hede’s (2002) model of Integrated Effectiveness of Multimedia on Learning.

Chapter 4 considered the mathematics underpinning various scoring systems adopted for

innovative assessment strategies. It initially addressed the issue of guessing in MCQ

assessment by introducing the work of Pollard (1985, 1986, 1993), a pioneer in the field,

who created a scoring method that penalised incorrect choices. It then presented another

scoring technique (Paul, 1994) that used confidence as a contributor to the student’s grade,

with the scoring based on a logarithmic relationship derived from probability and

game theory. The more recent work of Gardner-Medwin and Gahan (2003) and

Gardner-Medwin (2006), which uses a harsh penalty system for students who demonstrate

high confidence in an incorrect answer but offers a safe zone for students willing to

concede that they have very little knowledge in the area, was also presented. Chapter 4

closed with a discussion of the MCQCM scoring technique.
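To make the idea of a harsh penalty combined with a safe zone concrete, the sketch below uses the three-level mark table commonly quoted for Gardner-Medwin's confidence-based marking (confidence 1, 2 or 3 earning 1, 2 or 3 marks when correct and 0, -2 or -6 when incorrect). These values are recalled from that literature rather than taken from this chapter and should be treated as an assumption; the sketch is not the MCQCM scoring method, and the function names are hypothetical.

```python
# Assumed mark table: confidence level -> (mark if correct, mark if incorrect).
CBM_MARKS = {1: (1, 0), 2: (2, -2), 3: (3, -6)}


def cbm_mark(confidence, correct):
    """Mark awarded for one answer at the declared confidence level."""
    mark_if_correct, mark_if_incorrect = CBM_MARKS[confidence]
    return mark_if_correct if correct else mark_if_incorrect


def expected_mark(confidence, p_correct):
    """Average mark for a student who is right with probability p_correct."""
    mark_if_correct, mark_if_incorrect = CBM_MARKS[confidence]
    return p_correct * mark_if_correct + (1 - p_correct) * mark_if_incorrect


def best_confidence(p_correct):
    """Confidence level that maximises the expected mark."""
    return max(CBM_MARKS, key=lambda level: expected_mark(level, p_correct))
```

Under this table a student who is only half sure does best by declaring the lowest confidence, where a wrong answer costs nothing, while a student who is 90 per cent sure does best at the highest level, so honest reporting of confidence is the rewarded strategy.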


Chapter 5 summarised the development of the MCQCM prototype and the initial pilot

trials. Chapter 5 presented the outline of a small pilot program initiated in the early

stages of development followed by the analysis, summary and conclusions. The details

of a second, more extensive pilot program are then summarised in which the findings

are analysed and evaluated. Chapter 5 continued with a summary from the results of

these two pilot programs that received encouraging feedback providing inspiration for

further investigation and research. Chapter 5 identified elements of design and

functionality that needed addressing, in particular the major concern of operation of the

MCQCM, where the mechanism of identifying an option to be correct and declaring the

level of confidence is completed in one action. Chapter 5 addressed the first research

question: Does Assessment with Confidence Measurement produce more meaningful

feedback when used for formative assessment? This was achieved by answering

relevant sub questions for formative assessment formulated in Chapter 2 being: What

are the student’s attitudes and perceptions of assessment with confidence when used for

formative assessment?; How do the students’ results compare to the results of a

standard Multiple-choice Question (MCQ) test when using assessment with confidence

measurement for formative assessment? and Does the use of assessment with

confidence measurement provide additional valuable feedback to the instructor when

used for formative assessment?

Chapter 6 contained an extensive discussion on the redesigning and refinement of the

MCQCM, addressing many of the issues acknowledged in Chapter 5, while also

adhering to customised heuristics developed by Sim, Read and Holifield (2008).

Chapter 6 opened with an investigation into the games paradigm, then described the

fundamental elements that constitute a game play experience identifying the

components and the experiences that are required to produce a balanced game.

Following that it addressed these elements in light of the MCQCM application that uses

the metaphor of gaming, identifying the areas of goals, risk and reward, fairness,

challenges, learnability, stress and level of difficulty. In addressing the problem of

using the one mechanism to identify an option to be correct while also declaring the

level of confidence Bandura’s (1983) self-efficacy work was called upon to formulate a

solution. The Chapter then continued by comparing the usability of the MCQCM


against Sim, Read and Holifield’s (2008) heuristics for Web-based assessment

applications, producing a set of extended heuristics for MCQ with confidence

measurement. Chapter 6 closed with a discussion about the MCQCM’s unique solution

for handling graphics by considering the work of Leung (1995) in his application of

bifocal display for large display requirements on small screens. Chapter 6 answered

the third research question:



maximum benefit from its application?

Chapter 7 reported on the findings of an investigative study designed to evaluate the

MCQCM against a traditional computer based assessment package for a formative

assessment exercise. The Chapter initially outlined a small simulation exercise designed

to indicate if the MCQCM produces the results that it purports to, in preparation for

extended application in the field. Chapter 7 detailed the application of the MCQCM as

a formative assessment tool used for the duration of a semester, which returned

encouraging results after the analysis and summary of the data. Chapter 7 then

discussed the instructors’ contribution to the feedback as they stated that they felt

comfortable about making the MCQCM system available for student self-assessment

and furthermore appreciated the additional analytical feedback given, permitting them

to evaluate the effectiveness of their teaching strategies. Chapter 7 answered the first

research question:

Does Assessment with Confidence Measurement produce more meaningful feedback

when used for formative assessment? This was achieved by answering relevant sub

questions for formative assessment formulated in Chapter 2 being: What are the

student’s attitudes and perceptions of assessment with confidence when used for

formative assessment? ; How do the student’s results compare to the results of a

standard Multiple-choice Question (MCQ) test when using assessment with confidence

measurement for formative assessment? and Does the use of assessment with

confidence measurement provide additional valuable feedback to the instructor when

used for formative assessment?


In Chapter 8 the application of the MCQCM for summative assessment was trialed

following the positive responses from the work in Chapter 7. The chapter discussed the

first application where the MCQCM was used for both formative and summative

assessment along with the traditional MCQ assessment strategy. The resulting grades

for each individual were compared and statistically analysed to reveal a convergence of

reliability.

The second study contained within Chapter 8 compared the individuals’ MCQ, Short

Answers, Long Answers of Problem Solving and the MCQCM grades from a cohort of

students. The statistical analysis concluded that the MCQCM had equal correlation to

that of all of the others and can be considered equally reliable. The final field study in

this chapter compared the individual’s MCQCM and MCQ results for a summative

assessment event but used a more reliable method for evaluation. It produced results

demonstrating a convergence of validity between the two testing methods and an

interesting set of results for the upper and lower quartile of students.

Chapter 8 addressed the second research question: Does Assessment with Confidence

Measurement offer at least equivalent Validity and Reliability compared to traditional

assessment strategies when used for Summative assessment? This was achieved by

answering relevant sub questions for summative assessment formulated in Chapter 2

being: What are the student’s and instructor’s attitudes and perceptions of assessment

with confidence measurement when used for summative assessment?; How do the

results compare in validity and reliability to the results of the standard MCQ test when

using assessment with confidence measurement for summative assessment?; How do

the results when using assessment with confidence measurement for summative

assessment compare in Validity and Reliability to other traditional methods of

summative assessment? and Does the use of assessment with confidence measurement

provide additional valuable feedback to the instructor when used for summative

assessment?


9.3 Discussion

This research positions itself in the centre of human activity as it attempts to address an

issue embedded in the educational arena. Consequently it primarily deals with human

players, being the students and instructors. Because the research is embedded in a real world

phenomenon, the identification of the relevant problem space, with its parallel research

problem statement, marries together the major contributors to this research: Human

Computer Interaction (HCI), which supplies the iterative approach to solving the

problem space, and components of Positivism and Interpretivism, which contribute to the

research framework. These distinctly different discipline approaches work in harmony

for this particular research activity, finding the balance between the theoretical and

practical approaches to solving a real world problem in context. The HCI iterative

approach encourages the development and refinement of the MCQCM tool conforming

to recognised HCI guidelines, best practices and recent contributions in the field from

Sim, Read and Holifield (2008). The blended research methodology approach permits

the research questions to be formulated at the beginning of each stage where the

activities are tailored to produce the correct data in an attempt to enlighten the

researchers. This works harmoniously with the HCI iterative activities.

This section will discuss and answer the research questions formulated as part of the

research framework identified in Chapter 2. It will further demonstrate the value of the

MCQCM as a solution to the overall encompassing problem that present assessment

strategies employed by educators often fail to accurately demonstrate a student’s

present level of knowledge in a given area, being detrimental to both the student and

the instructor.

9.3.1 MCQCM as a Valuable Formative Assessment Tool

Question 1, as formulated in Chapter 2 of this research, asks, Does Assessment with

Confidence Measurement produce more meaningful feedback when used for formative

assessment?

This research set about to answer this first question by breaking it down into smaller

sub questions, permitting a concentrated focus on the underlying concerns. Any

assessment strategy is reliant on the acceptance of both the instructor and the student.

As discussed in Chapter 6, the perception of fairness has a fundamental place in our

society, as unfair practice in all aspects of our orderly life is immediately rejected.

Consequently, the students’ and instructors’ perception of assessment with confidence

measurement is an integral part of this research.

The corresponding first sub question formulated was Q1A: What are the student’s and

instructor’s attitudes and perceptions of assessment with confidence when used for

formative assessment? The focus of the early stages of this research was designed to

address this issue and gauge the acceptance of assessment with confidence

measurement by the potential participants. As presented in Chapter 5, this was achieved

by a series of small pilot programs specifically designed to ascertain the overall

acceptance. The prototype version of the MCQCM was introduced to an initially

restricted cohort of students, then to an extended group. Although this version of the

MCQCM had limited functionality it was at an acceptable operational level for the

purpose. These initial pilot programs returned encouraging results, supporting the use

of the MCQCM as an assessment strategy, with the students’ responses demonstrating that

assessment with confidence measurement makes a positive contribution to

self-assessment.

The second sub question in support of the first research question outlined above that

this research has addressed is Q1B: How do the students’ results compare to the results

of a standard Multiple-choice Question (MCQ) test when using assessment with

confidence measurement for formative assessment? This question is answered in

Chapter 7 with the application of the MCQCM for formative assessment, where Section

7.3 demonstrates the MCQCM to be as valid as traditional assessment and to

offer a reliable assessment, with a convergence to the results from using other

traditional testing options. Additionally, the results of the simulation exercise, where a

select number of students used the MCQCM after receiving their final grade for the

semester from various other summative assessments, demonstrated that the MCQCM

consistently represented a level of knowledge equivalent to that recorded

from the individual’s other summative activities.

The third sub question in support of the overriding first research question regarding the

perceived value of formative assessment with confidence measurement is Q1C: Does

the use of assessment with confidence measurement provide additional valuable

feedback to the instructor when used for formative assessment? Assessment strategies

have two main stakeholders, students and instructors. It is not sufficient to only

consider the opinions of the students as the true value of an approach to assessment is

reliant on its worth to all interested parties, in this case the instructors. In Chapter 5 the

pilot programs produced some interesting findings relevant to the instructors when

considering the application of assessment with confidence measurement for formative

assessment. The resulting analysis revealed that instructors appreciated the increased

distribution of marks, offering a richer indication of the areas of attained knowledge.

Important to the instructors was the ability for them to identify areas in which a large

number of students demonstrate high levels of confidence for incorrect answers,

possibly indicating poor understanding in particular areas of the content. When

instructors were confronted with this type of data they recognised the need to

re-evaluate their teaching program to address areas that appear to be misunderstood or,

more importantly, areas that demonstrate a strong belief in incorrect facts. During the first

pilot program the instructors confirmed that the opportunity to review the data as

presented by the system would permit them to review the effectiveness of their teaching

strategies in the highlighted areas of concern. They also voiced their concerns on the

possible negative effect of the use of assessment that applied negative grades for

incorrect answers, as the approach was foreign to them and they were unsure of the

reception they would receive from the students when proposing such assessment. In

addition they registered reservations about any assessment that is dependent on the

individual registering their level of confidence to gain or lose marks, as the practice

could favour the more extraverted student, who is somewhat brash in their approach to

their level of knowledge and perhaps tends to be over-confident. Correspondingly, it could

handicap the introverted, modest or less confident individual who might

understate their level of knowledge.
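
The instructor-facing check described above, finding content areas where a large number of students register high confidence in incorrect answers, can be sketched in a few lines of Python. This is only an illustrative sketch and not the MCQCM implementation: the function name flag_misconceptions, the record layout and both thresholds are assumptions introduced here, and the sample responses are made up.

```python
from collections import defaultdict


def flag_misconceptions(responses, confidence_threshold=0.8, share_threshold=0.5):
    """Return question ids where many students were confidently wrong.

    responses: iterable of (question_id, is_correct, confidence) tuples,
    with confidence expressed as a fraction between 0 and 1.
    Both thresholds are hypothetical defaults chosen for illustration.
    """
    totals = defaultdict(int)
    confident_wrong = defaultdict(int)
    for question_id, is_correct, confidence in responses:
        totals[question_id] += 1
        if not is_correct and confidence >= confidence_threshold:
            confident_wrong[question_id] += 1
    # Keep questions whose share of confident-but-wrong answers is large.
    return sorted(
        q for q in totals
        if confident_wrong[q] / totals[q] >= share_threshold
    )


# Made-up responses: q2 attracts confident incorrect answers, q1 does not.
sample = [
    ("q1", True, 0.9), ("q1", True, 0.6), ("q1", False, 0.2),
    ("q2", False, 0.9), ("q2", False, 1.0), ("q2", True, 0.5),
]
flagged = flag_misconceptions(sample)
```

A list such as flagged would point the instructor to the content areas worth revisiting, which is the kind of use the instructors described.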

It was conclusive from the studies in Chapters 5 and 7 that all stakeholders, both students

and instructors, consider the MCQCM, or in general the delivery of MCQ with a

confidence measurement, to produce more meaningful feedback when used for

formative assessment.

9.3.2 MCQCM as a Summative Assessment Tool

Question 2, as formulated in Chapter 2 of this research, asks, Does Assessment with

Confidence Measurement offer at least equivalent Validity and Reliability compared to

traditional assessment strategies when used for Summative assessment?

Similar to the approach for the first research question, this research addresses this

question by formulating four sub questions.

As previously stated, the instructor’s and student’s perception of an assessment method

is critical to its acceptance. In the case of summative assessment this is of greater

concern. No instructor will use a system unless it conforms to both their own and the

institution’s ethos of fair, unbiased assessment. Likewise, students will not support

assessment strategies they consider to unfairly prejudice or handicap individuals. As

Sim, Read and Holifield (2008) stated, ultimately the students are the ones that

have the most to lose.

The resulting first of these sub questions is Q2A: What are the student’s and

instructor’s attitudes and perceptions of assessment with confidence measurement

when used for summative assessment? This research ascertained both the student’s and

instructor’s perception of summative assessment with confidence measurement by

using the MCQCM for various summative assessment activities. Chapter 8 documents

three distinct applications of the MCQCM as either stand-alone assessment or a

contributor as part of a suite of assessment activities. The feedback from the students

and the instructors after the first of the activities was analysed, as is presented in

Chapter 8. The ability to control the gain or loss via the level of confidence was

appreciated by the majority of students. It was pleasing to observe that some students

welcomed the opportunity to demonstrate partial knowledge, being rewarded

accordingly, while controlling the level of penalty for incorrect answers. It must be

noted that not all felt this way, as some expressed the opinion that being required to

nominate a level of confidence was interfering with the primary objective, a distraction

and an additional burden, not at all appropriate during formal test conditions. Others

also voiced hesitation in the declaration of absolute confidence (100%), as they prefer

to understate their level, in contrast to those who brashly overstate as part of their

personality. The instructors involved in this first summative assessment exercise

returned similar feedback to that recorded for the formative assessment, in that they

appreciated the results display highlighting areas of content misunderstanding

especially with high levels of confidence for incorrect answers, and furthermore

declared that this would influence their instructional direction resulting in the revisiting

of these content areas. The instructors continued to record concerns about the tendency

for the assessment with confidence measurement to be biased towards extraverted

students and possibly handicap the introverted. They also had reservations about

implementing the MCQCM, as constructing questions that offer

multiple correct answers for the one question is challenging, well beyond the demand

of traditional MCQ tests.

Q2B: How do the results compare in validity and reliability to those of the standard

MCQ test when using assessment with confidence measurement for summative

assessment? The MCQCM summative assessment implementations outlined in Chapter

8 produced statistical analysis of the recorded individual scores for MCQ and MCQCM

on three separate occasions using various mechanisms for comparative studies. In all

cases the results were pleasing, showing significant convergence of the MCQ and MCQCM

results, strengthened by the Cronbach’s Alpha Reliability Coefficient, which

demonstrated that the MCQCM scoring method consistently reflected the student’s

knowledge across the two assessments. The final case outlined in Chapter 8 directly

compared the MCQCM to the MCQ testing procedure, returning a Chi-Squared result

confirming that the two groups of MCQCM and MCQ scores were from the same

population. This evidence answers Q2B.
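
For readers unfamiliar with the Cronbach’s Alpha Reliability Coefficient mentioned above, the following Python sketch shows the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of the totals), applied to two score columns. It is a minimal illustration under stated assumptions: the function name cronbach_alpha is introduced here, the scores are hypothetical, and it does not reproduce the analysis or data reported in Chapter 8.

```python
from statistics import variance


def cronbach_alpha(score_columns):
    """Cronbach's alpha for k score columns, one score per student in each.

    Uses sample variances; every column must list the students in the
    same order so that totals are taken per student.
    """
    k = len(score_columns)
    if k < 2:
        raise ValueError("need at least two score columns")
    totals = [sum(student) for student in zip(*score_columns)]
    item_variance_sum = sum(variance(column) for column in score_columns)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


# Hypothetical scores for four students, not data from the studies.
mcq_scores = [10, 12, 15, 18]
mcqcm_scores = [9, 13, 14, 19]
alpha = cronbach_alpha([mcq_scores, mcqcm_scores])
```

Values of alpha close to 1 indicate that the two scoring methods rank the students consistently, which is the sense in which the coefficient supports convergence.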

Q2C: How do the results when using assessment with confidence measurement for

summative assessment compare in Validity and Reliability to other traditional methods

of summative assessment? To answer this question Chapter 8 embarked on comparative

studies, where students were required to complete a number of various assessment

tasks, one of them incorporating assessment with confidence measurement. The

analysis demonstrated that the MCQCM was as reliable as the other traditional

assessment strategies employed, proving it to have equivalent validity as a predictor of

students’ knowledge. This is demonstrated by the correlations calculated in Chapter 8,

where the correlations between the MCQCM and each of the other assessment methods

are close to the correlations of those methods with each other, and in some cases stronger. This is shown where the

correlation between MCQCM and the short answers is the greatest. This study

concluded that the MCQCM compared equally to other traditional methods for

summative assessment.
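
The comparison described above rests on pairwise correlations between the grades each student obtained under the different assessment methods. A minimal Python sketch of that calculation follows. It assumes Pearson’s correlation coefficient, which this section does not name explicitly; the function names and the grades are hypothetical and are not taken from Chapter 8.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def pairwise_correlations(grades):
    """Map each pair of assessment methods to the correlation of their grades."""
    return {
        (a, b): pearson(grades[a], grades[b])
        for a, b in combinations(grades, 2)
    }


# Hypothetical grades for five students under four assessment methods.
grades = {
    "MCQ": [55, 62, 70, 78, 90],
    "Short Answer": [50, 65, 68, 80, 88],
    "Long Answer": [58, 60, 75, 72, 85],
    "MCQCM": [48, 63, 69, 81, 92],
}
correlations = pairwise_correlations(grades)
```

Comparing the entries that involve MCQCM with the entries among the other methods mirrors the reasoning used above: if the former are as strong as the latter, the MCQCM converges with the traditional methods.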

Q2D: Does the use of assessment with confidence measurement provide additional

valuable feedback to the instructor when used for summative assessment? As in the

discussion above the findings for this area of the research were consistently pleasing as

the instructors confirmed their general positive appraisal of assessment with confidence

measurement. The recurring theme from the instructors was that the opportunity to identify

areas where high levels of confidence were being registered for incorrect answers

would “set off alarm bells”, resulting in adjustments to the presentation schedule to

revisit that content area. This was of particular importance for the first cohort of

students as the assessment was during the semester, permitting the instructor to address

the identified shortcomings during the present students’ study program, which did occur.

In later instances of the MCQCM application in this research the benefits could only be

for the next cohort of students as it was used as part of the final assessment. The results

concluded that the MCQCM provided additional valuable feedback for the assessor for

summative assessment, however often the benefit would be for successive students

only.

From the results of the four sub questions it can be concluded, in response to

question 2, that the MCQCM is of at least equal value to students and

instructors and of consequential value to future students through improved feedback

identifying gaps in knowledge.

9.4 Ethical Issues

This research is reliant on the data gathered from the real world, as well as

contributions from simulations and pilot programs more aligned with laboratory

experiments during the early design stages. Consequently consideration must be given

to the associated individuals. It is usual practice for assessment strategies to be analysed

post results for the identification of questions that were consistently answered well or

inadequately. Instructors often statistically process their students’ scores in an attempt

to ascertain if there are areas of content requiring attention, or most importantly if the

construction of particular questions is substandard, needing rewriting for future use. It

is also expected that post assessment activity permits the students and instructors to

give feedback after the event, where the instructor will often go over the test results

highlighting the sections done poorly and those where the students achieved high

scores. Included in this exercise is the opportunity for the student to give feedback on

the questions’ construction, such as possibly ambiguous or poorly worded questions, which

may be misinterpreted by the candidates. This research has leveraged off these standard

teaching experiences, processing the data responsibly and privately as part of the

standard operational procedures of the daily educational experience.

9.5 Limitations of Study

9.5.1 Scope

Educational research can often suffer from the enormity of its application and its area

of relevance. Assessment is a major contributor to the educational arena, having far

reaching implications when considering innovative approaches. Combined with

the significance of the e-learning paradigm, where many of the substantial changes in

the approach to education are being generated, there is a tendency to overcommit to the

area of application, risking fragmentation of the primary research objectives.

Understandably researchers in this area need to restrict their approach in order to keep

the path clear. For this reason the scope of the research was limited to the immediate

environment in which it was developed and nurtured. This allowed the control of many

extraneous variables, minimising the interference and distractions that can occur when

extending the research beyond the immediate environment.

9.5.2 Internal Validity

Research that interacts with participants in a real world application is often influenced

by inherent factors from that environment. In the case of investigation into innovative

assessment strategies in the educational arena the participants often are unduly

influenced by the surrounding activities, as extra attention to assessment can artificially

increase the students’ level of appreciation, and as their familiarity grows and the

activities become more prevalent their reactions could be greatly overstated. This is

often referred to as the Hawthorne Effect,

where the subjects are observed to change their behaviour as a direct result of being

studied (Landsberger, 1958). In addition the introduction of new technology in support

of innovative educational activities often receives disproportionate emphasis to ensure

its general acceptance. The extended testing and consequential long-term exposure to

innovative assessment tools can result in the lessening of the participants’ alertness and

motivation towards the system.

9.5.3 External Validity, Transferability

External Validity in quantitative research gauges the capacity to generalise from

the findings of the primary research to the extended environment. Transferability in

qualitative research refers to the ability to apply and transfer the methods and practices

formulated in the primary research to similar situations, environments and

circumstances (Lincoln & Guba, 1985). These two approaches can be considered for

this research as it has used both qualitative and quantitative research methodologies, as

discussed in Chapter 2.

The pilot programs, implementations and evaluations contained within this research

were dependent on the participation of students and instructors from various

socioeconomic groups, cultural communities and age groups. These groups bring with

them different attitudes, prejudices and preferences. All of these differences affect the

acquired data, as they make up part of the personalities and beliefs of the contributing

individuals. These extraneous variables have had some influence on this research but

their effect has been minimal due to the relatively homogeneous group being studied.

This can be attributed to the structure of the limited environment where the study

occurred, confined to the one faculty for its duration. It would be expected that similar

research in other discipline areas could vary greatly as the attitudes and diversity of

student populations and instructors add to the recorded feedback and experiences with

the system. Such differences were demonstrated by Davies (2005), who

recommended rejecting severe penalties in favour of over-rewarding students for

high confidence in the correct answer for his cohort of students, and by Paul (1994), who

found it difficult to apply severe negative penalties due to rejection from his fellow

instructors. Consequently, the ability to generalise from these studies becomes limited,

as acceptance and perceived value to both the instructors and the students can vary

greatly depending on their environments.

9.5.4 Construct Validity

Construct validity considers the degree to which the results can be generalised back to

the theoretical construct originally determined. The perception of fairness and

contributing value by both the student and the instructor has different meanings to

different people. This research deliberately omitted to predefine these terms, and others

like them, so as not to influence the participants’ perceptions. The exercises designed to

evaluate assessment with confidence permitted the participants to freely express their

opinions without influence. Due to the ambiguity of these terms any questions were

phrased deliberately so as not to prescribe a definition. This format of non-defined

terminology encouraged personal interpretation and expression of the value of

assessment with confidence measurement. Consequently the results should carry no such bias when

generalised within their area of origin, in that the findings should comfortably

represent the attitudes and perceptions of other cohorts of students from the same

discipline area.

9.5.5 Ecological Validity

For research to be considered ecologically valid it requires the research methods,

materials and setting to reflect the real world. In the case of this research the

requirements were met as the investigative activities all occurred in the classroom as

part of the daily activities. The results for the formative assessment exercises

contributed to the students’ revision and directed them along their learning path. The

application of summative assessment with confidence measurement contributed to the

student’s final grade as part of the assessment suite being offered during the semester.

The initial pilot programs compromised the ecological validity slightly as they used an

inferior prototype, but in essence provided feedback close to the final version. Similarly

the small simulation exercise removed itself from the routine delivery for a select few,

but still emulated the assessment environment in which the implementation would

occur. Importantly for the vast majority of the time the participants were mindful of the

losses and gains to be obtained and the associated risks, placing them firmly into a real

world scenario.

9.6 Research Contribution

9.6.1 Outcome 1: The MCQCM Tool

The development of the MCQCM Web-based tool is a major outcome of this research.

In its construction it has incorporated design aspects that are both functional and

beneficial to both the students and instructors. The adherence to Sim, Read and

Holifield’s (2008) heuristics for CAA interactive systems ensures that the student’s

experience and interaction with it is beneficial to their learning by producing

meaningful feedback. The incorporation of the balanced-scoring method that is linearly

proportional to the registered confidence places the control into the hands of the

student, where the scoring is a direct consequence of their direct manipulation of the

system. MCQCM’s ability to handle large amounts of information on restricted viewing

space, by incorporating aspects of bifocal display with split screen, enables the viewing

of graphics or scripts while not losing sight of the question the student is to consider,

minimising the cognitive load of the student. The 24 hours/7 days a week availability of

the MCQCM via the Internet for self-assessment greatly increases its appeal to the

students as a means to assess their level of knowledge at their convenience.
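
The balanced-scoring idea described above, where the mark gained or lost is linearly proportional to the registered confidence, can be illustrated with a short Python sketch. The exact formula is not given in this section, so the symmetric rule below (the confidence is added for a correct answer and the same amount is subtracted for an incorrect one) and the names balanced_score and total_score are assumptions made purely for illustration.

```python
def balanced_score(is_correct, confidence, max_mark=1.0):
    """Score one answer under an assumed symmetric, linear scheme.

    confidence is a fraction between 0 and 1. A correct answer earns
    confidence * max_mark and an incorrect answer loses the same amount,
    so registering zero confidence ("no knowledge") neither gains nor loses.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    signed = confidence if is_correct else -confidence
    return signed * max_mark


def total_score(answers, max_mark=1.0):
    """Sum the balanced scores of (is_correct, confidence) pairs."""
    return sum(balanced_score(ok, conf, max_mark) for ok, conf in answers)


# A student who is sure of one answer, half sure of another that turns
# out wrong, and concedes no knowledge on a third.
example_total = total_score([(True, 1.0), (False, 0.5), (True, 0.0)])
```

Under this assumed rule the student controls both the potential gain and the potential penalty through the confidence they register, which is the property the balanced-scoring method is described as providing.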

9.6.1.1 Why was this research necessary?

Good assessment is required to meet a set of criteria to be considered beneficial to the

student and instructor. Present assessment strategies can fall short of these requirements

leaving both the student and instructor uninformed. The MCQCM has been designed to

address some of these shortcomings by providing a Web-based solution for formative

and summative assessment that supplies critical assessment of the student’s knowledge

while still being considered as fair. Its adherence to the games taxonomy and criteria

for good game play ensures that it provides a challenging experience in an environment

of balanced difficulty, encouraging risk-taking, rewarding for absolute and partial

knowledge while highlighting areas of incorrect understanding. Tools like the MCQCM

need to be developed and nurtured to ensure that education reaps the benefits of this

progressive technological environment.

9.6.1.2 Who benefits from this research and how do they benefit?

The beneficiaries of the application of the MCQCM are the students, instructors,

educational institutions and researchers of CAA that adopt assessment with confidence

into the curriculum. The student benefits from the advantages of having a method for

self-assessment readily available to them in order to ascertain their understanding and

modify their learning path appropriately. Additionally they can achieve higher results

by being rewarded for incomplete or partial knowledge in areas where they

normally would not be with traditional assessment. The MCQCM permits the instructor to gauge

the level of understanding of particular components of the curriculum being assessed

via the graphical displays and graphs produced by the system, encouraging adjustment

of the delivery schedule to compensate for miscomprehension, misconception or

delusion. The educational institutions can benefit by implementing the MCQCM to

address the shortcomings of traditional assessment, offering a more rewarding

experience to their students.

Researchers can benefit from the MCQCM by leveraging off the development

processes and subsequent studies to develop and refine other technologies to improve

assessment within their individual contexts.

9.6.2 Outcome 2: The Value of Assessment with Confidence Measurement for

Formative Assessment

This research contains evaluation and analysis of the use of assessment with confidence

measurement for formative assessment from both the instructor’s and the student’s

perspective. It is apparent that, in its particular field of application,

assessment with confidence measurement incorporating a penalisation and reward

scheme was accepted and offered enriched feedback. This feedback influences the learning path of the

individual as a means of addressing the areas of concern, often leading to further

revision and additional reflection. The instructors also modify their teaching schedule

in order to readdress the content areas identified as in need of revisiting, especially

areas where strong misunderstanding occurs. The opportunity for the student to gauge

their present level of understanding at various times during the semester often

contributes to increased learning, greatly improving their final results. This research has

also cited similar successful applications of assessment with confidence measurement

in other fields demonstrating the value of the approach in determining the true level of

knowledge while discouraging guessing.


In this research the shortcomings of assessment have been discussed at length, citing

various fellow researchers’ concerns about encouraging guessing, possible confirmation

of misunderstandings of content and the inability to recognise, cultivate and nurture

partial knowledge. Additionally, the opportunity to promote the healthy declaration of

“having no knowledge” in a designated area should be of benefit to both the instructor

and the student. These practices should be encouraged, as Gardner-Medwin in a recent

discussion considers it to be irresponsible behaviour to merely let a student through

without penalty when demonstrating a high level of confidence for incorrect knowledge.

The meta-question from Diamond and Forester (1983), which asks how sure the student is of

their answer and so is about what they know, encourages deeper understanding

from the student, probing their inner thoughts to ascertain their real level of knowledge.

Educators acknowledge the limitations of assessment and many actively seek

alternatives. This is demonstrated by their willingness to offer a number of assessment

activities designed to permit the student to demonstrate their knowledge in various

ways, such as written reports, case studies, small tests, exams and other equally

acceptable assessment methods.


The contribution of this research has benefits to both students and the instructors. In

light of the discussion above, this research demonstrates the advantages of providing

self-assessment with confidence measurement activities during the subject delivery

program. Early detection of miscomprehensions creates the opportunity to readdress

issues before it is too late. As demonstrated by Hede (2002) the active process of

rehearsal facilitates the passing of knowledge from the short-term to long-term

memory, which can be assisted by the use of formative assessment. The redirection of

learning for a student is vital to their success in their studies and deserves particular

attention. Assessment with confidence measurement offers precision in the timely

identification of miscomprehensions. As previously stated the instructor benefits from being

alerted to areas of poor understanding, adjusting their instructional program

accordingly. Finally, they have the opportunity to narrow in on sub levels of the content

area in which the layers of understanding can be reinforced. This is of particular value

to vocational training where levels of competency are required and need to be

continually evaluated and addressed.

9.6.3 Outcome 3: The Value of Assessment with Confidence Measurement for

Summative Assessment

Similar to formative assessment, this research identifies assessment with confidence

measurement for summative assessment as a solution to the problem identified. The

recorded appreciation of both the students and the instructors with its supporting

statistical evidence positions it as a viable assessment strategy. In some cases the

advent of new technology has pushed the assessment methods beyond their original

intended use, as in the case of MCQs, where the ease of facilitating their construction

and implementation places them high in the order of choice.


As summative assessment plays a major role in the educational process, it is important

that the instructors use assessment techniques that offer the best solutions. The

acknowledgment of the failings of traditional assessment permits the inclusion of

alternative assessment enabling a greater precision in producing a summative grade.

The practice of students guessing creates artificial results, a noise in the data (De Carlo,

2005), which falsely distributes the grades and undermines the integrity of the results. The

opportunity to minimise the effect of this negatively influential data is appealing to both

educators and students, especially the better students who consider guessing by the

lower achieving students to erode their high levels of achievement. Traditional MCQ

assessment can only classify a question as correct or incorrect, black or white, with no

areas of shade to recognise various levels of student knowledge. The ability for

assessment with confidence measurement to demonstrate degrees of knowledge is a

contributor to summative assessment, producing a new tier in the scores identifying

partial knowledge.


The benefits from this research into using assessment with confidence measurement for

summative assessment to both student and instructors are significant, as the outcomes

offer enriched feedback and a greater distribution of grades predicting the student’s

level of knowledge.

Summative assessment that takes place before the end of a delivery program, early to

mid-semester, has itself formative assessment attributes, as the feedback can be acted

upon by the student guiding them along their learning path, and the instructors in

designing their delivery schedule. In this situation the discussion in 9.6.2 pertaining to

the benefits of this research to formative assessment applies.

When summative assessment occurs at the end of the delivery program, as the exam or

the final test, the benefits to students and instructors have a different emphasis, as the

formative feedback has limited influence on the learning path for the student and

delayed action for delivery schedule amendments by the instructor. When assessment

with confidence measurement is included in the final summative assessment the student

still receives the enriched feedback, highlighting the areas of greatest concern, the areas

where a comfortable level of knowledge was demonstrated and the areas where the acquired

knowledge is high. Those students who either need to repeat the subject or continue

with further studies in the area can address their concerns and build on their strengths in

preparation for the following semester.

The instructor usually analyses the results of the final assessment to ascertain elements

of the delivery program that require a rethink and adjust most of the syllabus

accordingly. Contained within this research is the suggestion that the increased amount

of generated scores from assessment with confidence measurement permits the

instructors to offer a greater distribution of final results. This could influence the

allocation of the final grades, offering the ability to increase the distinction between the

students, contributing to a more discerning set of results.

One of the greatest benefits identified by this research to be gained from using

confidence measurement summative assessment is the byproduct of scheduling the

activity during the semester. In most cases the operation of this novel assessment

tool is foreign to the students, requiring them to familiarise themselves with the method

of assessment in preparation for the summative assessment task. This forced exposure

to the tool for formative assessment increases their self-assessment regime, providing

feedback to them whether they want it or not. It is generally accepted that the better

students use any formative assessment opportunities made available, assisting in

securing their high grades. The lower achievers tend to stay away from them, partly due

to their general attitude towards learning and sometimes due to their fear of what their

score will be. The balanced scoring calculated from the confidence registered can

support these students by permitting them to concede where they have no knowledge

without imposing too high a penalty and maximizing their score where they do have

knowledge. While this research did not gauge the outcomes of this, it did observe, in the

pilot programs of Chapter 5 and the implementations outlined in Chapter 8, that the

students appreciated the opportunity to safely admit having little or no knowledge in

some areas without incurring harsh penalties.

9.6.4 Outcome 4: Heuristics for CAA with Confidence Measurement

In order to ensure a reliable CAA interactive system it is necessary to undertake

evaluation, in this case, as part of the HCI iterative process. Usual practice for

interactive Web-based systems is to evaluate against a set of developed heuristics

customised to the area of application. Sim, Read and Holifield (2006) developed their

recommended heuristics for CAA over a period of time by creating a corpus of

usability problems, recorded by CAA activities and testing. The heuristics developed

by Sim et al. have been used to synthesise the set of CAA specific heuristics used for

this research. During this process this research recognised the need to further extend

these heuristics to apply to CAA applications that incorporate confidence measurement,

as the testing and observations highlighted areas of concern that are unique to MCQ

with confidence measurement assessment with the use of technology. These extended

heuristics have been itemised in Chapter 6 and are available for future reference and

application as required.


Interactive assessment tools with confidence measurement are required to meet an

additional set of criteria to be considered of good quality and beneficial to all users. It is

the adherence to these extended criteria that presents the greatest challenges to their

developers. In particular the adoption of a scoring method that links in proportionally to

the direct manipulation of the interface is critical to the perception of being fair and in

the control of the learner. This research recommends the balanced scoring technique

where the possible gain is equal to the loss when wagering on answers to be correct or

incorrect, and the method by which the student registers their confidence is

proportionally scaled on the interface, as the moving of the sliding bar for confidence is

in direct proportion to the resulting score. The ability to navigate freely from question

to question in a non-linear way is a requirement of all CAA applications but holds

special significance when confidence is applied, as the building and lowering of

confidence as the test progresses often leads to the need to readdress questions in a

different light before submitting. It is these unique experiences that require

consideration to a set of customised CAA with confidence measurement heuristics.
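
The balanced scoring recommended above can be illustrated with a minimal sketch. It assumes the slider position is normalised to a confidence value between 0 and 1 and that each question is worth a fixed number of marks; the function name `balanced_score` and the parameter `max_points` are hypothetical and are not taken from the MCQCM implementation. The sketch only shows the two properties named in the text: the possible gain equals the possible loss, and the score moves in direct proportion to the slider.

```python
def balanced_score(confidence: float, correct: bool, max_points: float = 1.0) -> float:
    """Illustrative balanced scoring for one question (an assumption, not the MCQCM code).

    confidence: slider position normalised to the range [0, 1] (assumed scale).
    correct:    whether the chosen answer was right.
    max_points: marks available for the question (hypothetical parameter).
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    # The amount wagered grows linearly with the slider position.
    stake = confidence * max_points
    # Balanced: the gain for a correct answer equals the loss for an incorrect one.
    return stake if correct else -stake


if __name__ == "__main__":
    for c in (0.0, 0.5, 1.0):
        print(c, balanced_score(c, True), balanced_score(c, False))
```

Under these assumptions a student who leaves the slider at zero neither gains nor loses marks, which corresponds to conceding no knowledge without incurring a penalty.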


Primarily it is the developers of CAA with confidence measurement tools that benefit

from these extended customised heuristics. Heuristic evaluation is a powerful tool with

the ability to identify most of the inadequacies of an interactive system. The success of

the activity is dependent on the quality of the heuristics and the level of expertise of the

evaluators. The extended CAA heuristics contained in this research offer the

opportunity for the developers to effectively evaluate their interactive system. In

addition to the developers, the student and instructors also benefit from these extended

heuristics as the outcome of evaluations with customised heuristics often leads to

higher levels of usability and functional refinement, improving the user experience with

the system.

9.6.5 Outcome 5: The Contribution of this Research to Educators Investigating

Alternative Assessment Strategies.

In Section 9.5.3 reference is made to the external validity of this research, where the

ability to generalise and transfer the findings of this research into similar situations is

discussed. This section shall now elaborate on the benefits this research affords to

educators who are presently pursuing, or intend to pursue, alternative assessment options. It

firstly addresses the transferability of the findings to innovative assessment in general,

where conclusions of this research pertaining to the design and functionality of an

interactive assessment system could contribute and assist with development of a newly

proposed assessment tool. It then informs researchers considering the implementation

of existing assessment with confidence measurement, such as Paul’s (1994) CBAA or

MCQCM, the benefits to be gained by its adoption and the challenges to overcome.


Innovative assessment strategies require nurturing and supportive environments for

them to have a chance of being successful. Instructors who instigate them as part of the

assessment regime often find themselves in uncharted waters, as their educational

training often does not introduce them to the more novel approaches of assessment,

preferring to expose them to traditional practices. When faced with the challenge of

incorporating groundbreaking assessment into the curriculum, instructors often rely on

the experiences of others, as documented in this research, to assist them in establishing

an effective strategy to approach the challenge.


In the general case of the introduction of non-specific innovative assessment strategies

(not necessarily using confidence measurement but not excluding it), there are many

components of this research that can be extrapolated to different areas of application. In

particular, the recognised need for preparatory exposure to the students in non-

threatening environments in order for them to familiarise themselves with the systems,

as recommended by Sim, Read and Holifield (2008) as part of their computer aided

assessment heuristics and Desurvire, Caplan and Toth’s (2004) game play heuristics,

would greatly enhance the chance of success when introducing new Web-based

assessment tools. Furthermore, application of the HCI guidelines for sound usability of

interactive systems, as outlined in Chapter 6, is highly recommended for similar

situations, addressing the areas of good navigation, error prevention, visualisation,

system visibility and others. The appropriate choice of a metaphor offering high

affordance is critical to the success of interactive Web-based assessment, in the case of

this research the strong association to the gaming paradigm, betting/gambling, where

the gain/loss is proportional to the risk taken. Accordingly, consideration to game play

theory is required, in particular the need for perceived game fairness, all of which is

highly externally valid and transferable to other areas of application.

The discussion and comparison of the various scoring options available is transferable

to applications in the field, as the gathering of the findings and recommendations from

the pioneers can greatly assist newcomers who would like to investigate like practices

in their preferred areas of application. The general acceptance of assessment with

confidence measurement will permit the instructor to adopt various assessment scoring

regimes, where customization of the most appropriate scoring method can be adopted

for the particular cohort of students.

Educators debating scoring techniques in future will be able to use the comparisons

of the works of Gardner-Medwin and Gahan (2003), Gardner-Medwin (2006), Paul

(1994), Davies (2005) and the MCQCM scoring as a basis for discussion.

9.7 Future Work

This research has demonstrated the benefits of employing assessment with confidence

measurement, presenting the work of others in their attempt to eliminate guessing and

promote the declaration of partial or no knowledge through the development of scoring

strategies offering rewards for correct answers and penalty scores as deterrents. All of

the scoring options discussed are acceptable scoring techniques that could be used. This

research has recognised the advantages of each of the previously developed scoring

techniques including the balanced scoring adopted by the MCQCM, acknowledging

that the decisional process in choosing an appropriate scoring for implementation is

dependent on the background of the students where it will be implemented.

In light of the previous discussions it is envisaged that the design of future applications

of this research’s assessment with confidence tool (MCQCM) should incorporate a

scoring option selection mechanism, providing a number of scoring regimes available

to the user. This would permit the instructor to determine the most appropriate scoring

method for the cohort of students. In this approach the choice of scoring can be set for

the duration of the delivery or changed mid-semester depending on the situation. It could

be feasible that the scoring penalty increases as the students progress through the

curriculum in an attempt to raise them to a higher level of knowledge. This approach

permits the application of a more “forgiving” regime, such as Paul’s (1994) CBAA if

required. As an example there might be a need to increase the students’ confidence in

certain difficult content areas by offering more lenient scoring for incorrect answers.

Alternatively there might be the need to apply severe penalties to students who require

a more honest appraisal of their progress, which Paul (1994) also provides for that

purpose but rarely implements.
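
One way such a scoring option selection mechanism might be organised is sketched below. The regime names and penalty factors are illustrative assumptions only; they do not reproduce the published scoring tables of Paul (1994), Gardner-Medwin (2006) or Davies (2005). Each hypothetical regime differs solely in how heavily an incorrect answer is penalised relative to the confidence wagered.

```python
from typing import Callable, Dict, Iterable, Tuple

# A scoring rule maps (confidence in [0, 1], answer correct?) to a score.
ScoringRule = Callable[[float, bool], float]


def make_rule(penalty_factor: float) -> ScoringRule:
    """Build a rule whose loss is penalty_factor times the confidence wagered.

    penalty_factor is a hypothetical tuning value: 1.0 gives balanced scoring,
    values below 1.0 are more forgiving, values above 1.0 are more severe.
    """
    def rule(confidence: float, correct: bool) -> float:
        if not 0.0 <= confidence <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")
        return confidence if correct else -penalty_factor * confidence
    return rule


# Hypothetical regimes an instructor could choose between.
REGIMES: Dict[str, ScoringRule] = {
    "lenient": make_rule(0.5),
    "balanced": make_rule(1.0),
    "severe": make_rule(2.0),
}


def score_test(responses: Iterable[Tuple[float, bool]], regime: str = "balanced") -> float:
    """Total a student's responses under the regime selected by the instructor."""
    rule = REGIMES[regime]
    return sum(rule(confidence, correct) for confidence, correct in responses)


if __name__ == "__main__":
    answers = [(1.0, True), (0.5, False)]
    for name in REGIMES:
        print(name, score_test(answers, name))
```

Because the regime is looked up by name at scoring time, an instructor could in principle keep one regime for the whole delivery or switch to a stricter one mid-semester without changing the questions themselves.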

Future studies may include recognition and determination of appropriate scoring

systems to match specific circumstances.

The scope of this research confined it to the Information Technology area. It would be

beneficial for further studies to investigate its application in different disciplines, as the

perception of assessment strategies by both the students and the instructors can vary

greatly depending on their social and cultural values that often differ depending on the

area of study.

In addition future research in this area would benefit from application with cohorts of

students without gender bias or more specifically that are not dominated by males as

was the case in this study. The IT educational environment attracts more males than

females, which makes gender comparative analysis difficult to perform. This research

could draw no conclusions in the area of gender bias but envisages that future activities

would be designed in areas of application where the student cohorts were comprised of

groups with gender balance.

Finally this research would benefit from exposure to cohorts of students from different

cultures, preferably in their country of origin. This would generate rich data for analysis

to ascertain if the acceptance and successful operation of assessment with confidence

measurement is dependent on the cultural environment. It is proposed that the MCQCM

be transferred to an international venue for further trials for this purpose.

9.8 Concluding Remarks

The outcomes of this research have contributed to addressing the concerns of traditional

assessment strategies that fail to honestly appraise a student’s level of knowledge and

encourage guessing. In order for an assessment to be of value it is required to conform

to a set of good assessment criteria, which is often not the case. This can be partially

attributed to the advent of new technology pushing the use of MCQs beyond their

intended use as an evaluation of a broad area of knowledge usually at Bloom’s lower

levels of knowledge. The ease by which they can be constructed and implemented

increases their appeal. Accordingly educators have embraced their application with

great enthusiasm often extending their implementation to higher Bloom’s order

assessment activities. This requires extensive knowledge in the construction of the

questions, which is a challenge to the most experienced educators.

The imposing of penalties for incorrect answers is another controversial subject within

CAA that will always bring forth heated debate, as educators hold varying opinions on

its ethical stand. This research acknowledges the challenges faced in determining the

most appropriate scoring technique to best suit the needs of the students. Consideration

must be given to the general intellectual position of the students, the purpose of the

assessment and the importance of the assessment task when deciding which scoring

technique to use.

This research encourages instructors interested in adopting assessment with confidence

to consider two closely related issues, with the second completely dependent on the

acceptance of the first. Initially, the instructor must decide if the benefit of assessment

with confidence measurement is worth pursuing within the individual context. This is

the pivotal question, as answering it in the affirmative commits them to the task of

successfully implementing it as either formative, in preparation for the oncoming

summative assessments, or both formative and summative, as using it for summative

assessment will necessitate formative assessment activities to ensure its success.

Inclusive to this first question is whether the instructor is prepared to impose penalties

in their scoring, as this is the mechanism by which the confidence can be incorporated

into the assessment. Instructors may find this the most challenging, as they will be

required to defend their choice to both students and in some cases other educators.

Once committed to assessment with confidence measurement the instructor will need to

consider the second question, that is, which scoring technique would best suit his/her

situation? This decision requires considering the intellectual ability of their students,

the motivation behind their studies, the discipline area in which the population lies and

any other influential factors. Application of assessment with confidence to a more

specific cohort of students with recognised common characteristics, traits and

behavioural qualities may suggest the adoption of scoring routines suited to their needs,

such as in the medical field (Gardner-Medwin, 2006), the Computer Science (Davies,

2005) discipline and in the Engineering domain (Paul, 1994). These deviations from the

balanced scoring of MCQCM are encouraged, as the situation requires. It is the

findings of this research that the choice of scoring is secondary to the adoption of

assessment with confidence.

Assessment based on the gambling metaphor leverages greatly off the games

environment, which is already a significant contributor to the entertainment of many of

our students. If developed and used correctly it could offer the chance to educate our

students while entertaining them and support the transition of knowledge from the short

term to the long-term memory. The synergies between the MCQ with confidence

measurement and games offer opportunities to leverage off games theory to improve

student involvement and learning and should be considered when developing within

this area.

Like many assessment choices MCQs offer the ability to assess broad areas of

knowledge effectively with minimal burden to the instructor, a benefit that should not

be overlooked. It is the scoring of the MCQ that comes under scrutiny in this research,

as the reticence of educators to adopt scoring methods to discourage guessing

jeopardises the value of MCQ testing to the educational process.

Assessment with confidence measurement, whose value for formative

application to the educational process has been widely discussed in this research,

offers enriched feedback and an honest appraisal of the state of knowledge. To promote

direction along a fruitful learning path students and instructors should aim to take

advantage of technologies that adjust the learning path to address the highlighted gaps

in their knowledge, modifying the study program by the student and the delivery

program by the instructor.

In closing, this research offers encouragement and support to those who intend to

pursue assessment with confidence measurement by demonstrating the benefits to be

gained by both the students and instructors.

To improve education, educators are encouraged to look beyond traditional

assessment practices, in particular when using technology to deal with an ever-changing

educational culture and student cohort. There is an increasing need to develop

best practices in assessment that prevent the minimization of feedback and the loss of

student control over demonstrating their knowledge, both of which can result from adopting

technologies that are ill suited to the pedagogy of formative assessment.

Research into assessment with technology is still in its infancy and requires substantial

scrutiny to ensure that it is not dismissed as being unable to provide the necessary

assessment or, even worse, used without consideration of the implications of substandard

procedures. This research offers one example of enabling technology to

improve student assessment while taking advantage of the benefits technology can

offer.

REFERENCES

Abdulwahed, M., Nagy, Z., & Blanchard, R. (2008). Beyond The Engineering

Pedagogy, Modelling Kolb’s Learning Cycle Australian Association for Environmental

Education Conference Proceedings Yeppoon.

Acker, D., & Duck, N. W. (2008). Cross-cultural Overconfidence and Biased Self-

attribution. Journal of Socio-Economics, 37(5), 1815-1824.

Ackerly, B. (2004). Critical Theory and Method in Democratic and Human Rights

Theories Annual Meeting of the International Studies Association (pp. 133). Montreal,

Canada.

Adams, E., & Rollings, A. (2007). Games Design and Development, Fundamentals of

Game Design. Australia: Pearson Prentice Hall.

Alnabhan, M. (2002). An Empirical Investigation of The Effects of Three Methods of

Handling Guessing and Risk Taking on The Psychometric Indices of a Test, Social

Behaviour and Personality. Scientific Journal 30(7), 645-652.

American Psychological Association Work Group of the Board of Educational Affairs.

(1997). Learner-Centered Psychological Principles, A Framework for School Reform

and Redesign. Washington, DC: American Psychological Association.

Amory, A. (2007). Game Object Model Version II: A Theoretical Framework for

Educational Game Development. Educational Technology Research and Development,

55(1), 51-77.

Amory, A., & Seagman, S. (2003). Education Game Models: Conceptualization and

Evaluation. South African Journal of Higher Education, 17(2), 206-217.

Ashburn, R. (1938). An Experiment in Essay-type Question. Journal of Experimental

Education, 7(1), 1-3.

Astin, A. (1991). Assessment for Excellence. Connecticut, USA Greenwood Publishing.

Ayala, C., Shavelson, R., Ruiz-Primo, M., Brandon, P., Yin, Y., Furtak, E., et al.

(2008). From Formal Embedded Assessments to Reflective Lessons: The Development

of Formative Assessment Studies. Applied Measurement in Education, 21(4), 315-334.

Bacon, D. R. (2003). Assessing Learning Outcomes: A Comparison of Multiple-Choice

and Short-Answer Questions in a Marketing Context. Journal of Marketing Education,

25(1), 31-36.

Baird, D., & Fisher, M. (2006). Neomillennial User Experience Design Strategies:

Utilizing Social Networking Media to Support "Always On" Learning Styles. Journal of

Educational Technology Systems 34(1), 5-32.

Bandura, A. (1983). A Self-Evaluation and Self-Efficacy Mechanisms Governing the

Motivational Effects of Goal Systems. Journal of Personality and Social Psychology

45(5), 1017-1028.

Bannon, L., Cypher, A., Greenspan, S., & Monty, M., L. (1983). Evaluation and

Analysis of Users' Activity Organization, Proceedings of the SIGCHI Conference on

Human Factors in Computing Systems. Boston, Massachusetts, United States: ACM.

Banta, T., Jones, E., & Black, K. (2009). Planning Effective Assessment. In Designing

Effective Assessment: Principles and Profiles of Good Practice in Designing Effective

Assessment. (pp. 3-10). San Francisco, CA: Wiley.

Ben-Simon, A., Budescu, D. V., & Nevo, B. (1997). A Comparative Study of Measures

of Partial Knowledge in Multiple-Choice Tests. Applied Psychological Measurement,

21(1), 65-88.

Betz, N., & Hackett, G. (1981). Manual for the Occupational Self-Efficacy Scale.

Journal of Counseling Psychology, American Psychological Association, 28(5), 399-

410.

Bevan, N. (2009). International Standards for Usability Should Be More Widely Used.

Journal of Usability Studies, 4(3), 106-113.

Black, P., & Wiliam, D. (1998). Inside the Black Box: Raising Standards Through

Classroom Assessment. Phi Delta Kappan, 80(2), 139-148.

Black, P., & Wiliam, D. (2006). Developing a Theory of Formative Assessment. In

Assessment and Learning (pp. 81–100). London, UK: Sage.

Black, P., & Wiliam, D. (2009). Developing the Theory of Formative Assessment.

Educational Assessment, Evaluation and Accountability, 21(1), 5-31.

Bloom, B., & Krathwohl, D. (1956). Taxonomy of Educational Objectives: The

Classification of Educational Goals, by a committee of college and university

examiners. New York: Longman, Green.

Bradbard, D., Parker, D., & Stone, G. (2004). An Alternative Multiple-Choice Scoring

Procedure in a Microeconomics Course Decision Sciences Journal of Innovative

Education 2(1), 11-26.

Bradshaw, H. (2007). Computer Game Playability; Learning Through Game Play

Design. Paper presented at the Learning with Games (LG) Conference Proceedings,

Sophia Antipolis, France.

Brown, T., & Shufford, E. (1973). Quantifying Uncertainty Into Numerical

Probabilities for the Reporting of Intelligence. Santa Monica: RAND.

Bush, M. (2001). A Multiple Choice Test that Rewards Partial Knowledge. Journal of

Further and Higher Education, 25(2), 157 - 163.

Carless, D. (2007). Learning-oriented Assessment: Conceptual Bases and Practical

Implications. Innovations in Education and Teaching International, 44, 57-66.

Cassell, C., & Nadin, S. (2008). Theory and Research Methods: Interpretivists

Approaches to Entrepreneurship. In R. Barrett, S. Mayson & E. Elga (Eds.),

International Handbook of Entrepreneurship and HRM (pp. 71-88).

Chatti, M., Jarke, M., & Frosch-Wilke, D. (2007). The Future of E-learning: A Shift

To Knowledge Networking and Social Software. International Journal of Knowledge

and Learning, 3(4-5), 404-420.

Choi, I., Lee, S., & Jung, J. (2008). Designing Multimedia Case-Based Instruction

Accommodating Students’ Diverse Learning Styles. Journal of Educational

Multimedia and Hypermedia, 17(1), 5-25.

Clark, J., & Friesen, L. (2009). Overconfidence in Forecasts of Own Performance: An

Experimental Study. Economic Journal, 119(534), 229-251.

Clark, R., & Feldon, D. (2005). Five Common but Questionable Principles of

Multimedia Learning In R. Mayer (Ed.), The Cambridge Handbook of Multimedia

Learning (pp. 97–115). New York, USA Cambridge University Press

Coffield, F., Moseley, D., Hall, E., & Ecclestone, K. (2004). Learning Styles and

Pedagogy In Post-16 Learning: A Systematic and Critical Review. Retrieved from

http://www.ncl.ac.uk/ecls/research/project/1927

Cohen, L., Manion, L., & Morrison, K. (2007). Chapter 1:The Nature of Enquiry-

Setting the Field. In Research Methods in Education (pp. 5-47). London, UK:

Routledge.

Comte, A. (1868). Positive Philosophy New York William Gowens.

Corveleyn, J., & Luyten, P. (2006). Minding the Gap Between Positivism and

Hermeneutics in Psychoanalytic Research. American Psychoanalytic Association,

54(2), 571-610.

Crocker, L. (2005). Teaching for the Test: How and Why Test Preparation is

Appropriate. In R. Phelps (Ed.), Defending Standardized Testing. (pp. 159-174). New

Jersey, USA: Lawrence Erlbaum Associates.

Daniel, B., O’Brien, D., & Sarkar, A. (2009). User Centered Design Principles for

Online Learning Communities: A Sociotechnical Approach for the Design of a

Distributed Community of Practise In M. Lytras & P. Ordonez de Pablos (Eds.), Social

Web Evolution: Integrating Semantic Applications and Web 2.0 Technologies (pp. 54-

71). Hershey, USA: Information Science Publishing.

Davidoff, F. (1995). Confidence Testing - How to Answer a Meta-Question. American

College of Physicians Observer.

Davies, P. (2005). Continual Assessment of Confidence or Knowledge with Hidden

MCQ. Paper presented at the Computer Assisted Assessment Loughborough, England.

De Carlo, L. (2005). A Model of Rater Behaviour in Essay Grading Based on Signal Detection

Theory. Journal of Education Measurement, 42(1), 53-76.

Desurvire, H., Caplan, M., & Toth, J. (2004). Using Heuristics to Evaluate the

Playability of Games. Paper presented at the CHI Association for Computing

Machinery, New Jersey, USA.

Diamond, G., & Forrester, J. (1983). An Epistemologic Model of Clinical Judgment.

American Journal of Medicine, 75, 129-137.

Dix, A., Finlay, J., Abowd, G., & Beale, R. (2004). Human-Computer Interaction (4th

ed.). Australia: Pearson.

Doebbert, J. (1999). Benchmarking the Learning Environment (Technology). In

National Centre of Research in Vocational Education. California, USA: Copa &

Ammentorp Press.

Farrell, G., Farrell, V., & Leung, Y. (2001). Online Software Test for Efficient and

Effective Assessment Using Multiple Choice Questions- An Evaluation. Paper presented

at the American Educational Research Association Conference Seattle, USA.

Farrell, G., & Leung, Y. (2002a). Designing an Online Self-Assessment Tool Utilizing

Confidence Measurement. Paper presented at the Seeking Success in E-Business, IFIP

8.4 Working Group, Copenhagen, Denmark.

Farrell, G., & Leung, Y. (2002b). Improving the Design of an Online Self-Assessment

Tool Utilizing Confidence Measurement. Paper presented at the Web-Based Learning:

Men and Machines, Hong Kong.

Farrell, G., & Leung, Y. (2004a). Comparison of Two Student Cohorts Utilizing Black

Board CAA with Different Assessment Content: A Lesson to be Learnt. Paper presented

at the Computer Assisted Assessment Conference Loughborough, England.

Farrell, G., & Leung, Y. (2004b). Innovative Online Assessment. Education and

Information Technology Journal of the IFIP Technical Committee on Education, 9(1),

5-20.

Farrell, G., & Leung, Y. (2005). A Comparison of Blackboard CAA and an Innovative

Self Assessment Tool for Formative Assessment. Paper presented at the Computer

Assisted Assessment Conference, Loughborough, England.

Farrell, G., & Leung, Y. (2006). A Comparison of an Innovative Assessment Tool

Utilizing Confidence Measurement to the Traditional Multiple Choice, Short Answer

and Problem Solving Questions. Paper presented at the Computer Assisted Assessment

Conference Loughborough, England.

Farrell, G., & Leung, Y. (2008). Convergence of Validity for the Results of a

Summative Assessment with Confidence Measurement and Traditional Assessment.

Paper presented at the Computer Assisted Assessment Conference Loughborough,

England.

Feltz, D. (2007). Self Confidence and Sports Performance. In D. Smith & M. Bar-Eli

(Eds.), Essential Readings in Sport and Exercise Psychology (pp. 423-458). Illinois, USA: Human Kinetics.

Frandsen, G., & Schwartzbach, M. (2006). A Singular Choice for Multiple Choice.

Special Interest Group on Computer Science Education (SIGICS) Bulletin Association

for Computer Machinery, 39(4), 34-38.

Frary, R. (1985). More Multiple-choice Item Writing Do's and Don'ts, Practical

Assessment, Assessment and Evaluation. ERIC Clearinghouse on Assessment and

Evaluation., 4(11).

Fuchs, C., & Sandoval, M. (2008). Positivism, Postmodernism, or Critical Theory? A

Case Study of Communications Students’ Understandings of Criticism. Journal For

Critical Education Policy Studies, The Institute for Educational Study Policies (IEPS),

6(2), 112-141.

Fullarton, S. (1993). Confidence in Mathematics: The Effects of Gender, Research

Monograph. Melbourne, Australia: Deakin University: National Centre for Research and

Development in Mathematics Education.

Gardner-Medwin, A. (2006). Confidence-Based Marking: Towards Deeper Learning

and Better Exams In C. Bryan & K. Clegg. (Eds.), Innovative Assessment in Higher

Education (pp. 141-149). London, England: Taylor & Francis.

Gardner-Medwin, A., & Gahan, M. (2003). Formative and Summative Confidence-

Based Assessment. Paper presented at the Computer Assisted Assessment Conference,

Loughborough, England.

Gerjets, P., Scheiter, K., Opfermann, M., Hesse, F., & Eysink, T. (2009).

Learning With Hypermedia: The Influence of Representational Formats and Different

Levels of Learner Control on Performance and Learning Behavior. Computers in

Human Behavior, Elsevier, 25(2), 360-370.

Gerring, J. (2003a). Interpretations of Interpretivism, Qualitative Methods. Newsletter

of the American Political Science, Association Organized Section on Qualitative

Methods: Non Refereed, 1(2), 2-6.

Gerring, J. (2003b). Qualitative Methods. Newsletter of the American political Science

association Organized Section on Qualitative Methods: Non Refereed.

Gerring, J. (2007). Conundrum of Case study. In J. Gerring (Ed.), Case Study Research

(pp. 1-51). New York, USA: Cambridge University Press.

Giddings, L., & Grant, B. (2007). A Trojan Horse for Positivism? A Critique of Mixed

Methods Research. Advances in Nursing Science, 30(1), 52-60.

Govaerts, M., van der Vleuten, C., Schuwirth, L., & Muijtjens, M. (2007). Broadening

Perspectives on Clinical Performance Assessment: Rethinking the Nature of In-training

Assessment. Advances in Health Sciences Education, 12(2), 239-260.

Greenfield, P. (2009). Technology and Informal Education: What Is Taught, What Is

Learned. American Association for the Advancement of Science (AAAS), 323(5910), 69-

71.

Harris, L., Sadowski, M., & Birchman, J. (2006). A Comparison of Learning Style

Models and Assessment Instruments for University Graphics Educators. The

Engineering Design Graphics Journal, 70(1), 6-15.

Harrison, C., & Petrie, H. (2007). Deconstructing Web Experience: More Than Just

Usability and Good Design. In Lecture Notes in Computer Science: Human-Computer

Interaction HCI Applications and Services. (Vol. 4553/2007, pp. 889-898). Berlin,

Heidelberg, Germany: Springer Berlin / Heidelberg.

Hartley, K., Strudler, N., & Schraw, G. (2008). Nevada Schools Educational

Technology Needs Assessment. Nevada, USA: College of Education

University of Nevada, Las Vegas.

Hartmann, B., Abdulla, L., Mittal, M., & Klemmer, S. (2007). Authoring Sensor-based

Interactions by Demonstration with Direct Manipulation and Pattern Recognition.

Paper presented at the Human Factors in Computing Systems (SIGCHI).

Hassman, P., & Hunt, D. (1994). Human Self-Assessment in Multiple Choice Testing.

Journal of Educational Measurement, 31(2), 149-160.

Hattie, J., & Timperley, H. (2007). The Power of Feedback. Review of Educational

Research. American Educational Research Association, 77(1), 81-112.

Hede, A. (2002). An Integrated Model of Multimedia Effects on Learning. Journal of

Educational Multimedia and Hypermedia, 11(12), 177-191.

Hobson, A., & Ghoshal, D. (1996). Flexible Scoring for Multiple-Choice Exams. The

Physics Teacher, 34(5), 284-305.

Hogg, M., & Maclaran, P. (2008). Rhetorical Issues in Writing Interpretivist Consumer

Research. Qualitative Market Research: An International Journal, 11(2), 130-146.

Hounsell, D., McCune, V., Hounsell, J., & Litjens, J. (2008). The Quality of Guidance

and Feedback to Students. Higher Education Research and Development, 27(1), 56-67.

Howlett, D., Vincent, T., Watson, G., Owens, E., Webb, R., Gainsborough, N., et al.

(2009). Blending Online Techniques with Traditional Face to Face Teaching Methods

to Deliver Final Year Undergraduate Radiology Learning Content. European Journal

of Radiology, In Press.

Hussain, Z., Lechner, M., Milchrahm, H., Shahzad, S., Slany, W., Umgeher, M., et al.

(2008). Agile User-Centered Design Applied to a Mobile Multimedia Streaming

Application. In Lecture Notes in Computer Science: HCI and Usability for Education

(Vol. 5298, pp. 313-330). Berlin, Heidelberg, Germany: Springer.

Hyerle, D. (2009). Visual Tools for Transforming Information Into Knowledge.

London, England: Sage.

Isacker, K., Slegers, K., Gemou, M., & Bekiaris, E. (2009). A UCD Approach Towards

the Design, Development and Assessment of Accessible Applications in a Large Scale

European Integrated Project. Paper presented at the Universal Access to Human-

Computer Interaction, San Diego, USA.

Jennings, S., & Bush, M. (2006). A Comparison of Conventional and Liberal (Free

Choice) Tests. Practical Assessment, Research and Evaluation Online Journal,

11(8), 1-5.

Johnson, P., Buehring, A., Cassell, C., & Symon, G. (2006). Evaluating Qualitative

Management Research: Towards a Contingent Criteriology. International Journal of

Management Reviews, 8(3), 131-156.

Kalyuga, S., Chandler, P., & Sweller, J. (1998). Levels of Expertise and Instructional

Design. Human Factors, 40(1), 1-17.

Karpicke, J., Butler, A., & Roediger III, H. (2009). Metacognitive Strategies in Student

Learning: Do Students Practise Retrieval When They Study on Their Own? Memory

Journal, 17(4), 471-479.

Kaufman, D. (2003). Applying Educational Theory In Practice. Quality & Safety in

Health Care, British Medical Journal (BMJ), 326(7382), 213-216.

Kehoe, J. (1995). Writing Multiple Choice Test Questions in Practical Assessment and

Evaluation. ERIC Clearinghouse on Assessment and Evaluation, 4(9).

Keller, J. (2008). First Principles of Motivation to Learn and E3-learning. Distance

Education, 29(2), 175-185.

Kennedy, K., Chan, J., Fok, P., & Yu, W. (2008). Forms of Assessment and Their

Potential For Enhancing Learning: Conceptual and Cultural Issues. Educational

Research For Policy and Practice, 7(3), 197-207.

Klinger, A. (1997). Experimental Validation of Learning Accomplishment. Paper

presented at the Frontiers in Education Los Angeles, USA.

Kolb, D. (1984). Experiential Learning: Experience As The Source of Learning and

Development. New Jersey, USA: Prentice Hall.

Kolb, D. (1999). The Kolb Learning Style Inventory, Version 3. Boston, USA: Hay

Group.

Komarraju, M., Karau, S., & Schmeck, R. (2009). Role of the Big Five Personality

Traits in Predicting College Students' Academic Motivation and Achievement.

Learning and Individual Differences, 19(1), 47-52.

Krätzig, G., & Arbuthnott, K. (2009). Metacognitive Learning: The Effect of Item-

Specific Experience and Age on Metamemory Calibration and Planning. Metacognition

and Learning, 4(2), 125-144.

Krumboltz, J., & Christine, J. (1999). Point of View, Competitive Grading Sabotages

Good Teaching: Professional Education in Education, Phi Delta Kappan.

Landsberger, H. (1958). Hawthorne Revisited. New York, USA: Cornell University

Press.

Lederman, N., & Niess, M. (2000). Technology's Sake or for the Improvement of

Teaching and Learning? School Science and Mathematics Journal, 100(7), 345-348.

Leung, Y. (1995). Applying Bifocal Displays to Data Visualisation. Swinburne

University of Technology, Melbourne, Australia.

Libarkin, J. (2008). Concept Inventories in Higher Education Science. Paper presented

at the Board Of Science Education Conference, Washington, DC.

Lincoln, Y. S., & Guba, E. G. (1985). Naturalistic Inquiry: Sage Publications, Inc.

Lindström, H., & Malmsten, M. (2008). User-centred Design and Agile Development:

Rebuilding the Swedish National Union Catalogue. The Code4Lib Journal, 5.

Longino, H. E. (2002). The Fate of Knowledge: Princeton Univ Pr.

Mansell, R. (2009). A Critique of the Mainstream Vision and an Alternative Research

Framework. The Information Society and ICT Policy, Journal of Information,

Communication & Ethics in Society, 8(1), 22-44.

Marcus, A. (1984). Graphic Design for Computer Graphics. Computers in Industry,

5(1), 51-63.

Marshall, J. B., & Carson, C. M. (2008). A Preliminary Bloom’s Taxonomy

Assessment Of End-Of-Chapter Problems In Business School Textbooks. American

Journal of Business Education--Fourth Quarter, 1(2).

Marshall University. (1999). Comparison of Online Course Delivery Software

Products, from http://multimedia.marshall.edu/cit/webct/compare/comparison.html

Martin, D. J. (2008). Elementary Science Methods: A Constructivist Approach (5th

ed.): Wadsworth Pub Co.

Marx, K. (1884). Economic and Philosophical Manuscripts of 1844. In Early Writings

(pp. 279-400). Berlin: Dietz.

Mayhew, D. J. (1999). The Usability Engineering Lifecycle: A Practitioner's Handbook

for User Interface Design: Morgan Kaufmann.

McCormick, B. H. (1988). Visualization in Scientific Computing. ACM SIGBIO

Newsletter, 10(1), 21.

McCoubrie, P. (2004). Improving the Fairness of Multiple-choice Questions: A

Literature Review. Medical Teacher, 26(8), 709-712.

McIlveen, P. (2007). The Genuine Scientist-practitioner in Vocational Psychology: An

Autoethnography. Qualitative Research in Psychology, 4(4), 295-311.

McNeil, B. J., & Nelson, K. R. (1990). Meta-Analysis of Interactive Video Instruction:

A 10 Year Review of Achievement Effects.

Moos, D. C., & Azevedo, R. (2009). Learning with Computer-based Learning

Environments: A Literature Review of Computer Self-efficacy. Review of Educational

Research, 79(2), 576.

Morris, M., Porter, A., & Griffiths, D. (2004). Assessment is Bloomin' Luverly:

Developing Assessment that Enhances Learning. Journal of University Teaching and

Learning Practice, 1(2), 90-106.

Najjar, L. J. (1996). Multimedia Information and Learning. Journal of Educational

Multimedia and Hypermedia, 5(2), 129-150.

National Council of Teachers of Mathematics. (2000). Principles and Standards for

School Mathematics. Retrieved 11/5/2010, from http://standards.nctm.org

Ng, A. W. Y., & Chan, A. H. S. (2009). Different Methods of Multiple-Choice Test:

Implications and Design for Further Research. Proceedings of the International

MultiConference of Engineers and Computer Scientists, 2.

Nicol, D. J., & Macfarlane-Dick, D. (2006). Formative Assessment and Self-regulated

Learning: A Model and Seven Principles of Good Feedback Practice. Studies in Higher

Education, 31(2), 199-218.

Nielsen, J. (1994a). Enhancing the Explanatory Power of Usability Heuristics. Paper

presented at the Proceedings of the SIGCHI Conference on Human Factors in

Computing Systems: Celebrating Interdependence.

Nielsen, J. (1994b). Usability Inspection Methods. Paper presented at the Conference

Companion on Human Factors in Computing Systems.

Nielsen, J., & Molich, R. (1990). Heuristic Evaluation of User Interfaces. Paper

presented at the Proceedings of the SIGCHI Conference on Human Factors in

Computing Systems: Empowering People.

Nieweg, M. R. (2000). Learning to Reflect a Practical Theory of Teaching, Amsterdam

University of Professional Education, Netherlands.

Novak, J. D., & Cañas, A. J. (2008). The Theory Underlying Concept Maps and How to

Construct and Use Them. Florida Institute for Human and Machine Cognition,

Pensacola, FL.

Paddison, C., & Englefield, P. (2004). Applying Heuristics to Accessibility Inspections.

Interacting with Computers, 16(3), 507-521.

Palmer, E. J., & Devitt, P. G. (2007). Assessment of Higher Order Cognitive Skills in

Undergraduate Education: Modified Essay or Multiple Choice Questions? Research

Paper. BMC Medical Education, 7(1), 49.

Paul, J. (1994). Improving Education Through Computer-based Alternative Assessment

Methods. People and Computers, 81-81.

Perrenoud, P. (1998). From Formative Evaluation to a Controlled Regulation of

Learning Processes. Towards a Wider Conceptual Field. Assessment in Education:

Principles, Policy & Practice, 5(1), 85-102.

Piaget, J., & Duckworth, E. (1970). Genetic Epistemology. American Behavioral

Scientist, 13(3), 459.

Pollard, G. (1985). Scoring in Multiple-choice Examinations. Math. Scientist, 10, 93-97.

Pollard, G. (1986). Scoring to Remove Guessing in Multiple Choice Exams. Math.

Education Science Technology, 20(2), 33-36.

Pollard, G. (1993). Further Scoring Systems to Remove Guessing in Multiple Choice

Examinations. Mathematics Competitions, 2(1), 27-43.

Pollard, G., & Clark, D. (1989). An Optimal Scoring System of Multiple Choice

Competitions and an Analysis of Candidates Responses Under Two Different Methods

of Scoring. Mathematics Competitions, 2(2), 33-36.

Popham, W. J. (2008). Transformative Assessment: ASCD.

Prensky, M. (2003). Digital Game-based Learning. Computers in Entertainment (CIE),

1(1), 21.

Quinn, C. N. (2005). Engaging Learning: Designing E-learning Simulation Games:

Pfeiffer & Co.

Reed, I. (2008). Review Essay: Social Theory, Post-Post-Positivism and the Question

of Interpretation. International Sociology, 23(5), 665.

Rice, M., Campbell, C., & Mousley, J. (2007). Using Online Environments to Promote

Assessment as a Learning Enhancement Process. Enhancing Teaching and Learning

Through Assessment: Deriving an Appropriate Model, 418.

Rieber, L. P. (1996). Seriously Considering Play: Designing Interactive Learning

Environments Based on the Blending of Microworlds, Simulations, and Games.

Educational Technology Research and Development, 44(2), 43-58.

Righi, C., & James, J. (2007). User-centered Design Stories: Real-world UCD Case

Studies. Interactive Technologies, 560.

Rodriguez, M. C. (2005). Three Options are Optimal for Multiple-choice Items: A

Meta-analysis of 80 Years of Research. Educational Measurement: Issues and

Practice, 24(2), 3-13.

Salen, K., & Zimmerman, E. (2003). Rules of Play: Game Design Fundamentals: MIT

Press.

Shneiderman, B. (1997). Designing the User Interface (3rd ed.). Boston: Addison-

Wesley.

Schuwirth, L. W. T., & Van Der Vleuten, C. P. M. (2006). Challenges for

educationalists. British Medical Journal, 333(7567), 544.

Seffah, A., & Metzker, E. (2008). Adoption-centric Usability Engineering: Systematic

Deployment, Assessment and Improvement of Usability Methods in Software

Engineering: Springer-Verlag New York Inc.

Seufert, T., Schütze, M., & Brünken, R. (2009). Memory Characteristics and Modality

in Multimedia Learning: An Aptitude-treatment-interaction Study. Learning and

Instruction, 19(1), 28-42.

Sharp, H., Rogers, Y., & Preece, J. (2007). Interaction Design: Beyond Human-

computer Interaction: Wiley Hoboken, NJ.

Shavelson, R. J., Young, D. B., Ayala, C. C., Brandon, P. R., Furtak, E. M., & Ruiz-

Primo, M. A. (2008). On the Impact of Curriculum-embedded Formative Assessment

on Learning: A Collaboration Between Curriculum and Assessment Developers.

Applied Measurement in Education, 21(4), 295-314.

Shneiderman, B. (1982). The Future of Interactive Systems and the Emergence of

Direct Manipulation. Behaviour & Information Technology, 1(3), 237-256.

Shneiderman, B., & Plaisant, C. (2005). Designing The User Interface (4th ed.).

Boston: Addison-Wesley/Pearson.

Shoben Jr, E. J. (2009). Psychotherapy as a Problem in Learning Theory. Journal of

Psychotherapy Integration, 19(2), 111-139.

Sim, G., Horton, M., & Strong, S. (2004). Interfaces For Online Assessment: Friend or

Foe? Paper presented at the 7th HCI Educators Workshop Conference Effective

Teaching and Training in HCI. British Human-Computer-Interaction Group Preston.

Sim, G., Read, J., & Cockton, G. (2009). Evidence Based Design of Heuristics for

Computer Assisted Assessment. Human-Computer Interaction--INTERACT 2009, 204-216.

Sim, G., Read, J. C., & Holifield, P. (2006). Using Heuristics to Evaluate a Computer

Assisted Assessment Environment.

Sim, G., Read, J. C., & Holifield, P. (2008). Heuristics for Evaluating the Usability of

CAA Applications.

Spence, R., & Apperley, M. (1982). Database Navigation: an Office Environment for

the Professional. Behaviour & Information Technology, 1(1), 43-54.

Starr, C. W., Manaris, B., & Stalvey, R. A. H. (2008). Bloom's Taxonomy Revisited:

Specifying Assessable Learning Objectives in Computer Science. ACM SIGCSE

Bulletin, 40(1), 261-265.

Stavropoulos, N. (2007). Interpretivist Theories in Law. Law: Metaphysics Meaning

and Objectivity.

Steinmetz, G. (2007). Fordism and the Positivist Revenant: Response to Burris, Riley,

and Fourcade. Social Science History, 31(1), 127.

Sternberg, R. J. (1988). The Nature of Creativity: Contemporary Psychological

Perspectives: Cambridge Univ Pr.

Swartz, S. M. (2006). Acceptance and Accuracy of Multiple Choice, Confidence-level,

and Essay Question Formats for Graduate Students. The Journal of Education for

Business, 81(4), 215-220.

Taras, M. (2009). Summative Assessment: The Missing Link for Formative

Assessment. Journal of Further and Higher Education, 33(1), 57-69.

Tarrant, M., Ware, J., & Mohammed, A. M. (2009). An Assessment of Functioning and

Non-functioning Distractors in Multiple-choice Questions: A Descriptive Analysis.

BMC Medical Education, 9(1), 40.

Taylor, J., Sumner, T., & Law, A. (1997). Talking About Multimedia: A Layered

Design Framework. Learning, Media and Technology, 23(2), 215-241.

Te'eni, D., Carey, J. M., & Zhang, P. (2007). Human Computer Interaction: Developing

Effective Organizational Information Systems: Wiley.

Tomanek, D., Talanquer, V., & Novodvorsky, I. (2008). What do Science Teachers

Consider When Selecting Formative Assessment Tasks? Journal of Research in

Science Teaching, 45(10), 1113-1130.

Torrance, H. (2007a). Assessment as Learning? How the Use of Explicit Learning

Objectives, Assessment Criteria and Feedback in Post-secondary Education and

Training can Come to Dominate Learning. Assessment in Education: Principles, Policy

& Practice, 14(3), 281-294.

Torrance, H. (2007b). Assessment in Post-secondary Education and Training: Editorial

Introduction. Assessment in Education: Principles, Policy and Practice, 14(3), 277-279.

Torrance, H., & Coultas, J. (2009). Do Summative Assessment and Testing Have a

Positive or Negative Effect on Post-16 Learners' Motivation for Learning in the

Learning and Skills Sector? National Centre for Vocational Education Research

(NCVER).

Trotter, E. (2006). Student Perceptions of Continuous Summative Assessment.

Assessment & Evaluation in Higher Education, 31(5), 505-521.

Ventouras, E., Triantis, D., Tsiakas, P., & Stergiopoulos, C. (2010). Comparison of

Examination Methods Based on Multiple-choice Questions and Constructed-Response

Questions Using Personal Computers. Computers & Education, 54(2), 455-461.

Wilson, R. B., & Case, S. M. (1993). Extended Matching Questions: An Alternative to

Multiple-choice or Free-response Questions. Journal of Veterinary Medical Education,

20(3).

Zimmerman, B. (2008). Investigating Self-regulation and Motivation: Historical

Background, Methodological Developments, and Future Prospects. American

Educational Research Journal, 45(1), 166-183.

APPENDIX A: SURVEYS

Appendix A1: Paper Based Surveys

____________________________________________________________

Self Assessment Test Trial:

Developed and Administered by Graham Farrell:

Topic: C++ Fundamentals

It is important that you know that you can cease to participate in this project at

any time without reason or justification. If you choose to withdraw please let the

supervisor know and hand back the Questionnaire.

Subject Number: ____________________

Age: 18-25 26-30 31-40 41+

Sex: M F

Computer Experience: None Casual Proficient

1. Was the system easy to operate?

|_______________________________|___________________________________|

No Help                    Helpful                    Extremely Helpful

2. Did the feedback display produce comprehensible information in order to be

valuable in directing the student along their learning path?

|__________________________________|___________________________________|

No Help                    Helpful                    Extremely Helpful

3. Is a scoring system that penalised for incorrect choices and rewarded for correct

choices in a linear proportionality easy to comprehend?

|______________________________________|_______________________________|

Not Easy                    Easy                    Extremely Easy

4. Would the participant A) actively use the sliding bar to register their level of

confidence freely and B) would they perceive the system as being either too

complicated or too threatening?

A)

|__________________________________|________________________________|

Not Use Sliding Bar Freely          Unsure          Use Sliding Bar Freely

B)

|______________________________|________________________________|

Not Complicated or Threatening          Unsure          Complicated or Threatening

5. Would a self-testing program of this design favour a particular learning style?

|______________________________________________________________|

Not Preferred                    Preferred                    Extremely Preferable

6. Would students consider the proposed system might be more favourable to the

extraverted individual and disadvantage the introverted user?

|________________________________|_____________________________________|

Advantage Neither No Advantage

7. Would students consider the proposed system to be gender biased?

|__________________________________|__________________________________|

Biased Unknown Not Biased

Appendix A2: Online Surveys

APPENDIX B: SIMULATION RESULT DISPLAYS

Simulation of a student’s first attempt to use the MCQCM

Simulation of a student’s second attempt to use the MCQCM

APPENDIX C: MCQCM SCREEN PRESENTATIONS
