Improvement of Learning Through
Interactive Confidence-based Assessment
by
Graham Farrell
BAppSci RMIT
Grad Dip Ed Hawthorn Institute
MIT SUT
A Thesis Submitted to
the Faculty of
Information and Communication Technologies
Swinburne University of Technology
for the degree of
Doctor of Philosophy
September 2010
DECLARATION
This thesis contains no material which has been accepted for the award of any other degree or
diploma except where due reference is made in the text of the thesis. To the best of my
knowledge, this thesis contains no material previously published or written by another person
except where due reference is made in the text of the thesis.
Graham Farrell
September 2010
ACKNOWLEDGEMENTS
I would like to offer my sincere appreciation to Dr Ying Leung for his direction and support
for the duration of my thesis, Professor Doug Grant for his assistance and Professor Yun
Yang for his assistance and encouragement in the final stages. I also wish to thank my
colleagues who have offered support along the way.
I would like to dedicate this work to Viv, Rebekah and Jai for all their encouragement over
the years. I would also like to dedicate this work to Ron Farrell for teaching me the value of
persistence and June Farrell for instilling confidence to take on challenges.
ABSTRACT
Assessment must fulfil certain criteria to be considered of strategic value. Unfortunately, many
assessment strategies used today do not. In many cases the advent of new technology has
extended traditional assessment tools well beyond their intended application, so that they fall
short of their true goals of correctly grading the student while supplying meaningful feedback to
the learning process. The shortcomings of traditional assessment strategies must be addressed in
order to improve the representation of a student’s present level of knowledge. Educators
generally concede the existence of inherent inadequacies in traditional assessment, such as the
encouragement of guessing, failure to recognize partial knowledge, miscalibration of confidence
and the inability of a student to declare minimal or no knowledge. Sound educational process is
dependent on assessment strategies, as they greatly contribute to the learning experience, both as
a method of formally assigning a grade (Summative Assessment) and as a means of giving
feedback (Formative Assessment). The value of good assessment lies in encouraging the
instructor and student to reflect on the results, often leading to adjustments in the student’s
personal study program and refinement of the curriculum by the instructor.
Drawing on previous research by others, this research promotes assessment with confidence
measurement as a method to address the inadequacies of traditional assessment strategies,
offering increased richness of feedback, eliminating the benefits gained from guessing and
encouraging the declaration of partial or no knowledge. This research then promotes the use of
an innovative assessment tool based on the traditional Multiple-choice Question (MCQ) format
that incorporates a method to measure the confidence of the student in their preferred answer(s),
referred to as the Multiple-choice Questions with Confidence Measurement (MCQCM).
The preliminary pilot programs identified some critical usability issues pertaining to the tool’s
operation, functionality and the operational cognitive process. The further development and
refinement of the MCQCM required consideration of HCI User Centred Design (UCD)
principles and the formulation of a set of heuristics specifically for interactive systems that
assess with confidence measurement. This research investigates the application of games
taxonomy to the educational arena to identify the criteria to which good interactive assessment
tools should conform, and applies these criteria to the MCQCM.
This research found the MCQCM to be as reliable as traditional assessment options,
producing a convergence of scores and confirming it as a valid method of summative
assessment. The observations and resulting analysis indicated that the greater distribution of
scores contributed to a more dispersed allocation of grades for the students.
The MCQCM utilizes technology to improve student learning in a progressive educational
climate that requires strategic assessment solutions. This research encourages other educators to
question current assessment practices and embrace the use of technology that is relevant to their
individual requirements.
AUTHOR’S PUBLICATIONS
Journal Paper.
Farrell, G. & Leung, Y. (2004). Innovative Online Assessment. Education and Information
Technology. Journal of the IFIP Technical Committee on Education, 9(1), 5-20.
Conference Papers.
Farrell, G., Farrell, V., & Leung, Y. (2001). Online Software Test for Efficient and Effective
Assessment Using Multiple Choice Questions - An Evaluation. Paper presented at the American
Educational Research Association Conference, Seattle, USA.
Farrell, G. & Leung, Y. (2002). Designing an Online Self-Assessment Tool Utilizing Confidence
Measurement. Paper presented at the Seeking Success in E-Business, IFIP 8.4 Working Group,
Copenhagen, Denmark.
Farrell, G. & Leung, Y. (2002). Improving the Design of an Online Self-Assessment Tool
Utilizing Confidence Measurement. Paper presented at the Web-Based Learning: Men and
Machines, Hong Kong.
Farrell, G. & Leung, Y. (2004). Comparison of Two Student Cohorts Utilizing Black Board CAA
with Different Assessment Content: A Lesson to be Learnt. Paper presented at the Computer
Assisted Assessment Conference, Loughborough, England.
Farrell, G. & Leung, Y. (2005). A Comparison of Blackboard CAA and an Innovative Self-
Assessment Tool for Formative Assessment. Paper presented at the Computer Assisted
Assessment Conference, Loughborough, England.
Farrell, G. & Leung, Y. (2006). A Comparison of an Innovative Assessment Tool Utilizing
Confidence Measurement to the Traditional Multiple Choice, Short Answer and Problem Solving
Questions. Paper presented at the Computer Assisted Assessment Conference, Loughborough,
England.
Farrell, G. & Leung, Y. (2008). Convergence of Validity for the Results of a Summative
Assessment with Confidence Measurement and Traditional Assessment. Paper presented at the
Computer Assisted Assessment Conference, Loughborough, England.
TABLE OF CONTENTS
Abstract
Author’s Publications
List of Figures
List of Tables
ETHICS APPROVALS
CHAPTER 1 Introduction
1.1 Contribution of Assessment to Education
1.2 Assessment Strategies
1.3 The Advent of E-learning
1.4 The Criteria for Good Formative and Summative Assessment
1.5 Concerns with Current Assessment Strategies
1.5.1 Assessment That Encourages Guessing
1.5.2 Assessment That Trains the Student to Do Well
1.5.3 Assessment Failing to Recognise Partial Knowledge and Miscalibration of Confidence
1.6 The Role of Computer Based Assessment in Addressing Issues of Assessment
1.7 The Purpose of this Research
1.8 Problem Statement
1.9 Scope and Aim of this Research
1.9.1 Scope
1.9.2 Aims
1.10 Overview of Thesis
CHAPTER 2 Research Design
2.1 Research Methodology
2.1.1 Overview of Research Methodology
2.1.2 Adopted Research Methodology
2.2 Research Design to Address Research Questions
2.3 Research Framework
2.4 HCI Approach to Problem Solving
2.5 Summary of this Research Structure
CHAPTER 3 Variations Of Non-Conventional MCQ Assessment Strategies For Learning
3.1 Learning Theories and Learning Styles
3.1.1 Learning Theories
3.1.2 Learning Styles
3.2 The Value of Feedback in the Learning Process
3.3 Formative and Summative Assessment as Part of the Learning Path
3.4 Assessment as a Means of Shifting the Responsibility of Learning to the Student
3.5 Assessment Using New Technology
3.6 Concerns with Computer Assisted Assessment
3.7 Assessment Options Available
3.8 Multiple-choice Questions
3.9 The Suitability of MCQ Tests to the New Technology
3.10 Previous Work on Innovative Approaches to MCQ Assessment
3.10.1 The Need for Innovative Scoring for Assessment
3.10.2 MCQs Designed to Eliminate Guessing
3.10.3 Innovative MCQ Assessment with Confidence Measurement
3.11 Interactivity in Learning
3.12 Contribution of Assessment with Confidence Measurement to Hede’s (2002) Model
3.13 Assessment with Confidence Measurement as the Proposed Solution
3.14 Summary
CHAPTER 4 Scoring Options For Assessment With Confidence
4.1 Taxonomy of Scoring
4.2 Previous Scoring Methods to Address the Issue of Guessing
4.3 Scoring Using Penalties for Incorrect Answers to Reduce the Impact of Guessing
4.4 Comparison of an Incremental Balanced Scoring Method to Previous Work
4.5 Choice and Justification of Scoring Method for this Research
4.6 Summary
CHAPTER 5 Development Of The Multiple-choice Questions With Confidence Measurement (MCQCM) Prototype And Pilot Program
5.1 The MCQCM
5.2 Design of the Rudimentary MCQCM Prototype
5.3 Pilot Studies
5.3.1 Aims of Pilot Studies
5.3.2 First Pilot Study
5.3.3 Second Pilot Study
5.4 Discussion
5.5 Further Development of the MCQCM
5.6 Summary
CHAPTER 6 Designing And Refining The MCQCM For Delivery Via The Web
6.1 Games Taxonomy
6.2 Game Theory Relevance to Educational Games
6.2.1 Fundamental Game Theory Criteria
6.2.2 The Goals and Rules of a Game
6.2.3 Game Fairness
6.2.4 Games Risk and Rewards
6.2.5 Learning the Game Play
6.2.6 The Influence of Skill, Stress and Absolute Difficulty on Games
6.3 MCQCM Adherence to Game Play Topology
6.3.1 MCQCM Adherence to Playability Guidelines and Heuristics
6.3.2 MCQCM’s Hierarchy of Challenges and Actions
6.3.3 MCQCM Learnability
6.3.4 Fairness of the MCQCM
6.3.5 MCQCM Stress Levels and Overall Level of Difficulty
6.3.6 Summary of MCQCM Adherence to Game Play Topology
6.4 Addressing Design and Usability Issues of MCQCM
6.4.1 Addressing the Cognitive Load of the MCQCM
6.4.2 HCI Evaluation of the MCQCM
6.4.3 Heuristics Testing for Computer Aided Assessment (CAA)
6.5 MCQCM Heuristic Evaluation Method
6.5.1 MCQCM Redesign Resulting from Usability Heuristics
6.5.2 Grid Layout of Question Screen
6.5.3 Visibility of Student Progress During the MCQCM Test
6.5.4 Minimisation of Errors and Error Prevention
6.5.5 Clear and Informative Feedback
6.5.6 Summary of the Redesigning of the MCQCM Adhering to HCI Guidelines
6.5.7 Heuristics for MCQ with Confidence Measurement
6.6 MCQCM’s Method of Handling Graphical Components
6.6.1 Previous Investigative Work on the Graphics Component of Interactive Assessment
6.6.2 MCQCM’s Graphics Solution
6.7 Summary
CHAPTER 7 Comparison Of The MCQCM To A Traditional CAA Package For Formative Assessment
7.1 Trial
7.2 Comparison of the MCQCM to a Traditional Computer Based Formative Assessment Package
7.2.1 Method
7.2.2 Results Analysis for Students
7.2.3 Instructor’s Focus Group for Formative Assessment
7.3 Concluding Observations of Comparison of MCQCM to Traditional Computer Assessment
7.4 Summary
CHAPTER 8 Using The Web-based MCQCM For Summative Assessment
8.1 Initial Trials using MCQCM as a Summative Assessment Tool
8.1.1 Setting
8.1.2 Results
8.1.3 Discussions and Conclusions
8.2 Comparative Analysis of using the MCQCM as a Summative Assessment tool to the Traditional Short Answer, MCQ and Long Answer Assessment
8.2.1 Method of Comparative Study
8.2.2 Results
8.2.3 Discussions and Conclusions
8.3 Comparative Analysis of using the MCQCM and Traditional MCQ as a Summative Assessment Tool
8.3.1 Method
8.3.2 Results
8.3.3 Discussions and Conclusions
8.4 Instructor’s Focus Group for Formative Assessment
8.5 Discussion
8.6 Summary
CHAPTER 9 Summary, Conclusion And Future Work
9.1 Summary of the Research
9.2 Recapitulating on Previous Chapters
9.3 Discussion
9.3.1 MCQCM as a Valuable Formative Assessment Tool
9.3.2 MCQCM as a Summative Assessment Tool
9.4 Ethical Issues
9.5 Limitations of Study
9.5.1 Scope
9.5.2 Internal Validity
9.5.3 External Validity, Transferability
9.5.4 Construct Validity
9.5.5 Ecological Validity
9.6 Research Contribution
9.6.1 Outcome 1: The MCQCM Tool
9.6.2 Outcome 2: The Value of Assessment with Confidence Measurement for Formative Assessment
9.6.3 Outcome 3: The Value of Assessment with Confidence Measurement for Summative Assessment
9.6.4 Outcome 4: Heuristics for CAA with Confidence Measurement
9.6.5 Outcome 5: The Contribution of this Research to Educators Investigating Alternative Assessment Strategies
9.7 Future Work
9.8 Concluding Remarks
Appendix A: Surveys
Appendix B: Simulation Result Displays
Appendix C: MCQCM Screen Presentations
LIST OF FIGURES
Figure 3-1: Kolb’s (1984) Learning Style Model
Figure 3-2: Confidence Measuring Template, Paul (1994)
Figure 3-3: Hede’s (2002) Integrated Model of Multimedia Effects on Learning
Figure 3-4: Relation of Assessment with Confidence Measurement to Hede’s (2002) Multimedia Model
Figure 4-1: Paul’s CBAA Triangle with the Corresponding Score for Each Region
Figure 4-2: CBAA Scores for C1, C2 & C3
Figure 4-3: Other Scoring Options used Including Scheme A from Hassmen & Hunt (1994) and Schemes B-D from Davies (2005)
Figure 4-4: MCQCM Scoring with Optimal Path
Figure 4-5: Graph Comparing the MCQCM and CBA Expected Scores
Figure 5-1: The MCQCM Prototype Developed to Run the Initial Trials
Figure 5-2: Scoring Calculator for MCQCM Table 5-3
Figure 5-3: Age and Gender Distributions for Both Cohorts of Students
Figure 5-4: Frequency of the Undergraduates, Cohort 1, Scores for Each Question
Figure 5-5: Frequency of the Postgraduates, Cohort 2, Scores for Each Question
Figure 6-1: Slide Rule to Register Confidence
Figure 6-2: First Fundamental Version of the Web Based MCQCM
Figure 6-3: The Appearance of the Confidence Sliding Bar
Figure 6-4: Grid Layout of MCQCM
Figure 6-5: Grid Layout of the Functional Areas, Distinguished by Numbers
Figure 6-6: Question Display Showing 3 Navigational Supports
Figure 6-7: Support for User to Minimise Errors
Figure 6-8: Final Dialogue Box to Support the User in Error Prevention
Figure 6-9: Feedback Screens: (A) Display for all Questions with Hyperlink to (B) Display of Individual Questions
Figure 6-10: MCQCM Dual Screen Display
Figure 6-11: MCQCM Diagram as a Full Screen
Figure 6-12: Demonstration of MCQCM Display of Varied Screen Sizes
Figure 7-1: Graph of MCQ and MCQCM Scores for Cohort 1
Figure 7-2: Graph of MCQ and MCQCM Scores for Cohort 2
Figure 8-1: MCQ and MCQCM Scores for Each Student with the MCQ Clustered
Figure 8-2: The Student’s MCQ (clustered ascending order) and MCQCM Scores
LIST OF TABLES
Table 2-1: Research Questions Addressed by the Positivism Research Paradigm
Table 2-2: Research Questions Addressed by the Interpretivism Research Paradigm
Table 4-1: Possible Responses with Four Options, Pollard (1985)
Table 4-2: Scoring Formulas for Responses, Pollard (1985)
Table 4-3: Pollard’s Two Solutions for k Values
Table 4-4: Expected Scores for Random Guessing, Pollard (1985)
Table 4-5: Example of Pollard’s Scores for Both Sets of Values of k
Table 4-6: CBA Scoring System for Correct and Incorrect Answers
Table 4-7: Balanced Scoring Registered Confidence for Correct and Incorrect Answers
Table 4-8: The Average Expected Scores from MCQCM and CBA
Table 5-1: Rules and Example of a Score for a Given Scenario
Table 5-2: Resulting Score for Options Given the Student’s Choice and Their Registered Level of Confidence
Table 5-3: Example of a Question, which has 2 Correct Answers B and C
Table 5-4: Pilot Program Student and Instructor Observations
Table 6-1: List of Sim et al. (2006) Heuristics for CAA
Table 6-2: List of Sim et al. (2006) Heuristics with Elaborated Heuristics for MCQ with Confidence Measurement and Problems Addressed by Revised Heuristics
Table 7-1: Proportion of Postgraduate and Undergraduate Students and Proportion of Each >25 Years of Age
Table 7-2: Responses to the Questions of Student’s Perception of the MCQCM
Table 7-3: Responses of Student’s Perception of the MCQCM vs BB
Table 8-1: Average, Standard Deviation and Difference for Both Marking Schemes
Table 8-2: The Correlation for the Two Marking Schemes
Table 8-3: Means and Standard Deviations for Each of the Section of the Exam
Table 8-4: Correlation Table for the Sections of the Exam
Table 8-5: Correlation of MCQ with MCQCM
Table 8-6: Chi-Square MCQ to MCQCM
ETHICS APPROVALS
“Innovative Online Assessment Using Confidence Measurement.” (Chapter 5):
Extension Granted to School of IT Approval for “Online Self-Assessment Using Multiple-choice
Questions: An Innovative Approach.”, issued by the School of Information Technology,
Swinburne University of Technology 2000.
Approval Code: IT2000-04
CHAPTER 1 INTRODUCTION
Education has its origins deeply rooted in our history and, in many instances, continues
to adhere to its fundamental core principles. It is the ongoing challenge of educators
to develop, trial and implement innovative educational approaches that accommodate the
changing demands of learners, learning institutions and society in general.
Education finds itself positioned firmly within the context of the present, being
influenced by society’s expectations and the political environment. As a result,
educational institutions are often required to deliver their programs in compliance
with governing legislation and within budgets imposed by the government of the
day. Educators face many challenges when dealing with the dilemma of delivering
fundamentally traditional programs with new technology-enhanced components.
These efforts have culminated in the production of educational tools designed to increase
educational productivity while providing extended services to more individuals at lower
cost. The advent of this technology has necessitated research into the evolving concerns
and the development of best practice.
1.1 Contribution of Assessment to Education
Assessment is a major contributor to education, both as a means of providing feedback
to the student (formative assessment) during their learning experience and as a grading
mechanism (summative assessment) reflecting their level of achievement. A
longstanding motivation for assessment is for the comparison of an individual’s
recorded achievement to others, current and previous. It is these comparative grades
that can be used to determine the vocational and further educational paths of the
participants. This primary objective of assigned level of achievement underpins the
contribution of assessment to education. Progressive work in the area of assessment is
considered to be of value, as the ability to improve the method of grading has perceived
benefits to all participants. Additionally, the opportunity to produce richer feedback to
both the student and the instructor during the learning experience is beneficial.
1.2 Assessment Strategies
The acquisition of knowledge is not linear (Hyerle, 2009), as students often approach
learning by moving freely through the educational matrix absorbing knowledge at
various times and events. Longino considers the growth of knowledge to be non-linear,
irregular, layered and patchy (Longino, 2002). Instructional material therefore needs
to cater for the diversity of learners, requiring differentiated instruction for their
multiple intelligences and habits of mind. Likewise, assessment strategies must
cater for the multidimensional nature of acquired knowledge. Many of the
assessment strategies used today evolved from long-standing practices that are linear in
design and fail to provide the complexity required to gauge the further dimensions of
knowledge. The dilemma faced by educators for years has been identifying the most
appropriate assessment regime to adopt that will accurately reflect the level of
knowledge of any given student. The present practice is to employ a combination of
assessment strategies that over time have proven to be reliable, such as multiple-choice
questions (MCQ), short essay answers, case studies and the like. Much discussion still
exists over the reliability of such testing mechanisms with educators continually
questioning their validity.
MCQ testing has long been a popular choice among academics for its ability to assess
large numbers of students in broad areas of the curriculum with relative ease (Tarrant,
Ware, & Mohammed, 2009). The ability to recycle questions from large, readily
available banks is greatly appealing to the already over-burdened instructor. By
their design, MCQs lend themselves to extension, and with the advent of technology
they have evolved into more complex levels of application. While this gives the
opportunity for quicker responses to both the student and the instructor, there is
often an over-zealous embrace of technologically advanced MCQ assessment strategies.
This can often result in the application of MCQ testing for purposes for which it was
not truly designed. Adding to the concern over the extension of MCQ testing is the
inability of instructors to construct suitable MCQs, an issue that existed well before
the advent of the surrounding technology, which only serves to exacerbate the situation.
The more critical and active educators endlessly pursue innovative alternatives in an
attempt to improve the assessment process. This research documents such a journey,
following the investigative path of an innovative approach to assessment that
incorporates confidence measurement. As a means of validation it compares the
outcomes produced from the implementation of such an innovative approach directly to
those from traditional assessment activities to ascertain their validity and reliability.
Importantly, it acknowledges that all of these comparisons are to existing assessment
strategies, classified as the benchmarks in traditional instructor-driven delivery. It also
respects the sentiment of Lederman and Niess (2000) that technology in learning should only
be introduced on the premise of a sound pedagogical reason.
1.3 The Advent of E-learning
Recently there has been a shift in our approach to education that can be partially
attributed to the impact that the Internet and its associated technology have on every
aspect of life. With the advent of the Internet, e-learning that uses the Internet as a
medium for global 24 hours/7 days a week access has become an accepted method for delivery,
interaction and assessment. Research into innovative assessment requires examination
of the relevant elements of the e-learning environment and its important contribution to
the educational arena. The modern IT-savvy student has embedded the
encompassing technology into all aspects of their life. Education is no exception, as
they have expectations of educational activities being available to them not only in the
classroom, but also in the library or remotely in their designated place of study via the
Internet. Interactive assessment is a significant participant, often used at different levels
of activity in the spectrum of educational delivery, where elements of the e-learning
paradigm are called upon, to various extents, to enrich the learning experience and offer
increased feedback and support.
This research acknowledges the willingness of educators to develop innovative
assessment strategies as a means of improving the learning experience and in particular
to consider new technology to improve these strategies. This research leverages the
e-learning platform; however, it is not wholly reliant on it for its operation.
The purist’s definition of e-learning incorporates all aspects of the practice, including
Internet-based presence, computer managed learning (CML) applications, synchronous
and asynchronous methods of electronic communication and delivery, online
assessment and many other contributors to the paradigm. In reality, e-learning is often a
hybrid approach, or “Blended Learning” (Keller, 2008), where aspects of the traditional
mode of education, such as lectures, tutorials and laboratories, are blended with online
supportive activities, such as online quizzes, streamed lectures and multimedia
presentations (Harris, Sadowski, & Birchman, 2006). In most cases, this hybrid
approach offers increased flexibility, enhanced learning environments and a richer
learning experience while still benefiting from the traditional modes of delivery. The
e-learning paradigm used by educational institutions often contains the full spectrum of
delivery modes, from the student studying solely via distance education to full on-campus
participation. E-learning activities within many institutions can be split into
two components. The first is the “learning” element, which includes delivery modes in the
traditional format. The second is the “e” activities, where the more non-traditional
components of the educational process lie. Howlett et al. state that “online
educational techniques can be effectively blended with other forms of teaching”
(Howlett et al., 2009). This research considers a non-traditional approach to education
greatly supported by elements of the e-learning paradigm, in particular the field of
assessment. While this research is associated with the e-learning phenomenon, it has
its grounding in the historically long-tested and validated traditional delivery
paradigms.
1.4 The Criteria for Good Formative and Summative Assessment
Assessment plays a critical role in the educational process as both a means of grading
(summative) and supplying valuable feedback (formative) to the students. The
embracement of technology as a major contributor to the delivery of education has
increased the expectations for effective assessment systems to be available to the
student, encouraging self-assessment at all stages of the learning experience. It is
accepted wisdom that testing for the purpose of feedback should be an integral part of
the sequence of learning activities rather than an interruption to that sequence, as
discussed in the Principles and Standards for School Mathematics (National Council of
Teachers of Mathematics, 2000).
Traditionally, both formative and summative methods of assessment have been reliant
on the instructor to supply feedback. With the ever-increasing demand to restrain
delivery cost, resulting in increased student to staff ratios, unintentional delays in
supplying feedback to the student can occur to the disadvantage of both the student and
the instructor. As a result, often the most valuable feedback occurs at the final stages of
the learning path, generally too late to be of great value to the students’ learning
process and providing only limited feedback to the instructor. This situation tends to be
of primary benefit to the succeeding group of students participating in the next
scheduled delivery of the subject.
Good assessment practices need to meet a set of criteria to be considered of any true
value. Formative assessment is required to produce meaningful feedback for the benefit
of the student and the instructor. It is equally important that this feedback is timely,
contributing to the learning path of the student, and designed to underpin rather than
undermine student confidence (Torrance, 2007a). The feedback from formative
assessment should be easily comprehended by the student, offering direction in their
learning by highlighting the areas of understanding, misunderstanding and complete
ignorance. Immediate feedback often enhances the value of a formative assessment
exercise, which is an inherent characteristic of an online assessment tool due to the
nature of the encompassing technology. The formative assessment feedback cycle
should occur early, and then constantly throughout the duration of the student’s
learning experience (Farrell & Leung, 2002a), encouraging self-regulation by the
students (Nicol & Macfarlane-Dick, 2006). Formative assessment feedback identifies
the areas of concern encouraging the instructor to evaluate the method of delivery and
adjust the remaining instructional program as deemed appropriate. Banta, Jones and
Black (2009) state that without examination of the results, improvement of the learning
outcomes cannot occur (Banta, Jones, & Black, 2009). The purpose of summative assessment
is to assess the level of the student’s knowledge for any particular area. Summative
assessment needs to be correctly scheduled and appropriately timed within the
program to be most effective (Banta et al., 2009), and to produce a set of results that
offer validity and reliability while being fair and ethical (Rice, Campbell, & Mousley,
2007). Rice, Campbell and Mousley (2007) are concerned that many educators still see
summative assessment as only a method of collecting information and comparing the
relative worth of different students. While acknowledging that an objective of
summative assessment is to produce a set of comparable results, Rice, Campbell and
Mousley (2007) emphasise that it can encourage deeper understanding by merging
assessment and the teaching process with quality teaching. Summative
assessment should also offer the same feedback as the formative assessment activities,
guiding the student to address the areas of concern as outlined above. Summative
assessment activities often occur during the delivery period and in some cases very
early, which enables the student to use the experience as a self-assessment exercise and
therefore should be designed with this in mind.
While many educators are willing to accept traditional assessment strategies, there are
those who doubt their true value and acknowledge their limitations. Generally it is
accepted that there is societal and academic need for assessment and hence the need for
assessment strategies that are reliable and manageable. Some assessment strategies fall
short of the criteria mentioned above. The following sections identify some of the
concerns of present assessment strategies that are relevant to this research.
1.5 Concerns with Current Assessment Strategies
In order to meet acceptable criteria for good assessment where student cohorts are large
and feedback is required to improve student learning, it is necessary to understand the
problems associated with current assessment strategies. The problems identified are
students guessing rather than knowing the correct response, students learning how to
respond to a specific type of assessment, and assessment failing to recognise partial
knowledge or that students have acquired incorrect knowledge.
1.5.1 Assessment That Encourages Guessing
Presently, educators often use assessment tasks that permit and encourage guessing as
part of the testing strategy, such as in the case of the standard MCQ tests. Systems that
permit and encourage guessing can in many instances overstate the student’s current
level of knowledge. This misrepresentation of knowledge is a major motivation for
this research. Guessing the answer may not necessarily be in itself a
severe problem if it is the consequence of serious consideration by the student when
deliberating over the correct answer. The actual learning is reliant on the student being
informed of the correct answer to the question that they guessed as part of the post
analysis, hence turning the experience into a formative assessment exercise.
This study is not alone in considering penalties for incorrect answers designed to
eliminate gain from guessing and will discuss previously developed innovative
assessment strategies in Chapter 3.
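As a rough illustration of the arithmetic behind such penalties, the following Python
sketch computes the expected mark for a student who guesses blindly on a question with
k options. The function name, the one-mark reward and the 1/(k-1) penalty are assumptions
chosen for illustration (the last is the classical correction-for-guessing penalty); they
are not the scoring scheme developed in this thesis.

```python
from fractions import Fraction


def expected_guess_score(num_options, penalty=Fraction(0)):
    """Expected mark per question for a student guessing uniformly at random.

    Assumes (for illustration only) that a correct answer earns 1 mark and an
    incorrect answer loses `penalty` marks.
    """
    p_correct = Fraction(1, num_options)
    return p_correct * 1 - (1 - p_correct) * penalty


# Four-option question, no penalty: a blind guess is worth 1/4 mark on average,
# so the raw score overstates what the student actually knows.
print(expected_guess_score(4))                  # 1/4

# Hypothetical penalty of 1/(k-1) = 1/3: the expected gain from guessing is zero.
print(expected_guess_score(4, Fraction(1, 3)))  # 0
```

Under these assumptions, an unpenalised test of 40 four-option questions would award a
student with no knowledge 10 marks on average, whereas the penalised version would award
none.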
1.5.2 Assessment That Trains the Student to Do Well
An additional concern is that the mode of testing may encourage the students to channel
their learning. Such a tendency is not restricted to newly introduced innovative
assessment strategies but has been inherent with most of the assessment mechanisms to
date. A student faced with an MCQ test will often set their study routine to address the
lower levels of knowledge, as described by Bloom’s Taxonomy of Educational
Objectives (Bloom & Krathwohl, 1956) and as presented by Marshall and Carson
(2008) as rote learning of formulas and terminologies (Marshall & Carson, 2008). In those
cases the student’s learning runs counter to the effort needed to answer questions based
on case studies at Bloom’s (1956) higher levels of application and synthesis. While
we tend to be more forgiving of these shortcomings in the aforementioned traditional
assessment strategies, there is a tendency for recently introduced innovative assessment
applications to be over-scrutinized, attracting harsh criticism for the same offences.
During this research, focus groups of assessment practitioners (Farrell & Leung, 2008)
discussed the merit of incorporating penalties for wrong answers. The educators who
presented this innovative approach attracted negative responses from some academics,
being accused of introducing assessment systems that effectively trained the student in
achieving the highest grade. Crocker (2005) considers it harmful to educational
development to give instruction with the purpose of making students skillful test takers
(Crocker, 2005). In recognition of this being considered as a bias, Astin (1991)
recommends that all existing assessment be scrutinised when new assessment strategies
are introduced (Astin, 1991).
1.5.3 Assessment Failing to Recognise Partial Knowledge and Miscalibration of Confidence
Traditional testing methods, such as MCQs, generally offer an adequate method of
assessing students’ knowledge in areas deemed as right or wrong; however, they fail to
cater for the “shades of gray” or fuzzier areas of knowledge. The data gained from
assessment activities that permit the registration of partial knowledge may permit the
understanding of misconceptions or gaps in the student’s knowledge.
Davidoff (1995) recognises this need to identify partial knowledge in the medical arena,
promoting a more thorough approach in assessing students (Davidoff, 1995). He
suggests that a system designed to recognize incomplete or partial knowledge by
permitting the student to hedge the answer, as also suggested by Ben-Simon, Budescu and
Nevo (1997), would be greatly beneficial to the learning process (Ben-Simon, Budescu, &
Nevo, 1997). They further state that a great percentage of medical knowledge is
incomplete, ambiguous and conflicting and therefore the standard MCQ testing method
does not facilitate or reflect the students’ level of knowledge along with their
confidence in that level of knowledge. In support of Davidoff (1995), Ng and Chan
(2009) purport that there is a need to allow for the student who knows only part of the
answer and that this is not the case with conventional MCQ testing that fails to capture
this partial knowledge (Davidoff, 1995; Ng & Chan, 2009). As previously discussed,
MCQs were not originally designed to return such complex analysis, and their extended
use, attributed to the advent of new technology, leaves them unable to fulfil
this task. Ultimately the effectiveness of MCQ format questions is reliant on the
construction of the questions. If partial knowledge is to be acknowledged the test
questions are required to be constructed in the context of the situation being assessed,
as for a medical emergency, where various correct answers are valid depending on the
situation in which they are asked. Davidoff’s (1995) main criticism of MCQs is that
they only recognize and reward those areas of knowledge that are either right or wrong,
encouraging guessing and often leading to over-confidence. He considers miscalibrated
confidence in medical education equally as concerning as lack of knowledge. Clark and
Friesen (2009) consider that systematic over-confidence by individuals in the
economics field could have important consequences, and Acker and Duck (2008) have
identified that propensities to over-confidence are dependent on cultural background
(Acker & Duck, 2008; J. Clark & Friesen, 2009).
Davidoff’s (1995) concerns about failing to permit students to demonstrate partial
knowledge and about the miscalibration of confidence are not confined to the medical
arena, as most educational disciplines expect their students to demonstrate absolute
knowledge for certain aspects of the curriculum and levels of knowledge and general
understanding for other areas (Davidoff, 1995). The ability to recognise a particular
path for further investigation is dependent on the investigator having a level of
confidence that there is a need for investigation. Miscalibration of confidence can have
as devastating an effect on an engineering problem pertaining to the construction of
a pedestrian bridge as on the diagnosis of an individual with a medical condition.
Diamond and Forrester (1983) define knowledge as asking the question “What do you
know?” followed by the meta-question “How sure are you of the answer to the question
about what you know?” in their attempt to address the issue of effective assessment
(Diamond & Forrester, 1983). They consider the registration of confidence to be a
significant indication of a true level of knowledge.
Chapter 3 extensively investigates the supportive evidence of the concerns outlined
above, along with Doebbert’s (1999) recognition of the need to place the management
of the learning into a student’s own hands, which would often culminate in a far deeper
understanding and learning (Doebbert, 1999).
1.6 The Role of Computer Based Assessment in Addressing Issues of Assessment
With the introduction and acceptance of the Internet by a large proportion of the
developed world it is not surprising that education is a major beneficiary. The
opportunity to use this emerging technology as a major contributor to education has
been pursued by both corporate environments and educational institutions, yielding a
large number of Web-centric supportive tools and materials designed to complement,
supplement and substitute for existing learning materials in traditional format. The
inherent features of the encompassing technology, such as the ability to instantly
process and respond, have produced a myriad of educational supportive tools, numerous
Computer Managed Learning (CML) applications and other online resources to
enhance the learning experiences of students today. Hence, e-learning has come of age,
contributing significantly to education, with a large number of educational institutions
utilising many of its functions.
As previously stated, the benefits of e-learning tools are not confined to the e-learning
domain alone. Many of the innovative approaches attributed to this paradigm have
comfortably found their way into the more traditional educational arena, to the extent
where many subjects offered by educational institutions, independent of mode of
delivery, contain a supportive Web presence using components of the Web paradigm,
discussion boards, computer aided testing and the like. The increase in flexibility for
the busy corporate-based student and the 24 hours/7 days a week access to learning
materials is an expectation of the net-centric participant of today.
Consequently, there is a need for reliable computer based assessment to be available as
an active part of the e-learning platform in support of the educational process. In
particular, online formative assessment tools that meet the needs of both
the student and the instructor are required.
To be of significant benefit to the student and the instructor a computer based
assessment task must not only be easily accessible, convenient to the user and offer
timely effective feedback, it must also be capable of reflecting the participant’s level of
knowledge as accurately as possible.
To be fully accepted by the educational fraternity, the resulting student grades must be
at least comparable to existing, validated assessment systems and offer sound
educational values.
1.7 The Purpose of this Research
This research postulates that there is a need to design, test and implement suitable
online formative and summative assessment tools to enhance students’ learning
experience for the duration of their studies. The rapid adoption of online
educational utilities by both academic institutions and corporate entities has catapulted
both students and instructors into the e-learning paradigm. To meet the demand for
e-learning assessment, it is vital to design, trial and implement innovative formative and
summative assessment strategies that provide timely feedback and accurately gauge the
student’s level of knowledge within the context of the environment.
1.8 Problem Statement
The previous discussion has identified and acknowledged the impact and contribution of
the e-learning paradigm in all aspects of educational delivery. Assessment is an
integral component of educational delivery. The
development of new innovative approaches to assist in the e-assessment process is
greatly appreciated by the instructors faced with the challenge of accurately grading
their students. This research pursues one such assessment option with vigor, working
on the premise of the ongoing need for improved assessment strategies to be
investigated, as formulated in the problem statement below.
“Present educational assessment strategies often fail to provide an accurate representation of students’ knowledge, which is detrimental to both the student and instructor.”
1.9 Scope and Aim of this Research
The supply of quality educational material offering maximum convenience to the
student is often seen to be beneficial to both the students and the instructors. In view of
the importance of e-learning enhancements to traditional education, and the
fundamental role played by assessment in the learning process, it is essential that we
explore technologies and approaches that will improve assessment effectiveness for all
delivery modes, spanning the full spectrum of educational delivery, from fully online to
face-to-face delivery. Flexibility and portability are fast becoming requirements of a
successful educational program. For the individual wishing to further his/her existing
qualifications or for those who want a change in career path, it is often the case that
learners find themselves unable to participate fully in the traditional mode of delivery,
and will pursue options that best fit their busy work schedule.
In some instances the e-learning components of education remove the personal
interactivity within the classroom environment, hence creating a challenge to the e-
learning paradigm to enhance the online learning experience by incorporating rich,
personalized and timely feedback to the individual student. In addition, the instructors
are reliant on feedback in relation to the student’s knowledge acquisition over a period
of time, given that they do not always have the advantage of delivering face-to-face.
This research considers a solution to the problem statement given in Section 1.8. It
is limited and defined by the following scope and aims.
1.9.1 Scope
The scope of this research is in the use of an e-learning assessment tool based on the
traditional MCQ format for formative and summative assessment purposes for a variety
of educational settings, ranging from fully distance online to face-to-face modes of
delivery. The developed assessment tool is Web-based, designed to be delivered via the
Internet with 24 hours/7 days a week accessibility. This study has been limited to
post-secondary education as part of the delivery program. This research evaluates the
proposed innovative assessment system to ascertain if it is reliable as both a formative
and summative assessment tool.
1.9.2 Aims
The aim of this research is to:
“Investigate the ability of assessment with confidence measurement to increase the accuracy of representing a student’s level of knowledge and whether it is acceptable as a valid assessment alternative by students and instructors for both formative and summative assessment.”
This research hypothesised that the outcomes of using assessment with confidence
measurement are of benefit to both the instructor and the student. Firstly, it enables the
student to make an honest self-appraisal of their knowledge of the content being
assessed, highlighting the areas of concern, which in turn will assist them in their
direction of learning. Secondly, it enables the instructor to ascertain the knowledge of
the individual and/or the group as a whole, which will assist them in determining the best
learning path to address the content that is not being truly understood.
This research follows the path of designing, implementing, testing and refining the
online assessment tool Multiple-choice Questions with Confidence Measurement
(MCQCM) (Farrell, Farrell, & Leung, 2001; Farrell & Leung, 2002a, 2002b; Farrell &
Leung, 2004b; Farrell & Leung, 2006, 2008). The assessment application incorporates a
confidence measurement component proposed as a means of increasing the accuracy with
which the student’s level of knowledge is represented. The MCQCM was designed primarily
as a formative assessment tool to assist students in reflection and to encourage
self-assessment, often leading to a deeper understanding of the material being taught.
In the later stages of the research cycle, preliminary investigations compare the MCQCM
with current methods of assessment when used for summative assessment. This research argues that
if used appropriately assessment with confidence measurement will offer both grading
and informative feedback to assist the student along their learning path, all of which are
critical components for the success of the learning.
1.10 Overview of Thesis
Chapter 2 will present the adopted research methodology and framework, initially
identifying the alternative approaches available and then the decisional criteria for the
choice. Chapter 2 uses the identified problem statement (section 1.8), formulating the
corresponding research questions and sub-questions, mapping them against the most
appropriate research method for addressing them. It will then consider the Human
Computer Interaction (HCI) User Centred Design (UCD) iterative approach (Hussain et
al., 2008; Righi & James, 2007) to problem solving given that the problem space of this
research is firmly planted in the real world.
The literature review in Chapter 3 initially discusses the role of assessment as part of
the educational process, introducing various assessment strategies presently used and
their consequential impact and influence on the learning path of the participant.
Importantly, it highlights the value of feedback to students and the significant
contribution that it plays in the learning process. Chapter 3 also investigates the work of
those who have undertaken rigorous research before and during this study, citing
previous work where innovative approaches to assessment strategies have been
incorporated into programs with varying levels of success. Chapter 3 proposes that an
assessment with confidence measurement strategy, incorporating a balanced scoring
technique of rewards and penalties, be adopted to address the issues identified. The
interactive assessment tool, referred to as the Multiple-choice Question with
Confidence Measurement (MCQCM), is based on the traditional MCQ format of a stem
(question) followed by a set number of optional answers (McCoubrie, 2004). A detailed
discussion follows considering fundamental learning theory underpinning the process
of student learning, in particular the role of intrinsic motivational factors that often
contribute to deeper understanding. Chapter 3 then acknowledges the important
contribution of Learning Styles and the need for the assessment activities to support the
four phases of learning. Chapter 3 then aligns the contribution of assessment with
confidence measurement to Hede’s (2002) Integrated Model of Multimedia Effects on
Learning (Hede, 2002).
Chapter 4 extensively discusses alternative scoring techniques investigated by other
researchers in the field, demonstrating the mathematics supporting their proposed
solutions. Chapter 4 systematically reveals equations based on probability theory which
culminate in the establishment of scoring regimes providing choice for the participant
that are properly motivating. It considers the expected values of the users when
interacting with the various systems and then completes a comparative analysis.
Chapter 4 closes with the recommendation of the use of a balanced scoring system as a
compromise and enhancement of previous methods offered.
Chapter 5 presents the findings from two pilot programs in which the MCQCM
prototype underwent some preliminary testing. On analysis of the outcomes, Chapter 5
identifies the shortcomings of the proposed assessment tool and discusses the need to
address these before proceeding.
Chapter 6 discusses the refinement of the MCQCM tool used in Chapter 5 as a solution
to the stated problem and considers heuristic evaluation and the handling of graphics
and programming scripts. Furthermore, Chapter 6 considers the topology of the game
play phenomenon (Adams & Rollings, 2007), relating the MCQCM elements of game
play in its design and structure.
Chapter 7 initially reports on the results of simulations designed to evaluate the
functionality and effectiveness of the MCQCM tool before implementation to a large
group. Chapter 7 then reports on an investigation into the students’ and instructors’
perceived value of the MCQCM as a formative assessment tool, discussing the
outcomes and possible enhancements required for an effective assessment strategy.
Chapter 8 reports on the results of implementing the MCQCM for summative
assessment on three separate occasions to ascertain its validity, reliability and
convergence to other traditional assessment methods. Initially Chapter 8 reports on a
study in which the students’ test results were analysed to ascertain if the MCQCM and
the traditional MCQ scores converged. Additionally, this activity was designed to
gauge the students’ and instructors’ perception of using the MCQCM for summative
assessment. The next case study in Chapter 8 analyses the student scores for four
assessment strategies, MCQCM, Short Answers, Problem Solving and traditional
MCQ, to determine the validity and reliability of the MCQCM and its convergence to
the other methods of assessment scores. Chapter 8 then analyses the MCQ and
MCQCM scores for a cohort of 85 students to verify its use as an alternative
assessment option to be included in the suite of assessment strategies available.
Chapter 9 offers the conclusions and discussion, where the observations are drawn
together. It further identifies the limitations and the internal, external, construct and
ecological validity of this research. Chapter 9 answers the formulated research
questions and sub-questions and identifies a contribution of this research to the
educational arena for formative and summative assessment strategies. Chapter 9
continues with a series of recommendations for educators wishing to pursue alternative
assessment in the future, particularly assessment with confidence measurement.
Chapter 9 concludes with a summary of the findings and challenges faced when
embarking on investigating alternative assessment, with discussion on the advantages
and disadvantages of alternative scoring techniques available, and a suggested approach
for the future development of the MCQCM.
CHAPTER 2 RESEARCH DESIGN
This Chapter identifies and argues for the chosen research methodology by initially
considering the various research paradigms, then formulates the research questions
and the corresponding supportive research sub-questions. The research framework is
then developed in which this research addresses the questions previously identified. A
discussion on the Human Computer Interaction (HCI) User Centred Design iterative
approach to problem solving (Isacker, Slegers, Gemou, & Bekiaris, 2009) then follows.
2.1 Research Methodology
When embarking on a research project the researcher is mindful of the three dominant
research paradigms: positivism, interpretivism and critical theory (or critical science)
(Cohen, Manion, & Morrison, 2007). These proven approaches have, since their first
conception, influenced the path of researchers and will continue to do so in the future.
Every researcher embarking on their research path is required to choose an appropriate
research method to assist them. It is this nominated research method that determines the
structure of the research and the framework to which the researcher will adhere. Whilst
this research does not intend to discuss the above-mentioned paradigms extensively,
it is considered necessary to briefly discuss the meaning of each in order to
justify the research approach adopted.
2.1.1 Overview of Research Methodology
The origins of positivism date back to the early 19th century, being mainly attributed to
the works of French philosopher Auguste Comte (1798-1857), and are deeply rooted in
the works of other great philosophers who followed. Comte published his theories as
the Cours de philosophie positive (1830–1842) (Comte, 1868). Consequently,
Positivism shaped the intellectual discourse of the late nineteenth century, having
grown from Comte’s absolute rejection of value judgments when observing social
science. Comte concluded that human thought had passed from the theological stage
into a metaphysical stage and was entering into what he termed as the positive stage
(confining itself to what is positively given, avoiding all speculation) or scientific stage
(Corveleyn & Luyten, 2006). Comte postulated that the positivist researcher following the
scientific method is concerned only with the observable and the encompassing
relationships. Here a critical underlying assumption is that there exists a basic
knowledge concerning human behaviour and all worldly phenomena. It anchors its
existence on the premise that if a hypothesis or proposal cannot be tested empirically
it cannot be proven fact. There is no place for value statements in positivism, only
statements that can be scientifically proven. For positivism to hold credibility it is a
requirement that all statements being tested for validity must be grounded in
observation, and these observations must be repeatable (Johnson, Buehring, Cassell, &
Symon, 2006). In addition, any experiments undertaken during the research process
should use the techniques agreed and endorsed by the entire scientific community.
Positivist research activities, such as controlled experiments, remove subjectivism
from the study and generate quantitative data to be analysed and statistically tested,
providing statistical confirmation in support or rejection of proposed hypotheses
(Steinmetz, 2007).
The later-conceived post-positivism takes a more moderate position. It acknowledges
the existence of subjectivism as the result of judgments made by the researcher in the
study. These inherent judgments can occur when the researcher chooses the subjects for
experiments and maps out the research path with their own preferred methods,
imposing their own value judgments and influences during the research process. The
supporters of this research paradigm consider the effects of this interference to be
minimal if the research process is correctly applied. Reed claims that post-positivism
seeks to reorganize social research with a new approach, unhampered by the earlier
experiences (Reed, 2008; Steinmetz, 2007).
In direct comparison, the supporters of interpretivism propose that all knowledge is
socially constructed (Hogg & Maclaran, 2008). This is referred to as a constructivist
view of knowledge, and assumes that absolute knowledge does not exist and that, in the
majority, most knowledge is reliant and built upon previous knowledge (Stavropoulos,
2007). It is acknowledged that interpretivism can be an acceptable research
methodology in some contexts while being rejected in other scientific methods, as
described in the previous Positivism discussion. Gerring (2003a, 2003b) refers to
interpretivism as interpreting or clarifying, where the construction of truth relies on the
tests of coherence rather than (or in addition to) correspondence with external reality.
This paradigm directly questions the validity of the positivism
paradigm in that it does not support the concept of objectivity. Rather it considers that
the influence and subjectivity of the researcher primarily directs the path of the study.
The true value of interpretivism is fully appreciated when applied to research in a social
context, that is, to human social phenomena involving the feelings, values and
interactions of the participants (Gerring, 2007). Cassell and Nadin (2008) consider that “the
adoption of interpretivist approaches has much to offer in terms of theoretical and
methodological development” in the field of entrepreneurship.
The interpretivist is heavily reliant on questionnaires and surveys to elicit their
findings, which generally culminate in both qualitative and quantitative data. It is this
qualitative data that must be interpreted in context as opposed to measured for
statistical significance. It is of particular interest to this study as it considers the
individual’s behaviour in the educational field and not that of the generalised
population.
Critical theory, as social philosophy, was born in the German Frankfurt School in the
1930s, based fundamentally on the work of Marx (Marx, 1884). It conjectures that all
knowledge is historical and biased, and that objective knowledge is illusory.
Importantly critical theory acknowledges that power leads to distorted communication
and by becoming aware of the ideologies that dominate in society, groups can
themselves be empowered to transform society (Fuchs & Sandoval, 2008). This
paradigm often contributes to the changes in social structure. The critical theory
paradigm attempts to address the power imbalances during research, permitting those
not traditionally in charge, such as the participants, to influence the research direction
and hence is generally employed for social research in areas such as human rights
(Ackerly, 2004). The research techniques employed here for critical theory are often the
same as those used by interpretivism, being the tools for qualitative data generation,
such as surveys, ethnographic studies and case studies. This technique places the
researcher only as an equal peer in the process, and encourages the researcher to be
actively involved in the problem situation.
The research paradigms above are acknowledged and highly regarded, depending on
one’s position in the research fraternity. It is common knowledge that these paradigms
underpin and have shaped most of the research to date, though some research
disciplines experience difficulty finding the appropriate positioning. These research
paradigms bring with them significant contributions to research and, even though
they have various degrees of mutual exclusivity (positivism versus interpretivism) and
synergies (interpretivism and critical theory), they can often offer the opportunity to
work cohesively in many applications. Each has its strengths and weaknesses; and each
has a role to play when faced with the complexity of a real world problem space. The
strength of positivism lies with its ability to capture and analyse objective data,
especially in confined, controlled environments (Giddings & Grant, 2007). However, in
many situations it is not the case that such data exists, as it is an artificial representation
of the real world in which the problem resides. While case studies, ethnographic studies
and the like give strength to interpretivism and critical theory, application of these
research paradigms can be considered to be lacking in mathematical rigor and
reliability.
No matter which research framework is adopted it is the research methodology that
determines the activities to be undertaken during the study. Within this particular
research there is a requirement of a positivist approach for elicitation of quantitative
data to primarily gauge the effectiveness and contribution of assessment with
confidence measurement to the broader population of students. For this reason this
study uses experiments and field studies to gather data in an attempt to represent the
trends and influences on the group, to be tested for validity and reliability. Accordingly,
it also seeks to identify the social significance that the system has on the individual by
encouraging them to register their feelings and values when interacting with the online
assessment with confidence measurement during their learning, aligning itself strongly
to the interpretivism paradigm.
It is critical that a clear direction of the research framework and method be determined
in the early stage to ensure that the activities are designed to meet the research
objectives and contribute to addressing the identified research question(s) (Mansell,
2009). Research activity produces data, both quantitative and qualitative (empirical),
which must be correctly interpreted in order to support the hypotheses and propositions
formed and tested during the research lifecycle. Quantitative and qualitative data have
equally important roles to play in the research arena and are seen as important
contributors in the research process. A well planned research path is required to ensure
that the outcomes of the various activities and the types of data produced are relevant
and of value to the research.
2.1.2 Adopted Research Methodology
This research uses a combination of both positivism and interpretivism paradigms, as it
deals with individuals interacting with an interactive assessment tool as a means of
securing grades as well as expressing states of emotion (McIlveen, 2007). It cannot
ignore the influences of the emotional state of the participant. Using interactive systems
to capture quantitative data is subject to the state of mind of the participants, that is to
say that recent positive or negative experiences might directly affect their level of self-
confidence and their propensity towards registering their confidence level.
Consequently, the quantitative data is of two kinds. Firstly, it has the unequivocal
scores derived from their actual correct choices during the test, which can be analysed
using the positivism research approach. Secondly, the participants then supply
subjective judgments about their level of confidence, hence an interpretivism approach.
There is a need to adopt a research methodology that employs both positivism and
interpretivism research paradigms, to be well balanced and managed. The use of
positivism and interpretivism within this study is identified in Tables 2-1 and 2-2 later
in this chapter.
2.2 Research Design to Address Research Questions
In order to guide this research towards its aims and consequently provide a solution
to the stated problem a series of questions has been formulated. These questions form
a progression of understanding of the value of the assessment with confidence
measurement to students and instructors and its validity as an assessment strategy.
The problem statement is reiterated here, being:
“Present educational assessment strategies often fail to provide an accurate representation of students’ knowledge, which is detrimental to both the student and instructor.”
The consequential supporting aim of the problem solving exercise is to
“Investigate the ability of assessment with confidence measurement to increase the accuracy of representing a student’s level of knowledge and whether it is acceptable as a valid assessment alternative by students and instructors for both formative and summative assessment.”
As previously highlighted, this research focused on the use of assessment with
confidence measurement for formative and summative assessment, giving rise to the
following main research question.
Main Research Question:
“Does assessment with confidence measurement increase the accuracy of
representing a student’s level of knowledge for formative and summative assessment
application?”
This gives rise to two further research questions. The first is formulated to ascertain if
assessment with confidence measurement could be used for formative assessment and
is as follows:
Research Q1.
“Does Assessment with Confidence Measurement produce more meaningful
feedback and influence the learning path when used for formative
assessment?”
In support of the research question pertaining to formative assessment the following
sub-questions have been formulated.
Q1A: “What are the students’ and instructors’ attitudes and perceptions of assessment
with confidence measurement when used for formative assessment?”
Q1B: “How do the students’ results compare to the results of a standard Multiple
Choice Question (MCQ) test when using assessment with confidence
measurement for formative assessment?”
Q1C: “Does the use of assessment with confidence measurement provide additional
valuable feedback to the instructor when used for formative assessment?”
The second research question is formulated to ascertain if assessment with confidence
measurement could be used for summative assessment and is as follows:
Research Q2.
“Does Assessment with Confidence Measurement offer at least equivalent
Validity and Reliability compared to traditional assessment strategies when
used for Summative assessment?”
In support of the research question pertaining to summative assessment the following
sub-questions have been formulated.
Q2A: “What are the students’ and instructors’ attitudes and perceptions of assessment
with confidence measurement when used for summative assessment?”
Q2B: “How do the results compare in Validity and Reliability to the results of the
standard MCQ test when using assessment with confidence measurement for
summative assessment?”
Q2C: “How do the results when using assessment with confidence measurement for
summative assessment compare in Validity and Reliability to other traditional
methods of summative assessment?”
Q2D: “Does the use of assessment with confidence measurement provide additional
valuable feedback to the instructor when used for summative assessment?”
Q1C and Q2D recognise an important component of this research, the value of
assessment with confidence measurement to the instructors, as the direction of the
learning can be amended by them depending on the feedback received. Often, the
instructor will vary the instructional material, readdressing concepts and changing the
emphasis on the material as required, to increase the student’s understanding.
Consequently, enriched feedback has a direct influence on the learning path provided
by the instructor and needs to be considered when formulating the research questions.
In addition to the research questions formulated above there is also the need to consider
the usability and consequential design requirements when producing interactive
assessment strategies for implementation. This gives rise to the third research question:
Research Q3:
“What are the design requirements for developing an interactive assessment
with confidence measurement to ensure that instructors and students are able
to achieve maximum benefit from its application?”
While the direction of this research has been determined by these research questions
and sub questions, the drive of this research is to ascertain the value of an assessment
strategy that uses confidence measurement primarily as a formative assessment tool,
then later, as a summative assessment tool.
2.3 Research Framework
Consistent with the positivism paradigm there will be instances where the research
activity will be designed to capture quantitative data through experiments and surveys
for statistical analysis of validity, reliability and convergence, in order to
ascertain the effect of an assessment with confidence tool on the general population of
participants. In particular the following sub questions identified in Table 2-1 will be
predominately addressed using this approach.
Sub Questions addressed using Positivism Research Paradigm
Q1B: “How do the students’ results compare to the results of a standard
Multiple-choice Question (MCQ) test when using assessment with
confidence measurement for formative assessment?”
Q2B: “How do the results compare in Validity and Reliability to the results of
the standard MCQ test when using assessment with confidence
measurement for summative assessment?”
Q2C: “How do the results when using assessment with confidence
measurement for summative assessment compare in Validity and
Reliability to other traditional methods of summative assessment?”
Table 2-1: Research Questions Addressed by the Positivism Research Paradigm
To address the sub questions identified in Table 2-1 a particular research approach
and appropriate data analysis was required. It was deemed suitable that the method of
research activities designed would be of the experimental type, where the activities
occurred during the course of the subject delivery. Following these activities the
collected data was statistically analysed for comparison to various more traditional
assessment tasks.
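As an illustration only, the kind of statistical comparison described above can be sketched as a Pearson correlation between paired student scores on two assessment types. The function name, the variable names and the score values below are hypothetical and are not drawn from the data of this study; the sketch assumes each student has one score on each assessment.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one set of scores is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical paired percentage scores for six students (illustrative values only)
mcq_scores = [55, 62, 70, 48, 81, 66]
mcqcm_scores = [50, 60, 74, 41, 85, 63]

r = pearson_r(mcq_scores, mcqcm_scores)
print(f"Pearson r between MCQ and MCQCM scores: {r:.3f}")
```

A coefficient close to 1 would suggest that the two sets of scores converge. A fuller analysis would also test for statistical significance and examine reliability, which this sketch does not attempt.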
In comparison, this study also uses the interpretivism paradigm as outlined in section
2.1. Apart from some initial exercises confined to the laboratories in the early
development stages most of the activities supporting the research occurred in the real
world of teaching, during the tutorials and as part of the revision offered to the
students. These later investigations were dependent on the interaction of the
participants as part of their daily activities, both in class and at home, as a scheduled
part of the curriculum. As a result much of the data collected was captured during
dialogues that exist between instructor and student. In many cases the recorded
feedback was as a result of the subject’s appraisal system run as part of the university
course quality management process, where the students are given the opportunity to
comment on the assessment mechanisms used during the semester. These survey
results produce rich qualitative data representing the feelings, values, perceptions,
cultural and social attitudes of the individual, which require interpretation and
classification. In such cases this analysis assists in understanding the impact of the
research on the smaller sub groups of the cohort and not the population as a whole. It
is this opportunity to analyse the way the individuals learn and react that provides a
deeper understanding of the impact assessment with confidence measurement may
have on the learning path. It recognises that while there is a need to understand how
the greater population behaves during the learning process it is important that we also
consider the finer characteristics and eccentricities of the individual and those of the
smaller groups that they might typify. The research sub questions predominately
addressed by this approach are outlined in Table 2-2.
Sub Questions addressed using Interpretivism Research Paradigm
Q1A: “What are the students’ and instructors’ attitudes and perceptions of
assessment with confidence measurement when used for formative assessment?”
Q1C: “Does the use of assessment with confidence measurement provide
additional valuable feedback to the instructor when used for
formative assessment?”
Q2A: “What are the students’ and instructors’ attitudes and perceptions of
assessment with confidence measurement when used for summative
assessment?”
Q2D: “Does the use of assessment with confidence measurement provide
additional valuable feedback to the instructor when used for
summative assessment?”
Research Q3:
What are the design requirements for developing an interactive
assessment with confidence measurement to ensure that instructors and
students are able to achieve maximum benefit from its application?
Table 2-2: Research Questions Addressed by the Interpretivism Research
Paradigm
The sub questions identified in Table 2-2 above required a particular research approach
and appropriate data analysis. The empirical data gathered from the surveys, dialogues
and informal interviews are dealt with in various ways including classification and
cluster analysis.
2.4 HCI Approach to Problem Solving
Like many research activities embedded in a real world environment, this research finds
itself planted firmly between the needs of finding a particular solution to an identified
problem and performing research in the field, in this case as part of the educational
process. A formal structure has been adopted, which satisfies the requirements of both.
It is for this reason that the methodologies presented here have been chosen to address
the two areas, one for the problem space and the other for the research. It is important
that there is a distinction between the two as they at times run in parallel, and
occasionally cross paths, within the research process.
This study focuses on an approach to learning that is supported by the use of an
assessment incorporating a confidence measurement interactive tool. Consequently,
there has been a need to develop, test and implement an interactive Web-based tool
permitting the students to self-assess their knowledge at their convenience
(Zimmerman, 2008). This tool is designed to support the student in self-regulation and
reflection. In general, interfaces use many areas of technology, including multimedia
design and Internet technologies. Critical to the success of any interface system project
is the involvement of the users in the design, development, testing and implementation
phases (Sharp, Rogers, & Preece, 2007). In developing an interactive online assessment
system there is a need to provide the participant with a rewarding, enjoyable and
beneficial experience (Harrison & Petrie, 2007). The HCI problem solving
methodology, like many others, supports an iterative user centred approach,
commencing with the identification of the problem, followed by the formulation of the
goals to address the problem at hand then the development of the solution through an
iterative process involving the users. Once the primary goal has been satisfactorily
defined it is broken down into sub-goals that will assist in achieving the best possible
outcome. The HCI discipline not only recognises this process; it ventures to capture it
as part of the needs analysis activity. This is a fundamental component of the HCI
methodology. The User Centered Design (UCD) approach, as promoted by Lindström and
Malmsten, is used to achieve this (Lindström & Malmsten, 2008). Sharp et al. (2007)
consider this approach to have a greater chance of producing a
result that is designed to meet the needs of the users, to make the most of human skill
and judgment and one that produces a solution directly relevant to the work, supporting
rather than constraining. UCD was initially drawn from successful
Scandinavian experience in the 1970s (Daniel, O’Brien, & Sarkar, 2009;
Schneiderman, 1997) and it has developed into variations of application (Hussain et al.,
2008; Righi & James, 2007), being Participatory Design and Contextual Interaction
design (Sharp et al., 2007).
Once the research problem and the aim are formulated the process of finding a solution
following the HCI User Centered Design methodology can be initiated, as outlined in
the ISO 13407 Human-Centered Design Lifecycle model (Bevan, 2009; Sharp et al.,
2007) and the more complex Usability Engineering Lifecycle (Mayhew, 1999; Seffah &
Metzker, 2008).
2.5 Summary of this Research Structure
This research adopts the general methodologies of positivism and interpretivism and
uses the principles of the HCI user centered iterative problem solving approach. In this
chapter the research problem and situation in the real world environment identified in
Chapter 1 is addressed by the formulation of the three main research questions. From
these main research questions a series of seven sub questions are produced in order to
deal with the research problem in context, leading to the establishment of the research
framework where the appropriate research paradigm is identified for each cluster of
questions. The research framework ties in the questions to the research activities to
ensure that the generated quantitative and qualitative data is relevant to the study and
analysed by the most appropriate method.
The following chapter will consider variations to multiple choice assessment strategies
as practiced by other rigorous researchers, comparing their approaches and the
educational theory behind their choices. It then considers assessment with confidence
measurement as a solution to the identified concern that traditional assessment often
fails to provide an accurate representation of students’ knowledge, which is detrimental
to both the student and instructor.
CHAPTER 3 VARIATIONS OF NON-CONVENTIONAL MCQ ASSESSMENT STRATEGIES FOR LEARNING
This chapter will consider the supporting literature that places this research firmly in
the educational assessment context determined in the scope in Chapter 1. An investigation
follows that discusses the work of those who have previously been active in the area of
developing and implementing various self-assessment strategies in an attempt to
address the identified areas of concern. Furthermore, this chapter will consider the
underlying arguments supporting the use of assessment strategies that incorporate
confidence measurement and the fundamental principles for the application of
interactive self-assessment systems as part of learning.
3.1 Learning Theories and Learning Styles
It is important to briefly consider learning theories with the various learning styles and
a student’s propensity towards them, as the development of alternative assessment
strategies requires consideration to these learning styles, acknowledging the role that
they play in the educational process. Morris et al. purport that for assessment to be
“fair” it should be appealing to the students’ learning styles (Morris, Porter, & Griffiths,
2004), while Sternberg considers that assessment options should be determined by
learning styles (Sternberg, 1988).
3.1.1 Learning Theories
It is necessary to consider learning theories to understand the learning process before
discussing the various learning styles. Morris et al. (2004) identify the three main
learning theories as:
• Constructivist;
• Cognitive;
• Behaviourist.
The Constructivists view learning as contextual with preference to the practical
applications, encouraging pedagogy with consideration to interactive learning in a
cooperative learning environment of instructors and students (Martin, 2008; Piaget &
Duckworth, 1970). Constructivism promotes students’ meaningful experiences in
learning in the real world with provision of tasks designed to engage the student at an
individual level, offering opportunity for reflection.
The Cognitive approach considers the information processing position of learning
where motivation, memory and reflection permit the connectivity between higher and
lower levels of learning (Novak & Cañas, 2008). The cognitive approach is reliant on
pedagogy where models are used with the sequencing of content to maximise attention
and take advantage of existing cognitive structures.
The Behaviourists view learning as a change in behaviour, with pedagogy providing a
focus with defined outcomes with opportunities for self-testing and interactive feedback
on the student’s achievement (Shoben Jr, 2009).
The pedagogies previously discussed have a reliance on extrinsic and intrinsic
motivational factors. Morris et al. (2004) claim that motivation is fundamental to
learning (Morris et al., 2004), both extrinsically (Trotter, 2006), usually as a result of
the instructor’s need to generate grades, or intrinsically being derived from within the
learner as they strive for self-satisfaction and personal reward. Keller (2008) identifies
five principles of learning motivation in order to overcome obstacles and assist towards
the accomplishment of their goals (Keller, 2008), being:
• When the learner’s curiosity is aroused due to a perceived gap in their knowledge.
• When the knowledge to be learned is of value to them.
• When the learner believes they can succeed in mastering the learned task.
• When the learner anticipates and experiences satisfying outcomes.
• When the learner employs “Volitional” (self regulatory) strategies to protect their
intentions.
Morris et al. further consider intrinsic motivation with the inclusion of self-monitoring
and control to be the more beneficial in eliciting deeper learning (Morris et al., 2004).
They further encourage the development of models for motivation in learning that
promote capturing the student’s attention by offering them relevance, supporting the
development of a student’s self-confidence and the promotion of a sense of
achievement and satisfaction through interactive feedback. Morris et al. (2004) argue
that strategies for learning must include components that develop and enhance meta-
cognitive skills (Morris et al., 2004).
3.1.2 Learning Styles
Coffield, Moseley, Hall and Ecclestone (2004) and Abdulwahed, Nagy and Blanchard
(2008) consider Kolb’s experiential learning theory (Kolb, 1984), in which he devised
the Learning Style Inventory, as one of the most influential models of learning styles
(Abdulwahed, Nagy, & Blanchard, 2008; Coffield, Moseley, Hall, & Ecclestone, 2004).
It should be noted that Kolb does not himself consider a student’s preference to a
particular learning style to be a fixed trait but a differential preference for learning,
which can vary slightly from situation to situation (Kolb, 1999). Kolb’s experiential
learning theory as shown in Figure 3-1, is based on a four stage learning cycle that must
be present for learning to occur and consists of:
• Concrete experience (feeling): Learning from specific experiences and relating
to people.
• Reflective observation (watching): Observing and viewing the environment
from different perspectives.
• Abstract conceptualisation (thinking): Logical analysis of ideas and an
intellectual understanding of a situation.
• Active experimentation (doing): Implementing events through action including
risk-taking.
Figure 3-1: Kolb's (1984) Learning Style Model.
Kolb then proposed that learning occurs across two intersecting continua of:
• Processing Continuum: Our approach to a task, such as preferring to learn by
doing or watching.
• Perception Continuum: Our emotional response, such as preferring to learn by
thinking or feeling.
Kolb (1984) identified students to have preferences towards pairs of the phases of
learning that lie at either end of the two continua, classifying learners as being either:
• Divergers: Who prefer the concrete experience and reflection on that experience;
• Assimilators: Who prefer reflection on and conceptualisation of the experience;
• Convergers: Who conceptualise the experience and then experiment actively with
the idea;
• Accommodators: Who prefer concrete experience and the opportunity to
experiment with ideas formed by the experience.
Nieweg (2000) and Kolb (1984) consider the opportunity to work through the four
phases to offer the greatest learning experience, where tasks designed to expose
students to all phases encourage stimulation to deeper learning (Kolb, 1984; Nieweg,
2000).
Assessment strategies are required to support the learning process, providing the
opportunity for a student to self-evaluate during the learning to confirm their level of
knowledge by their performance in the various activities, as demonstrated by Keller
(2008) in his fifth motivational strategy of volitional (self regulatory) strategies to
protect their intentions (Keller, 2008). These experiences (identified in the learning
theories) are essential to facilitate learning, providing both extrinsic and intrinsic
motivation, thereby leading to deeper learning. Recognition of students’ propensity to
learning styles permits assessment strategy development to create assessment
experiences, or combinations of experiences, that support the four phases of learning,
with consideration given to Kolb’s (1984) learning style preferences as outlined
(Nieweg, 2000). In Section 3.11 we will revisit these learning styles and the importance
of feedback and their influence on the learning through Hede’s (2002) Integrated Model
of Multimedia Effects on Learning (Hede, 2002).
3.2 The Value of Feedback in the Learning Process
Hattie and Timperley (2007) clearly identify feedback as the single most influential
contributor to the student’s progress (Hattie & Timperley, 2007). Their thorough
investigations into the influences on student improvement rely on the application of
meta-analysis to the data. It is through this research that they recognise the dominant
influence on a student’s improvement to be in the domain of the teacher. They then
progressively drill down to the services that are supplied by a teacher in performing
their professional duties, such as Instructional Quality, Direct Instruction and Peer
Tutoring, revealing that feedback is at the top of the list. Hounsell, McCune, Hounsell
and Litjens (2008) cite the findings of the 2007 Scottish National Student Survey that
indicates that there is less satisfaction with feedback than any other aspect of teaching
and that to sustain the quality of student learning there is a need to rethink how best to
provide feedback within the changing landscape of education (Hounsell, McCune,
Hounsell, & Litjens, 2008).
In Chapter 1, the need for feedback to be as timely as possible is highlighted, with the
ever-increasing demand on the instructor’s time. Unintentional delay in feedback is to
the disadvantage of both the student and the instructor. Hede (2002) includes feedback
as a significant contributor to the learning process in his model (Hede, 2002), to be
considered in detail in Section 3.11.
3.3 Formative and Summative Assessment as Part of the Learning
Path
The position of assessment in education is well recognised, playing a critical role in the
educational process. The assessment options available to instructors are many and
varied and will be discussed in detail later in this chapter; however, the general need to
develop, test and implement assessment systems that can play an important role in the
learning process of the student must be initially recognised, as it plays a critical role in
the path that this research follows.
In Chapter 1, the criteria for effective formative and summative assessment were
discussed, indicating what we look for in assessment strategies and what is considered
to be good practice. The importance of timely, relevant feedback must be emphasised
for formative assessment tasks influencing the student in their learning direction and
assisting the instructor in delivery. Summative assessment is reliant on the same
formative assessment attributes, as it contains aspects of formative assessment while
being required to produce reliable, valid results reflecting the student’s achievement.
MCQs for many years have played a role in formative and summative assessment
strategies, offering both good and bad experiences, depending on their suitability and
construction. This can be attributed to their ease of use, effortless conversion to the new
encompassing technology and the perceived ability to assess large numbers of students
on broad areas of knowledge efficiently. Whether they comply with the criteria for
good assessment is dependent on the construction of the questions, the suitability to the
area and depth of knowledge being assessed and the purpose of the assessment task,
being either formative or summative.
Educators are aware of the value of self-esteem and the important role that it plays in
the success or failure of a student in the learning process, and it should always be
considered when formulating assessment strategies (Torrance, 2007b).
Black and Wiliam (1998, 2006, 2009) use the term “Assessment” as referring to the
group of activities undertaken by both teachers and students in evaluation of knowledge
learnt by providing both grades and feedback for the purpose of modifying teaching and
learning paths (Black & Wiliam, 1998, 2006, 2009). Furthermore, they consider that
assessment becomes formative assessment when the evidence is used to adapt the
teaching in order to meet the needs of the students. On the other hand, it is generally
accepted that summative assessment is an assessment strategy that has the primary
objective of supplying a grade for the student.
Black and Wiliam (1998, 2006) refer to the learning environment as being a Black Box,
with input from students, teachers, parents and resources, with the consequential
outputs being students with advanced educational standing (Black & Wiliam, 1998,
2006). It is within this Black Box that they consider the role of Formative Assessment
critical in the transformation of the students’ educational standing, contributing to the
raising of the national standards. Black and Wiliam’s (2006) conceptualisation of
formative assessment (Black & Wiliam, 2006) has the following five strategies:
1. Engineering effective classroom discussion, questions, and learning
tasks that elicit evidence of learning;
2. Providing feedback that moves learners forward;
3. Clarifying and sharing learning intentions and criteria for success;
4. Activating students as owners of their own learning;
5. Activating students as instructional resources for one another.
Carless (2007) cites Black and Wiliam’s (1998) classification of poor formative
assessment where the feedback is misunderstood or not acted upon, claiming it to be
formative in purpose but not in function (Carless, 2007).
Black and Wiliam (1998) state that students too often are content to get by. Perrenoud
(1998) suggests that a solution would be for the instructors to revisit the teaching
contracts in order to counteract the student’s acquired habits with the inclusion of
formative assessment as a key component of the learning process (Ayala et al., 2008;
Perrenoud, 1998). Morris et al. (2004) state that instructors continually ask how
they will know when the students understand the concepts (Morris et al., 2004).
Not all educators are supportive of the common assessment strategies extensively used.
Krumboltz and Christine (1999) and Torrance and Coultas (2009) voice concerns about
the emphasis placed on summative assessment and the consequential negative
influence that it has on the learning path of a student (Krumboltz & Christine, 1999;
Torrance & Coultas, 2009). Krumboltz and Christine (1999) feel that assessment with
the primary objective of grading can actually misdirect and inhibit student learning and
that the grading process encourages the teacher to focus on the negative, laying any
fault and failure at the feet of the student. They further consider competitive grading to
de-emphasise learning in favour of judging, displacing learning to a secondary goal of
education. Taras (2009) suggests the solution requires a shift in the paradigm, basing
the definitions of formative and summative assessment on processes of assessment
and not on the functions of assessment. Taras justifies this by stating that the functions
remain as a basic epistemological premise of assessment (Taras, 2009).
3.4 Assessment as a Means of Shifting the Responsibility of Learning
to the Student
Many instructors are becoming increasingly aware of the benefits of shifting the
responsibilities of learning into the student’s hands (Chatti, Jarke, & Frosch-Wilke,
2007; Krätzig & Arbuthnott, 2009). As the Guidelines for the Teaching of Educational
Psychology in Teacher Education Programs (American Psychological Association
Work Group of the Board of Educational Affairs, 1997) suggest, educators addressing
the issues of school dropouts, low-levels of academic achievement, and other indicators
of school failure are recommending more learner centered models of schooling. They
also recognise the important concept of Meta-cognitive learning, which concentrates on
thinking about their own thinking (Krätzig & Arbuthnott, 2009). This includes the
critical components of self-awareness, self-inquiry, self-monitoring and self-regulation,
promoting higher levels of commitment, persistence and involvement in the learning
process. The guidelines further recommend that the curriculum is designed to include
components that practice the meta-cognitive strategies of reflective self-awareness and
goal setting. In addition they also state that assessment tasks should foster self-
appraisal and self-regulated learning.
The amended instructional paths that result from formative assessment are critical
for the students’ success in navigating the learning path. More and more instructors
consider instruction and formative assessment not to be just strongly linked but
inseparable components in the learning experience (Farrell & Leung, 2002a; Shavelson
et al., 2008).
Doebbert (1999) emphasises the need for the student to develop skills in managing and
controlling his/her learning with the utilisation of technology assisting in the process as
they negotiate their educational path. It is important that systems that appear to provide
a multitude of benefits to the students and instructors should be pursued with vigor
(Farrell & Leung, 2004b), giving the opportunity to place the control of learning in the
hands of the student (Karpicke, Butler, & Roediger III, 2009). In particular this can be
supported and achieved by the use of the formative and summative methods of
assessment.
Davidoff (1995) identifies a concern with the need for assessment to be designed to
recognise incomplete or partial knowledge and also permit the student to hedge. This is
due to medical knowledge being incomplete, sometimes ambiguous and conflicting.
Components of e-learning are used by many educational institutions, contributing
significantly to the learning experience. The embracing of the Internet by the many in
all aspects of our educational, social, and working life has ensured that this new
paradigm will remain a component of our lives. Our general reliance on this technology
has resulted in the embedding of many services that it offers into our routine activities.
There are very few modern daily events that are not reliant in some way on the
technology that encompasses this phenomenon. Education is one such activity that has
greatly benefited from the Internet, offering diversity and flexibility. The educational
process is no longer confined to the classroom but is now available to the participant on
a global level at all times of the day and night. The extensive use of multimedia
applications has enhanced the learning experience by creating a myriad of flexible
learning material suitable for various preferred individual learning styles, in particular
increasing the accessibility of learning materials to those who would have limited
access if reliant on the traditional format.
3.5 Assessment Using New Technology
In view of the importance of the contribution of e-learning components to education
(Howlett et al., 2009) and the fundamental role played by assessment in the learning
process, it is essential that we explore technologies and approaches that will improve
assessment effectiveness and its impact on the learning process. The nature of e-
learning components often removes the personal interactivity held within the classroom
situation. The challenge for educators is to use e-learning components to enhance the
online learning experience with techniques that provide rich, personalised, timely
feedback to the individual student.
Marshall University (1999) provided a thorough comparative study of online course
delivery software, identifying the active components of each package and the depth of
their application (Marshall University, 1999). Not surprisingly, online assessment using
multiple-choice questions format appears in all of the recommended packages such as
Blackboard, WebCT, TopClass, Web Course in a Box, Toolbook. They further state
that only five of the identified ten packages that offer formative assessment also
provide an effective feedback mechanism that directs the student to tutorial paths as a
consequence of formative assessment.
3.6 Concerns with Computer Assisted Assessment
Some educators fear that the technology is artificially driving the usage of Computer
Assisted Assessment (CAA). It is considered that instructors are lured towards an
online assessment option, with the promises of faster turnaround, greater feedback and
automated student assessment recording. Hartley, Strudler and Schraw (2008)
expressed concerns around the integrity and quality of computer aided assessment tools
that require extra security restrictions for logins (Hartley, Strudler, & Schraw, 2008).
Popham (2008) warns about the commercial test developers who are prepared to attach
themselves to developers who have enthusiasm for formative assessment (Popham,
2008), when in fact their assessments often do not offer any financial benefit. There is
no doubt that the benefits from the use of a good CAA package are significant and
should be enthusiastically pursued. In some cases, the use of a CAA is not only valid,
but also preferred, especially when test material complies with the requirements of the
CAA system of choice. The concern that should be raised is when the content does not
comply with the requirements of the CAA. In many cases CAA packages are being
used as a summative assessment test where the material to be assessed does not fit into
the functionality of the package, as Farrell and Leung (2004b) demonstrated when
investigating the use of the Blackboard MCQ testing facility for questions comparing
lengthy SQL scripts (Farrell & Leung, 2004b), resulting in the students voicing
dissatisfaction and complaining about the testing procedure. Greenfield (2009)
recommends education to have a balanced media diet using each technology's specific
strengths in order to develop a complete profile of the student’s cognitive skills
(Greenfield, 2009).
3.7 Assessment Options Available
Educational institutions use a variety of assessment options to grade their students and
assess the effectiveness and validity of subject content. A critical component of a sound
educational program is to assess the learning outcomes throughout the duration of the
course, as both a means of giving timely feedback and as a mechanism to grade the
students, given that each kind of assessment has its purpose (Kennedy, Chan, Fok, &
Yu, 2008).
An issue faced by educators is: “What methods of assessment should they be using and
what would be the appropriate mix to maximise the feedback and evaluation process?”
Schuwirth and Van Der Vleuten (2006) consider a well-designed assessment program
to utilize different types of questions that are appropriate for the content being assessed
(Schuwirth & Van Der Vleuten, 2006). Torrance (2008) argues assessment should
move from assessment of learning to assessment for learning. Tomanek, Talanquer and
Novodvorsky (2008) identify two determining factors when choosing assessment
strategies, the first being the characteristics of the task, in that the testing is aligned to
relate to the qualities of the task regardless of the learning environment, and the second
the characteristics of students or the curriculum, relating to the learning environment in
which an assessment task would be implemented, such as students' abilities to complete
the task (Tomanek, Talanquer, & Novodvorsky, 2008).
The options presently available to the instructors include Multiple-choice Questions
(MCQ) and the suite of Constructed Responses (CR) usually comprising short
answer questions, longer problem solving questions, case study reports, presentations
and other equally effective and proven methods. In the majority of cases the final grade
is calculated by combining each separate grade from assessment tasks completed during
the subject. The utilisation of multiple assessment methods recognises the need to
permit students to demonstrate their knowledge in various methods throughout their
learning experience.
3.8 Multiple-choice Questions
Multiple-choice questions (MCQs) are frequently used in traditional education forums
for both formative and summative assessment (Tarrant et al., 2009). Swartz (2006)
considers the extensive acceptance and use of MCQs as an assessment tool can be
attributed to their ability to assess broad fields of learning in a compact system while
being quick to assess with inherent objectivity and provide good feedback to the
students at minimal cost (Swartz, 2006). Additionally, their popularity is increased by
the ability to reuse them over periods of time, as recently constructed questions often
contribute to question banks available to the instructors.
MCQs are highly regarded by instructors (Bacon, 2003) and consequently used
extensively, with global experience in their construction (Libarkin, 2008; Schuwirth &
Van Der Vleuten, 2006), and easy adaptation to the computer assessment environment
and increased ease of application. There are two roles that MCQs play in the balanced
educational program. Firstly, MCQs are used extensively as a means of formative
assessment (self-assessment), where the feedback influences the direction of the
students as they journey along their learning path. MCQs are a popular self-assessment
option being readily available to the students due to the advancement of technology that
now supports its functions. Web-based MCQ self-assessment packages permit the
student to self-assess their knowledge at any time convenient to them, provide instant
feedback and in many cases recommend change in directions to their learning path.
Secondly, MCQs are also extensively used for summative assessment for the grading of
students, being strategically placed in the exams with various mark allocations directly
contributing to the student’s final grade. Their popularity can be attributed to their
ability to offer equivalent reliability and validity in a shorter amount of time as they
have an economy of scale that does not exist in constructed-response (Bacon, 2003). In
addition they are considered to have the ability to test many topic areas in a relatively
shorter time (Ventouras, Triantis, Tsiakas, & Stergiopoulos, 2010; Wilson & Case,
1993).
Bacon (2003) also identifies one advantage of using MCQs, the “Objective” marking
(Swartz, 2006), as a method of avoiding the deficiency of reliability of essay tests, as he
cites previous work of Ashburn (1938), where subjective marking of short essay
answers yielded significant difference in grades when remarked (Ashburn, 1938).
Schuwirth and Van Der Vleuten (2006) and later Govaerts, Schuwirth and Muijtjens
(2007) voice growing dissatisfaction with the MCQ format as it relies on
recognition of the correct answers (Govaerts, C., Schuwirth, & Muijtjens, 2007;
Schuwirth & Van Der Vleuten, 2006), while some see MCQs as only demonstrating
knowledge of isolated facts (Wilson & Case, 1993). Wilson and Case (1993) also state
that they fear this forces undue emphasis on recall and will stimulate students to learn
and rehearse in a like mode (Swartz, 2006). Schuwirth and Van Der Vleuten (2003,
2006) recommend variation in the question format due to the likelihood that students
will prepare depending on the types of questions used, as their medical students often
try to identify what the assessment is so they can prepare strategically, instead of
studying to become better doctors. Bacon (2003) discusses at length the concerns that
the MCQ format is too simple and does not assess the complex levels of knowledge, in
particular the higher levels of Bloom’s (1956) taxonomy of educational objectives
(Knowledge, Comprehension, Application, Analysis, Synthesis, Evaluation) (Starr,
Manaris, & Stalvey, 2008). Bacon (2003) does recognise the examples of MCQs in
Bloom’s (1956) work that demonstrate the application of MCQ testing designed to
assess outcomes at every level, supported by the work of Palmer and Devitt (2007)
where they acknowledge that in the majority of cases MCQs only assess recall;
however, they purport that well-constructed MCQs can assess the higher order
cognitive skills (Palmer & Devitt, 2007). They stipulate that higher order MCQ
questions can only be successfully implemented if the questions are peer reviewed,
encouraging the critique of others to contribute to their construction. Bacon (2003) and
Palmer and Devitt (2007) also recognised that this level of MCQ is difficult to
construct. Schuwirth and Van Der Vleuten (2003) in their research argue that the
question format is of limited importance compared to the construction of the question,
as the success of the assessment strategy is primarily reliant on question construction to
be correct.
Ng and Chan (2009) express the shortcomings of conventional MCQ tests, where the
conventional multiple-choice test method does not capture or consider partial
knowledge, supported by Swartz (2006), who considers conventional MCQ testing to
offer inferior discrimination in the levels of knowledge.
3.9 The Suitability of MCQ Tests to the New Technology
The adoption of the MCQ format to the hypermedia environment was swift
due to the appeal of the ability to produce fully integrated, automated tests that instantly
supply feedback to the participant, together with possible suggested directions for
further study. Furthermore, the increased ability to monitor the student’s progress
through this technology contributes to the student management structure. The result is
that the use of MCQs in general has grown in the e-learning domain, particularly as a
formative assessment tool.
There are many e-learning add-on packages being used by educators today permitting
the construction of MCQs with various formats. On completion of a typical MCQ test
the resulting score with the correct answer identified is usually given as the feedback.
In some cases the incorrect answers are also identified with brief explanations, thus
enabling direction for the student to the appropriate subject for further study. This
method is effective but is dependent on the student answering the questions honestly,
without guessing.
3.10 Previous Work on Innovative Approaches to MCQ Assessment
The previous discussion has identified the value of assessment and the role that it has to
play in education. It also discussed the attributes of good assessment practice,
acknowledging the criteria required to be met for the creation of a good assessment
item. There is a need for assessment to be developed and refined to fill the gap as
technology drives conventional assessment strategies forward. In some cases, MCQs
have shifted away from their original purpose of broad assessment for large numbers of
students to the more complex assessment of higher levels of knowledge. This increased
level of responsibility can often show the flaws of MCQ testing design, especially in
the construction of the test questions. There is anecdotal evidence that the better
students tend to resent MCQ testing as they consider the process fails to distinguish
between the higher achieving students and those who are less competent.
This section of the chapter reports on the previous work of others in the field that had a
pivotal influence on this research, its direction and the functionality of
the developed facilitating tool. The fundamental concepts underpinning the foundations
of the assessment with confidence measurement tool are presented, for instance the
concept of negative penalty scores for incorrect answers and scoring techniques that
recognise partial knowledge.
3.10.1 The Need for Innovative Scoring for Assessment
The previous discussion has identified the shortcomings of traditional MCQ assessment,
which encourages guessing, fails to recognise partial knowledge and often miscalibrates
confidence. Chapter 1 identified the criteria that constitute good assessment practice; a
contributing attribute of good assessment is the adoption of an appropriate scoring
technique. The scoring of MCQs has long been a point of discussion with serious
consideration given to various scoring models. Consequently, scoring models designed
to eliminate guessing have been developed and investigated, using complex scoring
algorithms that aim to produce a more precise reflection of the student’s understanding
of the underlying concepts.
Here we will discuss in general some of the influential previous work. It should be
noted that the following examples vary in structure, as some of them are based on
MCQs with four options and others are designed for MCQs with three options. In one
case the assessment is based on singular true or false questions. The scoring technique
discussed is in context to the design of the application and should be considered as
such. Educators often debate the optimal number of answers, mainly divided between
four and three (Ng & Chan, 2009). In many cases the supporting mathematics can
easily be extended to cater for any variation, depending on the instructor’s preference
for their cohort of students. There will be a need to revisit some of them in depth in
Chapter 4 when considering the scoring mechanism adopted for this research.
3.10.2 MCQs Designed to Eliminate Guessing
MCQs have traditionally required the student to identify the correct response from a list
of possible answers with the resulting score based on the criteria of being correct or
incorrect. Ng and Chan (2009) identify two categories of correctly answered questions,
the first being the number of questions where the student actually knows the answer
and the second being the number of questions where they have correctly guessed. Ng
and Chan (2009) in their work comparing different MCQ scoring techniques using
signal detection theory (De Carlo, 2005) identified three classifications of scoring
variations relevant to this study: Liberal Multiple-choice (Bradbard, Parker, & Stone,
2004; Jennings & Bush, 2006), permitting the student to choose more than one correct
answer, Elimination Testing, permitting the student to select answers which they
consider wrong and Confidence marking, permitting students to assign a level of
confidence or allocate an order of preference (Alnabhan, 2002; Swartz, 2006) to their
choice.
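The distinction that Ng and Chan (2009) draw between known and correctly guessed answers can be illustrated with a short worked expectation. This is only a sketch under assumptions that are not taken from their work: a test of n questions with m options each, and a student who knows k answers and guesses uniformly at random on the remainder. The symbols n, m, k and S (the raw number of correct answers) are introduced here purely for illustration.

```latex
\begin{align*}
E[S] &= k + \frac{n-k}{m} \\
     &= 12 + \frac{20-12}{4} = 14 \qquad (n = 20,\ m = 4,\ k = 12)
\end{align*}
```

Under these assumptions a conventional score of 14 out of 20 (70%) would overstate actual knowledge of 12 out of 20 (60%); the difference is the contribution from guessing that the scoring variations listed above attempt to remove.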
Pollard (1985, 1986, 1993), Hobson and Ghoshal (1996), Bush (2001), Jennings and
Bush (2006) and Frandsen and Schwartzbach (2006) all produced alternative MCQ
scoring techniques designed to minimise random guessing based on a reward and
penalty structured scoring system (Frandsen & Schwartzbach, 2006; Hobson &
Ghoshal, 1996; Pollard, 1985, 1986, 1993; Pollard & Clark, 1989).
In particular Pollard (1985, 1986, 1993) designed and implemented a number of scoring
mechanisms to address guessing. He achieved this by allocating a positive score for
each correct answer and a negative score for each incorrect answer. Pollard and Clark
(1989) provided various options for the penalising of students who incorrectly
identified an incorrect option as correct and a correct option as incorrect. His approach
to assessment produced a series of penalties that were proportionally less than the
rewards for correct responses. Consequently, it only depleted, not negated, the overall
score if a correct option was identified. Educational institutions, such as the Australian
Mathematical Association National Mathematics Quiz, have used Pollard’s (1985, 1986,
1993) scoring to eliminate guessing; however, to be effective, a full understanding of
the calculations involved in the grading schedule is required, which is often beyond the
comprehension of the average student. Pollard’s (1985, 1986, 1993) scoring mechanism
47
does not use confidence as a knowledge indicator, however it does address the area of
encouraging guessing. While this research acknowledges the merit in this system, it
also has concerns with the level of complexity in the scoring calculations. The student
must fully understand the consequences of his/her action when doing a test, and
therefore there is a need for them to comprehend the method of scoring so they realize
the consequential outcomes. Pollard’s system relies on the combination of boxes
ticked, with many variations that have to be considered, resulting in a grade calculated
by a complex algorithm. A more rigorous explanation of Pollard’s (1985, 1986, 1993)
scoring system is provided in Chapter 4.
3.10.3 Innovative MCQ Assessment with Confidence Measurement
As previously discussed, there is a need to develop assessment strategies to address
both the interference from guessing that Ng & Chan (2009) and De Carlo (2005)
referred to as “Noise” and the inability of the conventional MCQ format to recognise
partial knowledge. Swartz (2006) states that the introduction of confidence
measurement reduces the effect from guessing and provides additional diagnostic
feedback beneficial to the learning process. Ng & Chan (2009) based their work on the
findings of Alnabhan (2002) and Swartz (2006) that confidence measurement (partial
ordering) produced the highest validity measurement and offered advantages in
measurement accuracy.
The following discussion introduces some of the activities of the pioneers of
assessment with confidence measurement who developed interactive systems designed
to eliminate the gains acquired from guessing, encourage more critical, honest self-
assessment and promote declaration of little or no knowledge. Further in-depth
discussion will occur in Chapter 4.
Brown and Shufford (1973) in the discipline of health education produced an MCQ
calibrated scoring system encouraging honesty, designed to permit the student to
register their level of confidence in choosing an answer (Brown & Shufford, 1973).
The scoring system severely penalised the participant if they registered high confidence
in an incorrect answer and equally rewarded high confidence in a correct choice. The
primary objective of this scoring system is to identify students who are either
over-confident or under-confident. In doing this they consider two classes of student to
benefit: first, the student who gains a better understanding of their level of
knowledge and, second, the student who can develop an appreciation of numerical
probability and use it to express levels of uncertainty. Furthermore, Feltz (2007)
considers perception of one’s ability, or self-confidence, to be the central mediating
construct in striving for achievement.
Paul (1994) and later Klinger (1997) used a computer based interactive system of
scoring for an MCQ format with three options and only a single correct answer, where
the student nominated a position on lines joining the apexes in a triangular shape
(Klinger, 1997; Paul, 1994). The apexes represented the three optional answers,
A, B or C. The lines joining the apexes were divided proportionally into
segments, encouraging the student to nominate their level of confidence. The grade was
then scored according to a logarithmic scale.
Of particular interest is Paul’s (1994) Web-based interactive response system called the
Computer Based Alternative Assessment (CBAA). The CBAA requires the student to
choose an option from three possible answers, A, B or C, registering a level of
confidence in their answer. With this system the student places the cursor within a grid
area aimed to reflect the confidence of their choice. The student must negotiate the area
with a mouse and click on the region that they feel portrays their confidence. The
three options, A, B and C, are located at the vertices of the triangle and the closer
the student positions the cursor to a vertex the more confident they are that it is the
correct answer. A corresponding score is then calculated by considering the position of
the registered level of confidence. The system is presented to the student in the format
shown in Figure 3-2. This research has some initial concerns about Paul’s (1994)
CBAA to be discussed in Section 4.2.1.
Diagram A: CBAA Triangle showing answer options at each apex.
Diagram B: CBAA Triangle showing strength of belief P(A) that A is correct associated with each region.
Figure 3-2 Confidence Measuring Template, Paul (1994).
The more recent, extensive work of Gardner-Medwin and Gahan (2003) and
Gardner-Medwin (2006) has revealed interesting outcomes that have significantly influenced
the direction of this research. At this time a brief explanation of Gardner-Medwin and
Gahan’s (2003) approach and subsequent scoring method will be given, with a full
description supplied in Section 4.2.1.
Gardner-Medwin and Gahan’s (2003) assessment strategy uses a scoring technique that
provides a series of grades that both reward the student for correct answers and
penalise them for incorrect answers. Their scoring technique has three options, permitting
the participant to register 3 distinct levels of confidence, high, moderate or low. The
scoring is applied to true and false question formats but has relevance here given that
traditional MCQs have clusters of 3 or 4 answers, where each rating is true or false for
that stem. Gardner-Medwin and Gahan’s (2003) scoring system has some
distinguishing features, in that it rewards the student who selects the correct answer
in proportion to the confidence registered, that is, a grade of 3 for high confidence
registered as C=3, a grade of 2 for moderate confidence registered as C=2 and a grade
of 1 for low confidence registered as C=1. Importantly, in contrast, it severely penalises
the student who registers a high level of confidence (C=3) for an incorrect answer with
a score of -6, moderately penalises the student who registers moderate confidence
(C=2) for an incorrect answer with a score of -2 and does not penalise the student who
registers low confidence (C=1) for an incorrect answer by giving them 0. In summary,
Gardner-Medwin and Gahan’s (2003) scoring reward for a correct choice stays proportional
to the confidence level for all of the options (3, 2, 1) while the penalty score does not
(0, -2, -6). Gardner-Medwin and Gahan’s (2003) system is forgiving to a student who admits
that they have very little confidence in their answer (C=1) by not penalising them at all
(0 score). Their arguments for this are complex and lengthy but can be summarised in their
own words: the scoring is properly motivating, and lucky guesses are not the same as
knowledge.
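The "properly motivating" property of the marks quoted above (3, 2, 1 for correct answers and -6, -2, 0 for incorrect answers) can be checked with a short calculation. The following Python sketch is illustrative only: the names MARKS, expected_mark and best_confidence_level are hypothetical and are not part of Gardner-Medwin and Gahan's software, and the sketch assumes that the student holds a single subjective probability p that their chosen answer is correct.

```python
# Marks quoted in the text: confidence level -> (mark if correct, mark if incorrect).
MARKS = {1: (1, 0), 2: (2, -2), 3: (3, -6)}


def expected_mark(level, p):
    """Expected mark at a confidence level when the answer is correct with probability p."""
    reward, penalty = MARKS[level]
    return p * reward + (1 - p) * penalty


def best_confidence_level(p):
    """Confidence level with the highest expected mark (ties go to the lower level)."""
    return max(sorted(MARKS), key=lambda level: expected_mark(level, p))


if __name__ == "__main__":
    for p in (0.5, 0.6, 0.7, 0.75, 0.85, 0.95):
        marks = {c: round(expected_mark(c, p), 2) for c in MARKS}
        print(f"p={p:.2f}  expected marks={marks}  best level: C={best_confidence_level(p)}")
```

Under these assumptions the expected marks are p for C=1, 4p - 2 for C=2 and 9p - 6 for C=3, so C=1 gives the highest expected mark when p is below 2/3, C=2 when p lies between 2/3 and 0.8, and C=3 when p is above 0.8. A student therefore maximises the expected mark only by registering the confidence level that matches their actual belief.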
Davies’ (2005) MCQ scoring for four optional answers is in direct contrast to the scoring
regime of Gardner-Medwin (2006). He states that students who demonstrate a
high level of confidence in a correct choice should receive a greater reward, and
recommends penalising a student who demonstrates a high level of confidence in an
incorrect answer disproportionately less than a student who demonstrates a high level
of confidence in a correct answer is rewarded (Davies, 2005). Davies also forgives the
individual who declares low confidence in an incorrect answer by not penalising them.
3.11 Interactivity in Learning
The previous discussion identified the learning theories and the need for reflection and
self-assessment to support the learning process. It also acknowledges the extrinsic and
intrinsic motivational factors that drive the learner through the learning experience. In
particular these extrinsic and intrinsic motivational aspects contribute to the learner’s
propensity towards a specific learning style to assimilate knowledge. There is a need
for this research to formalize this relationship into a learning model where the
components of e-learning activities are integrated into the taxonomy of learning with
the primary objective of acquiring lifelong knowledge. For this reason this research
will couple together the Learning Theories and Kolb’s (1984) Learning Styles with the
contributions of multimedia to learning, using Hede’s (2002) Integrated Model of
Multimedia Effects on Learning.
Online assessment tools with confidence measurement are in general not fully
functional multimedia systems, incorporating such facilities as sound and animation,
but do rely on a level of interactivity, which associates them with the multimedia
educational platform. A major role of online assessment with confidence measurement
is to promote and support self-directed learning (Keller, 2008). Consideration should
therefore be given here to the role of interactive systems in learning. Multimedia
components are commonly used in education, greatly supported by the ease by which
they are accepted in today’s society. They can play a significant role in the learning
environment, contributing to numerous presentation and support material well suited to
the technological arena. Clark and Feldon (2005) identify five common principles in
support of interactive instructional material, two of which are of significance to this
research: the first is the ability of multimedia components to accommodate various
learning style preferences and the second is the encouragement of student-managed
constructivist and discovery approaches to learning (R. Clark & Feldon, 2005).
Choi, Lee and Jung (2008), in considering multimedia and
learning styles, claim that sensing, sequential, and reflective learners tended to have a
more meaningful learning experience with the multimedia learning component
compared with intuitive, global, and active learners. With
this far-reaching impact, Hede (2002) questions multimedia’s effect on
learning, which has often been a point of contention. Hede (2002) discusses previous
disputes regarding the claim that multimedia elements in education have a significant
contribution, postulating that although the cost of delivery can be reduced and the speed
and availability increased, inclusion of multimedia in the educational process does not
necessarily improve the experience.
In further discussion addressing the concerns of the effects that components of
multimedia have on learning, Hede (2002) attempts to address these inconsistent
findings, through the formulation of his Integrated Model of Multimedia Effects on
Learning. This model is of particular interest to this research as any proposed
educational interactive solution by its nature is dependent on the e-learning paradigm
for delivery.
For the purpose of clarification at this time, Hede’s (2002) model will be introduced,
briefly articulating its relevance to this research. Hede’s (2002) model consists of 12
elements and the relationships that bind them, which together demonstrate multimedia’s
contribution to the learning process. In particular, online assessment with confidence
measurement aligns itself strongly with 3 of these elements; Working Memory,
Reflection and Long Term Memory.
Hede’s (2002) Integrated Model of Multimedia Effects on Learning, shown in Figure 3-3,
demonstrates the 12 elements and the relationships that they have with each other as
the student travels the learning path.
Figure 3-3 Hede’s (2002) Integrated Model of Multimedia Effects on Learning
As demonstrated in Figure 3-3, Hede’s (2002) Integrated Model of Multimedia Effects
on Learning groups the 12 contributing elements into four distinct categories:
• Input: (three elements: visual input, auditory input, learner control)
• Cognitive Processing: (two elements: attention, working memory)
• Learner Dynamics: (three elements: motivation, cognitive engagement, learner
style)
• Knowledge and Learning: (four elements: intelligence, reflection, long term
storage, learning)
The arrows in the model indicate either a causal or an associative relationship between
the conceptual elements.
The Integrated Model of Multimedia Effects on Learning (Hede, 2002) holds the
Learning element as the only fully dependent variable and the Learning Style as the
only fully independent variable with intelligence also a possible candidate, depending
on the construction. Learner control is considered to be an intervening variable, which
is determined by learning style and by the moderating variable cognitive engagement,
which is itself moderated by motivation that is in turn influenced by learner control.
3.12 Contribution of Assessment with Confidence Measurement to
Hede’s (2002) Model.
Online assessment tools offer interactive components that are in the control of the
learner as part of the learning process. This research considers assessment with
confidence measurement to play a supportive role in the gaining of knowledge. The
following discussion addresses the components where assessment with confidence
would be an active contributor in the learning, as demonstrated within Hede’s (2002)
model.
Hede’s (2002) model identifies visual input as an element of the Input classification,
along with auditory input and learner control. This visual input is a significant
component in the design of an interactive assessment tool, as it both determines the
presentation and interaction of the participant with the system as well as facilitating the
graphical displays. Many of the interactive tools developed for education offer learner
control over the input, permitting the user to navigate through the environment as they
deem necessary to achieve the best possible result. Some feel however that learner
control in multimedia applications is less efficient than program control (McNeil &
Nelson, 1990). It is generally accepted that the amount of learner control needs to be
designed in proportion to the capacities of the learners and the time
constraints of the learning experience (Gerjets, Scheiter, Opfermann, Hesse, &
Eysink, 2009).
An assessment with confidence measurement tool plays an important role in the
cognitive dynamics (attention and working memory) of the learning process, reflected
in Hede’s (2002) model, as it provides a means of concentrating the attention of the
learner by focusing on particular input and responses. In addition, it supports the
working memory by increasing the retention of information by providing rehearsal
(Hede, 2002) whilst establishing referential connection from visual representation.
The role that assessment with confidence has in the learner dynamics of Hede’s (2002)
model pertains to motivation (extrinsic and intrinsic), cognitive engagement and
learner style as a key variable in learning (Taylor, Sumner, & Law, 1997). The
design features of assessment with confidence tools often have intrinsic motivational
factors, such as visually pleasing graphics, responsive sliding bars and a clear results
section, that are considered to provide some initial incentive to engage with the system
(Hede, 2002). It is the intrinsic motivational factors from the challenging and
interesting content, such as the ability to show graphics and diagrams that will produce
the sustained effort (Najjar, 1996). This invariably leads to deeper cognitive
engagement (Komarraju, Karau, & Schmeck, 2009), often resulting in the learners
taking full control of their learning. This intrinsic motivation underpins concepts of
game theory (Adams & Rollings, 2007) relevant to this study and will be discussed in
more detail in Chapter 6.
According to Hede’s (2002) model, and generally accepted by educators, successful
learning is dependent on converting Working Memory to Long Term Memory (Seufert,
Schütze, & Brünken, 2009), ultimately progressing towards the final goal of Learning.
Assessment with confidence measurement can play a significant role in this process by
facilitating rehearsal and revision of content, contributing to this essential conversion.
Taylor, Sumner and Law (1997) consider that the process of reflection often results in self-
directed learning where the learners think critically about their current knowledge and
their learning strategies. This also addresses Davidoff’s (1995) concerns of the
miscalibration of confidence, which can often occur with the use of traditional MCQ
formats that permit and encourage guessing. He considers miscalibrated confidence in
medical education equally as concerning as lack of knowledge. Further, miscalibration
of confidence can often lead to the transfer of incorrect facts from the participant’s
working memory to long-term memory. This situation is highly undesirable, having an
extremely negative effect on the student’s learning, where wrong knowledge is
reinforced as being correct.
The establishment of cognitive links, so that new content can be connected to and built
on existing knowledge, is a fundamental requirement for the advancement of learning.
Kalyuga, Chandler and Sweller’s (1998) research has demonstrated that the effectiveness
of multimedia strategies varies depending on the learner’s knowledge and experience.
Figure 3-4: Relation of Assessment with Confidence Measurement to Hede’s
(2002) Multimedia Model.
The objective of assessment with confidence is to support the student’s reflection,
assisting in the passing of knowledge from the working memory to the long-term
memory, whilst establishing a strong foundation for additional knowledge to be built
upon, facilitating the cognitive linking process.
The role of the assessment with confidence measurement (ACM) when incorporated in
the learning strategy is demonstrated in Figure 3-4, where the path of the individual’s
activities when using ACM is superimposed onto all of the relevant contributing
elements of Hede’s (2002) Integrated Model of Multimedia Effects on Learning. It is
noticeable that some of the elements from Hede’s (2002) original diagram have been
deliberately excluded to clarify the particular role of ACM. This is justified as some of
the elements, such as Audio input, are not part of the operation of the version of ACM
relevant to this research, while others, Intelligence and Learning Style, are of utmost
importance and can be assumed to be major contributors to the learning process
under consideration.
3.13 Assessment with Confidence Measurement as the Proposed
Solution
The discussion above has highlighted the concern that present traditional MCQ
assessment strategies are required to meet the criteria of good assessment practices, by
supplying valuable, timely and comprehensible feedback while offering grades with
validity and reliability, contributing to the learning process, building confidence and
encouraging deeper learning. In contrast to this, MCQ tests usually permit and
encourage guessing, fail to allow for demonstration of partial knowledge and
discourage the honest declaration of little or no knowledge. The evidence of previous
research has provided a good foundation for the development of assessment strategies
to the benefit of both instructors and students. The research of Pollard (1989),
further pursued by Hobson and Ghoshal (1996), Bush (2001), Jennings and Bush
(2006) and Frandsen and Schwartzbach (2006), demonstrates the advantages of
developing a scoring system designed to eliminate the gain from guessing and honestly
reflect the student’s understanding of the subject. Paul (1994), Brown and Shufford
(1973) and Klinger (1997) have all demonstrated variations of systems designed to
capture a numerical representation of the student’s level of knowledge by use of
confidence. Gardner-Medwin and Gahan (2003) and Gardner-Medwin (2006) have
implemented, and continue to use, an online assessment strategy incorporating
confidence measurement designed to deter guessing by penalising severely any student
who demonstrates high confidence in an incorrect answer. Davies (2005) developed his
assessment strategy based on the assumption that high confidence in a correct answer
deserves greater reward and applies a completely opposite grading system to that of
Gardner-Medwin (2006).
This research, in addressing the earlier mentioned concerns, proposes the use of an
interactive confidence measurement assessment strategy based on the traditional MCQ
presentation format of a stem (question) followed by four optional answers, to be used
for formative and summative purposes. This version of the MCQ permits the instructor
to provide more than one correct answer in the four answers given for consideration by
the student. It further adopts a balanced reward and penalisation scoring system
for implementation, decided upon as a result of extensive consideration of the scoring
options to be discussed in detail in Chapter 4.
3.14 Summary
The rapid development of the Internet as a means of educational delivery and
assessment encourages the designing, testing and evaluation of innovative assessment
tools. This chapter first considers the various learning theories and their reliance on
intrinsic and extrinsic motivational factors. It then presents Kolb’s (1984) four-stage
learning style model, demonstrating the importance of assessment strategies that work
through those four phases to encourage stimulation for deeper learning. This chapter
then identifies the importance of feedback in the learning process supporting the
shifting of the responsibility of learning to the student via self-assessment and self-
regulatory learning strategies. The advent of technology has significantly influenced the
way assessments are used, often pushing their application beyond their original intention,
such as the extended deployment of MCQs. This situation has created a new set of
problems as the reliance on them to return an indication of a student’s level of
knowledge has increased. The chapter then reflects on the previous work of others who
have devised scoring methods to address the shortcomings of traditional MCQ
assessment strategies in an attempt to improve the calibration of a student’s level of
knowledge and as a mechanism of reflecting their understanding of content as precisely
as possible. In particular, it considers the work of those researchers who have used
confidence measurement to reflect the true level of knowledge of the student, keeping them
honest with themselves. It postulates that assessment with confidence measurement has a
significant role to play in contributing to the galvanisation of the comprehension of the
content, supporting the acquisition of knowledge. The chapter then closes with a
discussion of Hede’s (2002) model of multimedia effects on learning, which gathers
together the contribution of assessment with confidence to the learning experience by
producing intrinsic motivational factors that provide rehearsal and self-reflection,
invariably leading to deeper cognitive engagement and supporting the passing of
knowledge from the working to the long-term memory.
Chapter 4 will discuss the various scoring options used by other researchers and
developers highlighting their strengths and their weaknesses, demonstrating the
mathematics that support their implementation. It also contains a comparative
mathematical analysis in support of the scoring method adopted by this research.
CHAPTER 4 SCORING OPTIONS FOR ASSESSMENT WITH CONFIDENCE
Chapter 3 identified the concern that many assessment strategies do not meet the
criteria for good assessment practices. This can be partially attributed to the advent of
new technology pushing traditional testing practices beyond the purpose of their initial
design. This is the case with MCQs that were designed to assess large groups of
students on broad areas of knowledge and are now frequently used to assess deeper
levels of knowledge, therefore requiring greater effort to construct and having increased
complexity, challenging the most experienced MCQ question writers. Additionally, the
conventional MCQ testing methods encourage guessing and fail to reward partial
knowledge, providing a linear solution to a multidimensional problem. Assessment with
confidence measurement is designed to increase the feedback to the participant in an
attempt to reflect as best as possible their knowledge on any tested area.
This chapter’s focus is solely on the selection of a scoring system for implementation
into MCQs that will optimise the student’s and instructor’s feedback, usage and
interest. As part of this discussion we consider in detail the work of others in their
attempts to address the aforementioned problems, discuss at length the positives and
negatives of each study and finally mathematically compare the scoring options
available. The contribution of any adopted scoring mechanism to the effectiveness of
the self-assessment exercise is of the utmost importance as Sim, Read and Holifield
(2008) have identified the student as having the most to lose. Hence, the responsibility
of using an appropriate method is critical. In particular this chapter revisits and
thoroughly discusses the extensive work of Pollard (1985; 1986; 1993) and Pollard and
Clark (1989), Gardner-Medwin (2006), Gardner-Medwin and Gahan (2003), Paul
(1994), Klinger (1997) and Davies (2005), as their contribution to the field has been
both influential and of great value.
4.1 Taxonomy of Scoring
This research argues that there are attributes of a good assessment strategy that need to
be met in order to produce valuable feedback, to help the student learn by adjusting the
learning path, and ensure validity and reliability to enable grading of the students fairly
and consistently. Consequently there is a need to prescribe scoring methods that suit the
exercise and can vary depending on the use. The objective of any self-assessment
exercise is to place the student in the most optimal position to evaluate their
performance and consequently modify their direction of study in order to address the
shortcomings of their knowledge of the topic being considered. To improve the
instructor’s evaluation of a student it is necessary to ensure that the scoring method
adopted is appropriate, requiring careful consideration, as an incorrect scoring method
choice could lead to unsatisfactory results.
At present there are quite a few MCQ assessment-scoring techniques used. In order to
set the scene some will be briefly introduced and summarised here.
The Conventional MCQ scoring awards 1 mark for a correct answer with 0 marks for
an incorrect answer. This is the most common scoring technique used. There is a slight
variation of this where a negative mark can be assigned for an incorrect answer to
counteract guessing.
The Liberal scoring system permits the student to identify more than one correct
answer. The scoring method as devised by Hobson and Ghoshal (1996) is applied as
follows:
If one of the answers identified is correct the student receives a score depending on the
number of answers they nominated. As an example, if the student chooses 2 out of 5
options that include the correct answer they get a proportion of the full marks they
would get if they had chosen the single correct answer. Bush (2001) allocated 1 for a
correct answer and incorporated a penalty for choosing an incorrect option of -1/(n-1)
marks (n is the number of options). Frandsen and Schwartzbach (2006) used a
logarithmic function based on the number of options and registered guesses to award a
positive score for a correct choice and a negative proportion of this for an incorrect
answer.
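The effect of a penalty of -1/(n-1) per incorrect option can be illustrated with exact arithmetic. The Python sketch below is a minimal illustration under a stated assumption: it reads the scheme as awarding 1 mark if the correct option is among those selected and deducting 1/(n-1) for every incorrect option selected. The function names liberal_score and expected_blind_guess_score are hypothetical and are not taken from Bush (2001).

```python
from fractions import Fraction


def liberal_score(selected, correct, n_options):
    """Score one question: +1 if the correct option is selected and
    -1/(n-1) for every incorrect option selected (assumed reading)."""
    penalty = Fraction(1, n_options - 1)
    score = Fraction(0)
    for option in selected:
        score += 1 if option == correct else -penalty
    return score


def expected_blind_guess_score(k, n_options):
    """Average score when a student blindly selects k of n options.

    Selecting the first k options and averaging over every possible
    position of the correct answer is equivalent to a random selection.
    """
    selected = range(k)
    total = sum(liberal_score(selected, correct, n_options)
                for correct in range(n_options))
    return total / n_options


if __name__ == "__main__":
    # Two options selected out of five, one of which is correct: 1 - 1/4.
    print(liberal_score([0, 1], correct=0, n_options=5))
    # Blind guessing averages to zero however many options are selected.
    print([expected_blind_guess_score(k, 5) for k in range(6)])
```

Under this reading, a student who selects options without any knowledge gains nothing on average, whatever number of options they select, while a student who can rule out some incorrect options still earns a positive proportion of the full mark.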
Elimination testing has the candidate marking as many incorrect answers as possible,
receiving 1 mark for each incorrect option identified and marks deducted for including
the correct answer in the choice of incorrect answers (Bradbard et al., 2004). Pollard
(1986) has a variation on this approach in which he also introduces positive marks for
correctly identifying incorrect answers and negative marks for identifying a correct
answer as incorrect.
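A minimal Python sketch of elimination scoring as described above is given here for illustration. The size of the deduction for eliminating the correct answer is not stated in the text, so it is left as the caller-supplied parameter penalty; the function name elimination_score and the value used in the example are hypothetical.

```python
def elimination_score(eliminated, correct, penalty):
    """+1 for each incorrect option eliminated; subtract `penalty`
    if the correct option is among those eliminated."""
    eliminated = set(eliminated)
    score = sum(1 for option in eliminated if option != correct)
    if correct in eliminated:
        score -= penalty
    return score


if __name__ == "__main__":
    # Four options A-D with B correct; the penalty of 3 is an assumed value.
    print(elimination_score({"A", "C", "D"}, "B", penalty=3))  # full knowledge
    print(elimination_score({"A", "C"}, "B", penalty=3))       # partial knowledge
    print(elimination_score({"A", "B"}, "B", penalty=3))       # correct answer eliminated
```

The second call shows how the format can credit partial knowledge: a student who can rule out only two of the three incorrect options still receives two of the three available marks.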
Finally, Confidence marking is defined to be where the candidate declares their level of
confidence in their answer. The calculation of the mark varies and will be discussed at
length in this chapter, but all variations include the registered level of confidence in the
calculation.
4.2 Previous Scoring Methods to Address the Issue of Guessing
As previously presented, the adoption of methods of scoring that use penalties is not
confined to recent times. There have been many documented examples in the past
where educators have introduced and evaluated various techniques designed to produce
more discerning results in an attempt to address the issue of guessing or hedging when
sitting a test. None is more cited than the extensive work of Pollard (1985, 1986, 1993)
and Pollard and Clark (1989), who developed, trialled and implemented a
mathematically sound marking system for traditional multiple-choice question (MCQ)
tests that was specifically designed to minimise the effect of guessing on any grade.
Pollard (1986) argued that guessing was an accepted practice, either in a sensible
manner or totally randomly. He further considered that the evidence obtained in his
previous work (Pollard, 1985) shows that guessing is not a minor component in
examinations, but plays a major role. Others, such as Paul (1994) and Klinger (1997),
developed their systems to address the same issue, as they consider guessing to produce
significant noise when grading students, and to mask the large variety of states of
knowledge. As Pollard (1986, p. 50) states: “As guessing has nothing to do with
knowledge, it makes sense to design a paper that will minimise the effects of guessing.”
Gardner-Medwin (2006) also considers lucky guesses not to be the same as
knowledge, and that confident, wrong answers require special attention, claiming
guessing can have extremely detrimental effects on the student’s learning.
The belief underpinning Gardner-Medwin and Gahan’s (2003) work is that to measure
knowledge one must measure a person’s degree of belief, simply demonstrated when they
consider the words often used to represent different states. They maintain that educators
describe the degree of belief that a student has about a true statement as one of
the following: Knowledge, Uncertainty, Ignorance, Misconception or Delusion.
Gardner-Medwin and Gahan (2003) assign probabilities for the truth to the above
student states, saying that they range from 1 to 0, where p=1 is Knowledge, p=.5 is
acknowledged Ignorance and p=0 is Delusion, with Uncertainty placed between .5 and 1
and Misconception between 0 and .5. Delusion is of extreme concern as it is a total
belief in something that is false. In light of this, ignorance is not the worst state to be in.
Misconceptions (p=.33), that is, having a level of confidence in an incorrect answer, can be
an obstacle in learning, especially when attempting to build high levels of learning.
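The mapping from degree of belief to the five named states can be written as a small lookup. The Python sketch below simply encodes the boundaries stated above; the function name belief_state is hypothetical and the exact comparisons at 0, .5 and 1 are an assumption about how the boundary values are treated.

```python
def belief_state(p):
    """Name the state for a degree of belief p (0 to 1) in a true statement."""
    if not 0 <= p <= 1:
        raise ValueError("p must lie between 0 and 1")
    if p == 1:
        return "Knowledge"
    if p > 0.5:
        return "Uncertainty"
    if p == 0.5:
        return "Ignorance"
    if p > 0:
        return "Misconception"
    return "Delusion"


if __name__ == "__main__":
    for p in (1, 0.8, 0.5, 0.33, 0):
        print(p, belief_state(p))
```

On this scale acknowledged ignorance sits in the middle rather than at the bottom, which is why a declaration of low confidence is treated more leniently than a confident wrong answer.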
Paul’s (1994) extensive work supports the notion that we need to implement grading
systems that proportionally reward the participant, especially when it comes to
acknowledging levels of belief in their answers. Paul’s (1994) Computer
Based Alternative Assessment (CBAA) tool was specifically designed to address the
concern of a student’s tendency to guess, while realising greater benefit from the use of
innovative assessment alternatives in the resource limited settings of typical educational
environments. Educational institutions’ enthusiastic adoption of the hyper-media
environment encourages the engagement of the students at visual, aural and kinesthetic
dimensions, promoting sensory activities employing both the left-brain and right-brain
faculties. Like others, Paul (1994) voices concern with the use of the traditional MCQ
scoring technique, questioning its appropriateness and asking whether a student’s knowledge
is black or white and how a student can best express belief in the likelihood of an
alternative correct answer. He formally classifies the contributions of self-assessment to
the student as establishing and revealing status (knowing what you know), diagnosis of
weakness (knowing what you do not know), comparative analysis to the larger
population (Where do I stand?), assimilation into internal cognitive framework (pulling
it together) and finally higher order cognitive abilities (synthesis). In addition, there is a
need to support meta-cognition, to empower the student in managing their learning and
demonstrate positive correlation between study and achievement. Paul (1994) continues
by identifying the desire to produce innovative assessment that drives the student, as
often the assessment tasks determine the instruction and depth of learning. If the blend
of assessment does not involve the higher order cognitive abilities, such as problem
solving, then the instructors too often do not address those areas in their instructional
material.
4.3 Scoring Using Penalties for Incorrect Answers to Reduce the
Impact of Guessing
In the past, educators have introduced various scoring techniques to penalise students
for incorrect answers. In this section some of the scoring methods that have influenced
this research will be presented and discussed, demonstrating the mathematical argument
in their support. Each of the scoring methods presented applies a different degree of
penalty for incorrect answers and, in most cases, is supported by substantial mathematical
justification. It should also be noted that the structures of the various
assessment strategies referred to here vary: some are based on single-answer style
questions, others on four- or three-option MCQ formats with one correct answer. The
assessment strategy proposed by this research permits the instructor to have one or
more correct answers to an individual question, which needs to be considered when
comparing the approaches to scoring. This is reflected in section 4.4 where the scoring
used for this research is discussed at length.
It is appropriate that the work of Pollard (1985,1986,1993) be considered first as it
offers a foundation on which others have been built. Pollard’s MCQ scoring technique
requires the participant to correctly identify the one correct answer by a tick. If this is
not initially apparent, that is, the correct answer not known on the first inspection, the
participant can place a cross for options that they consider to be wrong. The utilisation
of Elimination Scoring (Ng & Chan, 2009), as Pollard (1986) promotes, gives the
student the opportunity to score by eliminating the incorrect answers, which although
quite effective in encouraging the students to reveal their true state of knowledge, can
confuse them in the process. Consequently, a completed test would consist of a series
of questions with a tick, crosses or both next to the options.
To explain Pollard’s scoring approach we first must consider all the possible responses
to a question with 4 options. Table 4-1 demonstrates the possible ordered responses
(Resp 1 to 10) for a question from the candidate, albeit a simple example where the last
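option D (in green) is correct and the other options A, B, C (in red) are incorrect.

| Answer | Resp 1 | Resp 2 | Resp 3 | Resp 4 | Resp 5 | Resp 6 | Resp 7 | Resp 8 | Resp 9 | Resp 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| A | X | X | X |   |   | X | X | ✓ | X | X |
| B |   | X | X |   |   |   | X |   | ✓ | X |
| C |   |   | X |   |   |   |   |   |   | ✓ |
| D |   |   |   | ✓ | X | X | X |   |   |   |

Table 4-1: Possible Responses with Four Options, Pollard (1985).
KEY: ✓ = Student Selected As Correct; X = Student Selected As Incorrect; green = Answer is Correct; red = Answer is Incorrect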
The green crosses and ticks designate that the student has correctly identified an option
to be incorrect or correct respectively. In contrast the red crosses and ticks designate
where the student has incorrectly identified an option to be incorrect or correct
respectively.
In this case the student will place a tick for the option that they consider to be the
correct answer if they know it, as demonstrated in Response 4 of Table 4-1, receiving
maximum marks. If they are not too sure of the correct answer they can place a cross in
the options that they know are not correct, receiving some marks for identifying
incorrect options but not full marks as they have not yet identified the correct one, as
demonstrated in Responses 1 and 2 of Table 4-1. Pollard makes the assumption that by
identifying three answers as being incorrect then the remaining fourth answer must be
correct. Therefore, if a student places 3 crosses correctly identifying the incorrect
options it is assumed that they have correctly identified the correct option and get the
full marks as if they have placed a tick next to it, as can be seen by Response 3.
Educators often debate this point with some considering this to be an improper
deduction, as a student might not necessarily be sure that the fourth option is correct.
The remaining Responses 5-10 contain situations where the student has either
incorrectly identified an answer as being correct by placing a tick next to an incorrect
answer (Responses 8,9,10), or incorrectly identified an option as being incorrect by
placing a cross next to the correct answer D (Responses 5,6,7). With these possible
responses now identified the scoring can be considered.
Pollard relies on a complicated process of allocation of partial scores for a student’s
correct identification of both correct and incorrect answers, facilitated by assigning ki
values to the various responses, culminating in 9 k values, where k1 to k3 contribute to
the positive marks given and k4 to k9 contribute a negative effect to the score. The k
values combine together to give a score for the question depending on the combination
of correct and incorrect crosses and ticks. This approach produces a complex array of
scoring formulas. The calculated score based on the combination of tick and/or crosses
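given for any question is displayed in Table 4-2.

| Answer | Resp 1 | Resp 2 | Resp 3 | Resp 4 | Resp 5 | Resp 6 | Resp 7 | Resp 8 | Resp 9 | Resp 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| A | X | X | X |   |   | X | X | ✓ | X | X |
| B |   | X | X |   |   |   | X |   | ✓ | X |
| C |   |   | X |   |   |   |   |   |   | ✓ |
| D |   |   |   | ✓ | X | X | X |   |   |   |
| Score | k1 | k1+k2 | k1+k2+k3 | k1+k2+k3 | -k4 | k1-k5 | k1+k2-k6 | -k7 | k1-k8 | k1+k2-k9 |
| Grade | 1/6 | 1/2 | 1 | 1 | -1/2 | -1/2 | 0 | -1/3 | -1/4 | 0 |

Table 4-2: Scoring Formulas for Responses, Pollard (1985).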
A value is assigned to each constant k, which is used as the basis of the formula to
calculate the final grade. The k’s (all positive) displayed in the Score row of Table 4-2
indicate the formula for the score given for the response directly above.
You will notice that in the first response, Resp 1, the student does not know the correct
option but appears to know that the first option given is incorrect and confidently
identifies it as such with a cross (X1). In this case the score, k1, must reflect this
confident choice by rewarding them with a positive partial score. This is the same for
the responses for Resp 2 and Resp 3 where the candidate correctly identifies a further 1
or 2 more incorrect answers, being graded k1+k2 and k1+k2+k3 respectively. Placing a
tick in the correct option (Resp 4) is the same as identifying all of the incorrect options
and receives the same grade as Resp 3, k1+k2+k3. However, identifying the correct
option as being incorrect Resp 5, invokes a negative score of –k4, and the combinations
of identifying some of the incorrect options correctly and the correct one incorrectly
(Resp 6, Resp 7) have scores incorporating penalties, k1-k5 and k1+k2-k6 respectively,
where k5 and k6 have been introduced as a means of subtracting an amount from the
score for falsely identifying the correct answer as being incorrect while correctly
identifying as incorrect one or two options respectively. Similarly Resp 8, Resp 9 and
Resp 10, where a candidate correctly places 0, 1 or 2 crosses but incorrectly places a
tick, receive marks of -k7, k1-k8 and k1+k2-k9 respectively. Pollard’s scoring criteria
are based on the expected outcome (or Gain) of such an activity, and on combinations of
possible further results too numerous to demonstrate here. He postulates that the values
of the k’s must be chosen such that no score can be increased by guessing. In
doing so he places restrictions on the k values, which are substituted into expected
value equations for optimising. Some of these equations are given below as samples,
with their restrictions, but again there are too many to be fully
displayed here.
The following six equations give the expected score, E(S), for an individual who has no
knowledge and is guessing:
Randomly guessing one cross:
$$E(S) = \tfrac{3}{4}k_1 + \tfrac{1}{4}(-k_4)$$
Randomly guessing two ordered crosses:
$$E(S) = \tfrac{1}{2}(k_1+k_2) + \tfrac{1}{4}(-k_4) + \tfrac{1}{4}(k_1-k_5)$$
Randomly guessing three ordered crosses:
$$E(S) = \tfrac{1}{4}(k_1+k_2+k_3) + \tfrac{1}{4}(-k_4) + \tfrac{1}{4}(k_1-k_5) + \tfrac{1}{4}(k_1+k_2-k_6)$$
Randomly guessing a tick:
$$E(S) = \tfrac{1}{4}(k_1+k_2+k_3) + \tfrac{3}{4}(-k_7)$$
Randomly guessing a cross and a tick:
$$E(S) = \tfrac{1}{4}(k_1+k_2+k_3) + \tfrac{1}{4}(-k_4) + \tfrac{1}{2}(k_1-k_8)$$
Randomly guessing two ordered crosses and a tick:
$$E(S) = \tfrac{1}{4}(k_1+k_2+k_3) + \tfrac{1}{4}(-k_4) + \tfrac{1}{4}(k_1-k_5) + \tfrac{1}{4}(k_1+k_2-k_9)$$
In order that any candidate has no expected gain from randomly guessing, these
equations must all be less than or equal to 0, which requires the assigned values of k4 to
k9 (those that contribute a negative effect to the score) to be selected to ensure that this
occurs. In addition, the equations that represent a candidate who correctly assigns one
cross and guesses others, as well as the candidate who correctly assigns two crosses and
guesses others (equations not included here) have to be considered to eliminate the gain
from guessing.
The resulting constraints apply:
$$3k_1 - k_4 \le 0; \quad 2k_2 - k_5 \le 0; \quad k_3 - k_6 \le 0;$$
$$k_1 + k_2 + k_3 - 3k_7 \le 0; \quad k_2 + k_3 - 2k_8 \le 0; \quad k_3 - k_9 \le 0$$
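As a sketch of where these six constraints can come from, the derivation below reconstructs them from the response scores of Table 4-2. It is an illustration rather than Pollard's own presentation, and it assumes that a candidate with partial knowledge guesses uniformly among the options not yet marked and must gain nothing, in expectation, over the score already secured.

```latex
\begin{align*}
\text{no knowledge, guess one cross:}\quad
  & \tfrac34 k_1 - \tfrac14 k_4 \le 0
  && \Rightarrow\; 3k_1 - k_4 \le 0 \\
\text{one cross known, guess a second cross:}\quad
  & \tfrac23 (k_1 + k_2) + \tfrac13 (k_1 - k_5) \le k_1
  && \Rightarrow\; 2k_2 - k_5 \le 0 \\
\text{two crosses known, guess a third cross:}\quad
  & \tfrac12 (k_1 + k_2 + k_3) + \tfrac12 (k_1 + k_2 - k_6) \le k_1 + k_2
  && \Rightarrow\; k_3 - k_6 \le 0 \\
\text{no knowledge, guess a tick:}\quad
  & \tfrac14 (k_1 + k_2 + k_3) - \tfrac34 k_7 \le 0
  && \Rightarrow\; k_1 + k_2 + k_3 - 3k_7 \le 0 \\
\text{one cross known, guess a tick:}\quad
  & \tfrac13 (k_1 + k_2 + k_3) + \tfrac23 (k_1 - k_8) \le k_1
  && \Rightarrow\; k_2 + k_3 - 2k_8 \le 0 \\
\text{two crosses known, guess a tick:}\quad
  & \tfrac12 (k_1 + k_2 + k_3) + \tfrac12 (k_1 + k_2 - k_9) \le k_1 + k_2
  && \Rightarrow\; k_3 - k_9 \le 0
\end{align*}
```

Each line compares the expected score after the guess with the score the candidate would hold without it, so equality corresponds to a guess that neither helps nor harms on average.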
Pollard (1986) produces a number of solution sets satisfying these constraints,
identifying two preferred sets that provide the most effective results (see Table 4-2).
His final decision was based on the need to recognise partial knowledge and
maximization of the minimal score for incorrect responses.
Consequently he recommends choosing either of the scoring methods
outlined in Table 4-3.
| k’s | First set of possible k values satisfying equations | Second set of possible k values satisfying equations |
|---|---|---|
| k1 | 1/6 | 1/5 |
| k2 | 1/3 | 3/10 |
| k3 | 1/2 | 1/2 |
| k4 | 1/2 | 3/5 |
| k5 | 2/3 | 3/5 |
| k6 | 1/2 | 1/2 |
| k7 | 1/3 | 1/3 |
| k8 | 5/12 | 2/5 |
| k9 | 1/2 | 1/2 |

Table 4-3: Pollard’s Two Solutions for k Values.
Applying these values to the scoring formulas given in Table 4-2 the corresponding
scores are shown in Table 4-4 below.

| Response | Scores | Scores using first set of possible k values | Scores using second set of possible k values |
|---|---|---|---|
| 1 | k1 | 1/6 | 1/5 |
| 2 | k1 + k2 | 1/2 | 1/2 |
| 3 | k1 + k2 + k3 | 1 | 1 |
| 4 | k1 + k2 + k3 | 1 | 1 |
| 5 | -k4 | -1/2 | -3/5 |
| 6 | k1 - k5 | -1/2 | -2/5 |
| 7 | k1 + k2 - k6 | 0 | 0 |
| 8 | -k7 | -1/3 | -1/3 |
| 9 | k1 - k8 | -1/4 | -1/5 |
| 10 | k1 + k2 - k9 | 0 | 0 |

Table 4-4: Example of Pollard’s Scores for Both Sets of Values of k.
Although Pollard does not identify a preferred scoring mechanism from the two above,
in his later work, when he considers the more complex non-ordered simulation, the first
column values (k1=1/6; k2=1/3; k3=1/2 etc) are again contained in the final table of
preferred options, strengthening the argument for it as the nominated final choice.
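The scores in Table 4-4 can be recomputed with exact rational arithmetic. The following is a minimal sketch: the names `FIRST_SET`, `SECOND_SET` and `response_scores` are hypothetical, and k8 = 5/12 for the first set is an assumption, chosen because it reproduces the -1/4 score listed for Response 9.

```python
from fractions import Fraction as F

# k1..k9 for the two solution sets, keyed by index.
# Assumption: k8 of the first set is 5/12, which yields the -1/4 of Response 9.
FIRST_SET = dict(zip(range(1, 10), [
    F(1, 6), F(1, 3), F(1, 2), F(1, 2), F(2, 3),
    F(1, 2), F(1, 3), F(5, 12), F(1, 2),
]))
SECOND_SET = dict(zip(range(1, 10), [
    F(1, 5), F(3, 10), F(1, 2), F(3, 5), F(3, 5),
    F(1, 2), F(1, 3), F(2, 5), F(1, 2),
]))


def response_scores(k):
    """Scores for Responses 1-10, using the formulas in the Score row of Table 4-2."""
    return [
        k[1],                 # Resp 1: one correct cross
        k[1] + k[2],          # Resp 2: two correct crosses
        k[1] + k[2] + k[3],   # Resp 3: three correct crosses
        k[1] + k[2] + k[3],   # Resp 4: correct tick
        -k[4],                # Resp 5: cross on the correct option
        k[1] - k[5],          # Resp 6: one correct cross, then cross on the correct option
        k[1] + k[2] - k[6],   # Resp 7: two correct crosses, then cross on the correct option
        -k[7],                # Resp 8: tick on an incorrect option
        k[1] - k[8],          # Resp 9: one correct cross, then wrong tick
        k[1] + k[2] - k[9],   # Resp 10: two correct crosses, then wrong tick
    ]


if __name__ == "__main__":
    for name, k in (("first", FIRST_SET), ("second", SECOND_SET)):
        print(name, [str(s) for s in response_scores(k)])
```

Using `Fraction` rather than floating point keeps every score exact, so the results can be compared directly with the tabulated fractions.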
Applying the first set of k values to the expected score, E(S), equations above, which
model a candidate guessing with no knowledge, we
obtain the results in Table 4-5.
| Expected Score E(S) for a student with no knowledge randomly guessing… | Calculated E(S) |
|---|---|
| one cross | E(S) = 3/4*(1/6) + 1/4*(-1/2) = 0 |
| two ordered crosses | E(S) = 1/2*(1/6+1/3) + 1/4*(-1/2) + 1/4*(-1/2) = 0 |
| three ordered crosses | E(S) = 1/4*(1) + 1/4*(-1/2) + 1/4*(-1/2) + 1/4*(0) = 0 |
| one tick | E(S) = 1/4*(1) + 3/4*(-1/3) = 0 |
| one cross and one tick | E(S) = 1/4*(1) + 1/4*(-1/2) + 1/2*(-1/4) = 0 |
| two ordered crosses and one tick | E(S) = 1/4*(1) + 1/4*(-1/2) + 1/4*(-1/2) + 1/4*(0) = 0 |

Table 4-5: Expected Scores for Random Guessing, Pollard (1985).
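The six expected-score equations can also be evaluated mechanically for both solution sets. The sketch below is illustrative only: the function and variable names are hypothetical, and k8 = 5/12 for the first set is an assumption, chosen because it gives the -1/4 used for k1-k8 in the calculations.

```python
from fractions import Fraction as F

# Assumption: k8 of the first set is 5/12 (it gives k1 - k8 = -1/4).
FIRST_SET = dict(zip(range(1, 10), [
    F(1, 6), F(1, 3), F(1, 2), F(1, 2), F(2, 3),
    F(1, 2), F(1, 3), F(5, 12), F(1, 2),
]))
SECOND_SET = dict(zip(range(1, 10), [
    F(1, 5), F(3, 10), F(1, 2), F(3, 5), F(3, 5),
    F(1, 2), F(1, 3), F(2, 5), F(1, 2),
]))


def guessing_expectations(k):
    """Expected score E(S) for each blind-guessing strategy on a four-option question."""
    full = k[1] + k[2] + k[3]  # score for a fully correct response
    q = F(1, 4)
    return {
        "one cross": F(3, 4) * k[1] + q * (-k[4]),
        "two ordered crosses": F(1, 2) * (k[1] + k[2]) + q * (-k[4]) + q * (k[1] - k[5]),
        "three ordered crosses": q * full + q * (-k[4]) + q * (k[1] - k[5])
                                 + q * (k[1] + k[2] - k[6]),
        "one tick": q * full + F(3, 4) * (-k[7]),
        "one cross and one tick": q * full + q * (-k[4]) + F(1, 2) * (k[1] - k[8]),
        "two ordered crosses and one tick": q * full + q * (-k[4]) + q * (k[1] - k[5])
                                            + q * (k[1] + k[2] - k[9]),
    }


if __name__ == "__main__":
    for name, k in (("first", FIRST_SET), ("second", SECOND_SET)):
        for strategy, value in guessing_expectations(k).items():
            print(f"{name} set, {strategy}: E(S) = {value}")
```

Under these assumptions every strategy evaluates to exactly zero for both sets, which is the boundary case of the "less than or equal to 0" requirement.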
These results confirm that the scoring regime eliminates any positive gain from
guessing. Pollard’s work is of the utmost interest to this research as it has paved the
way for other scoring systems to be developed. In particular, it established a
mathematically valid technique of using penalties in an attempt to deter students from
guessing, as they know that the consequences of their actions will have a negative
effect on their grade. At the same time it also recognises partial knowledge, as the
correct identification of an incorrect answer gives partial marks. While it offers an
effective alternative to traditional scoring, it is very difficult to apply and confusing
for the instructor and, most importantly, even more so for the student, possibly
distracting their attention from the question at hand. Pollard’s scoring is used
extensively for the Australian Mathematics Competition where it has served the purpose
well, contributing greatly to that institution. However, extending its application to
smaller, customised educational bodies without the extensive infrastructure required to
process it is difficult to implement and support.
The work of Pollard (1985,1986,1993) and Pollard and Clark (1989), though
mathematically valid, relies on complex calculations to achieve the final result,
effectively removing control from the hands of the users. The primary objective of this
research is to provide a self-assessment tool to be used by the student, offering them
direct control over the consequences of their actions. The complexity of Pollard’s
(1985,1986,1993) marking system and its required understanding could disadvantage
many of the participants. The utilisation of “Elimination Scoring”, as Pollard
(1985,1986,1993) promotes, gives the student the opportunity to score by eliminating
the incorrect answers, which although quite effective in encouraging the students to
reveal their true state of knowledge, can confuse them in the process.
The design of the scoring system for this research was influenced by Pollard
(1985,1986,1993) and Pollard and Clark (1989), whose work underpins the arguments for
deterring guessing, incorporating probability in the decision-making and recognising
partial knowledge.
Paul’s (1994) Computer Based Alternative Assessment (CBAA) is designed to address
the issues previously discussed, and his arguments and supportive reasoning behind the
instigation of the CBAA’s innovative approach to assessment significantly underpin this
research.
The major goals of the CBAA are to improve the value of assessment by providing more
useful experiences, achieve a more valid indication of the student’s knowledge, and
produce comparable measures in assessing students’ ability to apply their knowledge to
solving problems. Paul (1994) suggests that the CBAA package offers discrimination
between finer grained states of knowledge, greater disclosure of the student’s ability to
apply their knowledge and increased awareness of the student’s own knowledge state.
Importantly, Paul (1994) believes that using the traditional 0-1 scoring system loses the
ability to discriminate between states of knowledge. Like Gardner-Medwin and Gahan
(2003), Paul (1994) uses common expressions to support the student through the
experience, such as, “I strongly believe B to be correct”, “I believe C to be correct but I
can’t distinguish between A and B” or “From what I know each alternative seems
equally likely to be correct.” These expressions are equated to equivalent probabilities.
In his diagnosis of previously adopted scoring systems Paul (1994) justifiably identifies
concerns with the traditional “Number Right” grading system, which assigns the same
mark to those who have knowledge and those who have guessed, and likewise
groups together those with complete misinformation, those with some
misinformation and those who guess incorrectly. The “Correction for Guessing”
formula adopted by some educators is an alternative, which can be applied at the final
stages of the calculation of the score. However, correction for guessing does not take
into consideration the effects of partial knowledge and does very little to encourage
students to report their true levels of knowledge.
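The correction itself is not written out above. A minimal sketch, assuming the classical form in which the number of wrong answers divided by one less than the number of options is subtracted from the number right, is given below; the function names are hypothetical and omitted questions are assumed to score zero.

```python
from fractions import Fraction


def corrected_score(right, wrong, options):
    """Classical correction for guessing: R - W / (options - 1). Omitted items score 0."""
    if options < 2:
        raise ValueError("a question needs at least two options")
    return Fraction(right) - Fraction(wrong, options - 1)


def expected_blind_guess(options):
    """Expected corrected mark for one question answered by a uniform random guess."""
    p_right = Fraction(1, options)
    return p_right * 1 - (1 - p_right) * Fraction(1, options - 1)


if __name__ == "__main__":
    # 30 right and 9 wrong on four-option questions.
    print(corrected_score(30, 9, 4))
    # A blind guess gains nothing on average, whatever the number of options.
    print(expected_blind_guess(4), expected_blind_guess(3))
```

The adjustment is applied to the totals after the test, which is why it cannot distinguish between the different knowledge states that produced a given wrong answer.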
The value of assessment strategies that offer wider ranges of responses is that they
permit students to demonstrate their true level of knowledge. To be effective they
should also encourage students to participate in the activity by rewarding them
appropriately depending on the perception of the probability distribution. Brown and
Shufford (1973) demonstrate that people who are aware that they will be rewarded
according to admissible schemes will divulge probabilities they believe in and not
attempt to shade them one way or the other to exploit the scoring system, as to do so
would require the student to place bets that are considered unrewarding. It is the
admissible or proper scoring system as outlined above that truly encourages honesty.
After much consideration and deliberation Paul chose to develop the CBAA around a
“Confidence Reporting” framework, hence the relevance to this research.
Paul’s (1994) CBAA, as briefly introduced in Chapter 3, is an interactive
computer-based system which presents the candidate with a triangle offering three
alternative answers positioned at the apexes, similar to Klinger’s (1997) later
triangular interactive system mentioned in Chapter 3. This is demonstrated in
Diagram A in Figure 3-2. The triangle has 16 zones over which the student can place the
cursor to indicate their confidence towards any particular preference, be it A, B, C
or any combination.
Field studies combined with the work of Brown and Shufford (1973) identified 16
zones as optimal when considering an infinite-precision probability space: this number
provides sufficient discrimination among the student’s knowledge states, avoids
minutia-obsession behaviour, minimises reliance on excessive motor skill manipulation
and, most importantly, “exhibits intuitive correspondence between the visual regions and
their interpretations” (Paul, p 18, 1995). When moving the cursor, the proximity to the
apex represents the belief in the answer assigned to that vertex. The corresponding
probabilities for each zone when the correct answer is A are demonstrated in Figure 3-2
Diagram B. The top apex where the correct option is placed has the probability of 1
while the other apexes are assigned zero. Various other probabilities are assigned
depending on the proximity to the correct apex; notably, placing the cursor in the
middle, which declares that the student is not sure of the correct option, is assigned
.33, as expected. A student placing the cursor in any of the 16 zones creates the
three-element vector < PA, PB, PC >, where PA is the probability registered relative to
A, PB is the probability relative to B and PC is the probability relative to C.
Consequently, positioning the cursor in the middle of the triangle gives the vector
<.33, .33, .33>. The scores generated are based on a logarithmic function that produces
a set of non-linear results. As an example, a student who positions the cursor close to
the correct answer A at the top vertex, say at .8 in Figure 4-1, receives the score of
92/100, which reflects their strong commitment to a correct answer. The same
positioning towards an incorrect answer, with .2 or 0 registered for the correct answer,
gives a score of 46/100 or 0/100 respectively, depending on whether
the cursor is positioned favouring the correct answer or the other incorrect answer. The
student is required to complete the process for each question in the test, generating
scores as they go. The final grade is presented to them at the end of the test with
feedback on which questions they got wrong and the correct answers for those
questions.
Paul (1994) discusses and justifies his choice of scoring at length. He relies heavily on
the notion that a scoring system should be “admissible” or “proper” as defined by
probability theory, stating that students who exhibit high belief in the likelihood of a
correct answer should be rewarded higher than those who “shade” their reporting with
lower levels of confidence. Paul’s (1994) adopted scoring system is based on the work
of Brown and Shufford (1973) in which they developed a scoring system to quantify
uncertainty into numerical probabilities for representation of intelligence.
As an example, a student (1), who indicates 40 per cent for an incorrect answer, is not
completely wrong but neither are they right. They are better than another student (2)
who indicated 80 per cent for the same incorrect answer, and should accordingly be
graded higher; however, they are not as good as the third student (3) that registered 10
per cent for the same incorrect answer, who should receive a higher grade than student
(1).
Paul’s (1994) scoring is based on assigning credit according to a scheme similar to
wagering, where there are various wagers with various odds. It assumes that the more
knowledgeable students will gain greater credit over the extended period of time than
the less knowledgeable. If there are f(u)du wagers available at the correct odds of
(1-u)/u a student who believed the likelihood of an item being correct to be p would
accept all wagers at odds better than (1-p)/p that the item is correct and accept all
wagers on the item not being correct at odds better than those appropriate for
probability 1-p. This gives rise to the following equation for payoff for a student
choosing among n alternatives where pi is the probability of the ith alternative.
$$\text{Payoff if the } i\text{th event occurs} = \int_{0}^{p_i} f(u)\,\frac{1-u}{u}\,du \;-\; \sum_{j \ne i}^{n} \int_{0}^{p_j} f(u)\,du$$
$$= \int_{0}^{p_i} \frac{f(u)}{u}\,du \;-\; \sum_{j=1}^{n} \int_{0}^{p_j} f(u)\,du$$
This formula needs to be adjusted to eliminate the possibility to “game” by assuming
equal probability for all by requiring the student to take odds on wagers placed at
probabilities greater than 1/n and to offer the odds on wagers placed at probabilities less
than 1/n yielding:
$$\text{Payoff if the } i\text{th event occurs} = \int_{1/n}^{p_i} \frac{f(u)}{u}\,du \;-\; \sum_{j=1}^{n} \int_{1/n}^{p_j} f(u)\,du$$
Paul (1994), like Gardner-Medwin and Gahan (2003), Gardner-Medwin (2006) and
Brown and Shufford (1973), leverages greatly off Shannon’s (1948) Information Theory,
preferring to assign the function f(u) = 1 based on logarithmic scoring, yielding
through integration the logarithmic scoring system:
$$\text{Expected Profit} = \log n - \left(-\sum_{i=1}^{n} p_i \log p_i\right)$$
(Payoff on the ith event, with n alternatives, where pi is the probability of the ith
alternative)
This equation corresponds to the maximum likelihood method of statistical estimation.
The logarithmic scoring system has the student’s reward, on average, equaling the
amount of knowledge that they possess of the material in the question, as it depends
solely on the probability assigned to the alternative that is actually correct (Paul 1994).
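The "proper" behaviour of the logarithmic rule can be illustrated numerically. With f(u) = 1 the adjusted payoff reduces to log(n p_i), since the summed term vanishes when the probabilities add to 1. The sketch below assumes natural logarithms and uses hypothetical function names and an arbitrary example belief; it compares honest reporting with two shaded reports.

```python
import math


def log_payoff(reported, correct_index):
    """Payoff log(n * p_i) when alternative i turns out to be correct."""
    n = len(reported)
    # A reported probability of 0 on the correct alternative has no finite payoff;
    # math.log raises ValueError in that case.
    return math.log(n * reported[correct_index])


def expected_payoff(belief, reported):
    """Average payoff for a student whose true beliefs are `belief`."""
    return sum(b * log_payoff(reported, i) for i, b in enumerate(belief) if b > 0)


def entropy(p):
    """Shannon entropy -sum(p_i log p_i), in the same log base as the payoff."""
    return -sum(x * math.log(x) for x in p if x > 0)


belief = [0.6, 0.3, 0.1]                                    # illustrative true beliefs
honest = expected_payoff(belief, belief)                    # report what is believed
overstated = expected_payoff(belief, [0.8, 0.15, 0.05])     # shade towards A
noncommittal = expected_payoff(belief, [1 / 3, 1 / 3, 1 / 3])  # sit in the middle

if __name__ == "__main__":
    print(f"honest       {honest:.4f}")
    print(f"overstated   {overstated:.4f}")
    print(f"noncommittal {noncommittal:.4f}")
    print(f"log n - H    {math.log(3) - entropy(belief):.4f}")
```

For this example the honest report has the highest expected payoff and equals log n minus the entropy of the beliefs, in line with the Expected Profit expression.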
This final interpretation of the formula for scoring is adapted by Paul (1994) to produce
the following three equations for scoring the CBAA, where px is the probability
ascribed to alternative x, n is a normalisation constant and k is a range constant.
$$\text{Score if A is correct} = n + k \log_2 (3 p_A)$$
$$\text{Score if B is correct} = n + k \log_2 (3 p_B)$$
$$\text{Score if C is correct} = n + k \log_2 (3 p_C)$$
Diagram A: CBAA Triangle with the scoring for each select area where A is correct, n=62 and k=23.7
Diagram B: CBAA Triangle with the scoring for each select area where A is correct, n=0 and k=63
Figure 4-1: Paul’s (1994) CBAA Triangle with the Corresponding Score for Each Region
Diagram A in Figure 4-1 is Paul’s (1994) preferred scoring system, where n=62 and
k=23.7, so that a student who is fully confident in the correct answer receives 100
while one demonstrating full confidence in an incorrect choice receives 0. Intermediate
registrations of confidence are assigned rewards between these extremes
accordingly.
The second scoring option, demonstrated in Diagram B in Figure 4-1, where n=0 and
k=63, is an alternative version based on the same formulas that imposes negative
scores for registration of high confidence in an incorrect answer; however, this is not
used because instructors fear the repercussions that could come from disgruntled
students.
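The scoring equation n + k log2(3p) can be evaluated directly for both parameter choices. The sketch below is illustrative: the function name `cbaa_score` is hypothetical, and because the raw formula has no finite value at p = 0, the finite scores shown in the figure for full confidence in a wrong answer imply a lower bound on the probability that is not specified here, so the sketch rejects p = 0 rather than guess one.

```python
import math


def cbaa_score(p_correct, n, k, alternatives=3):
    """Score n + k * log2(alternatives * p) for probability p on the correct answer."""
    if not 0 < p_correct <= 1:
        # Assumption: no floor is applied; p = 0 is left undefined in this sketch.
        raise ValueError("probability on the correct alternative must be in (0, 1]")
    return n + k * math.log2(alternatives * p_correct)


if __name__ == "__main__":
    for label, n, k in (("Diagram A", 62, 23.7), ("Diagram B", 0, 63)):
        for p in (1.0, 0.8, 1 / 3, 0.2):
            print(f"{label}: p={p:.2f} -> {cbaa_score(p, n, k):.1f}")
```

With n=62 and k=23.7 the formula gives roughly 92 at p = .8 and roughly 100 at p = 1, and at the centre of the triangle it returns n itself; values read from the figure's zones need not coincide exactly with the raw formula.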
There are a number of concerns that this research has with Paul’s (1994) method of
scoring. The first is that a student who admits that they do not know the difference
between the given options, choosing to sit in the middle, is rewarded with a healthy 63
as demonstrated in Diagram A in Figure 4-1, meaning that a candidate can pass an
exam with no knowledge. The alternative and unused scoring suggested by Paul is
demonstrated in Diagram B in Figure 4-1, where n=0 and k=63. This scoring method
does address this concern to a certain degree as the scoring scale is shifted, where the
reward for being fully confident in a correct choice is 100 and being fully confident in
an incorrect choice is -150. This alternative scoring rewards a 0 for the student who
declares that they cannot choose between the answers, deemed by this research to be
fairer than awarding the 63 as described above. It also disproportionally penalises the
candidate for demonstrating a high confidence on the incorrect answer, a score of -150
for 100 per cent confidence for an incorrect answer, addressing the issue of
miscalibration of knowledge. Paul (1994) supports the use of the non-penalising option
(Diagram A in Figure 4-1) over the penalising option (Diagram B in Figure 4-1),
arguing that the CBAA was designed to acknowledge partial knowledge as its primary
objective. This research prefers a scoring method that incorporates negative
penalisation to combat the registration of high confidence in the wrong answer, which
Gardner-Medwin (2006) refers to as “delusionary”, while still rewarding partial
knowledge.
The author of this research considers the regions of Paul’s (1994) CBAA to appear
cluttered at the vertices. The design of the user interface could be difficult to navigate,
requiring a minimal operational level of dexterity and possibly increasing the cognitive
load on the participant.
Another concern for the CBAA scoring mechanism is the level of complexity, as the
method of score calculation is beyond the comprehension of many of the students who
use it. Any scoring system used should be in direct control of the user. As a student
navigates the interactive area in an attempt to register their confidence they are using
their cognitive mapping skills, moving in a linear path. This proportional linear
mapping should assume a proportional score depending on their physical distance from
the various options. This is not the case in Paul’s (1994) logarithmic scoring. A student
cannot be assured that they will receive a score that is directly linearly proportional to
their positioning on the triangle.
Paul’s (1994) and later Klinger’s (1997) interactive triangular response spaces offer
testing environments that are reliant on moderately high levels of dexterity and a good
grasp of physical spatial interpretation. The cognitive process of registering a level of
confidence is difficult to emulate through a mapping exercise where the student is
required to move a cursor on a blank field to register their confidence. This operational
method assumes that all students would proportionally position the cursor in the same
place for the same confidence. As discussed above this research questions the reliability
of this process, as the operational exercise of students with regards to psychological
mapping varies greatly from individual to individual. What one student considers a
cursor position to represent low confidence might be a registration of medium
confidence to another. This is an unreliable method to precisely register confidence and
could be misleading. This is not such a great concern for formative assessment as the
feedback is generally interpreted by the individual who completes the task and can be
adjusted according to their propensity towards registering confidence, however for
summative assessment where the exercise is more critical to the student’s profile it is
unacceptable. In Paul’s (1994) defense he does apply a correction to the student’s score
by generating a realism function based on the relative frequencies of confidence
registered by the individual over a period of time. This function is then applied to the
score adjusting it depending on the individual’s propensity to showing high or low
confidence.
Paul (1994) also acknowledges the concern of students’ inability to consistently register
their confidence and counteracts the negative perception that it may have by supplying
meaningful video demonstrations that link the registered probabilities and
consequential scores to common phrases of belief, such as “Probably A, Possibly C,
definitely not B”.
Gardner-Medwin and Gahan (2003) and Gardner-Medwin (2006) are more recent
contributors to the area of confidence assessment, having developed the
Confidence Based Assessment (CBA) strategy. Gardner-Medwin’s discipline of application
is the medical educational field, in which the recognition and
acknowledgement of confidence is critical to daily practice. At
present his system is designed for a True/False response to a single statement. This is
different from the MCQ format considered for this research, as demonstrated by Pollard
(1985,1986,1993), Klinger (1997) and Paul (1994), with the single stem question
followed by four or three answer options. However, the MCQ format proposed by the
research does permit the use of multiple correct answers, which makes it a cluster of
True/False questions under the single stem, with each of the options requiring a
statement of confidence, as included in Gardner-Medwin’s (2006) system. It is
therefore applicable at the single answer level and shall be discussed with this in mind.
Gardner-Medwin’s (2006) CBA was introduced in Chapter 3, where it was
demonstrated that he uses a negative scoring system designed to eliminate guessing,
recognise partial knowledge and in particular, severely penalise student responses that
show high confidence in incorrect answers. Gardner-Medwin (2006) classifies a student’s
state of knowledge in any given area as either very confident [C=3 (80-100%)], fairly
sure [C=2 (67-79%)] or not sure at all [C=1 (0-66%)], using a scoring system that is
“proper”, rewarding a student accordingly for demonstrating their true beliefs and
being truly honest. He argues that a scoring system should use incentives to encourage
the participants to expose their real state of knowledge. It is for this reason that he
introduces a safe zone for students who are not confident of their knowledge on a
particular area by creating a non-penalising area for low confidence in an incorrect
answer. In contrast there is a double negative penalty score for high confidence in an
incorrect answer. This is demonstrated in Table 4-6 with all of the other score options
used.
UCL Confidence-based scoring scheme

| Confidence Level | 1 | 2 | 3 |
|---|---|---|---|
| Score if correct | 1 | 2 | 3 |
| Score if incorrect | 0 | -2 | -6 |
| Probability correct | < 67% | > 67% | > 80% |

Table 4-6: CBA Scoring System for Correct and Incorrect Answers. (Gardner-Medwin and Gahan, 2003)
Unlike Pollard (1985,1986,1993) and Paul (1994), Gardner-Medwin and Gahan (2003)
support their argument for choosing their scoring method with a series of graphs,
demonstrating the optimal path for a student to maximise their score with the
knowledge that they have. As the graph in Figure 4-2 demonstrates, for each possible
confidence level the expected average score depends on the probability of getting it
right. The CBA scoring system is shown with each of the 3 levels of confidence C1, C2
& C3.
Figure 4-2: CBA Scores for C1, C2 & C3. (Gardner-Medwin, 2003)
As can be seen, the optimal path encourages the student to register C=3 for anything
with a confidence greater than 80 per cent, C=2 for a middle level of confidence (from
67% to 79%) and C=1 once there is any greater doubt, as it carries no penalty. This
is in line with the upper bounds of the graph in Figure 4-2.
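The optimal path can be checked numerically. The following is a minimal sketch, assuming only the marks listed in Table 4-6; the function names are illustrative and are not part of Gardner-Medwin's software. For a given probability of being correct it computes the expected mark at each confidence level and reports the level with the highest expectation.

```python
# UCL confidence-based marks from Table 4-6:
# level -> (mark if correct, mark if incorrect)
CBA_MARKS = {1: (1, 0), 2: (2, -2), 3: (3, -6)}


def expected_mark(level, p):
    """Expected mark at a confidence level when the answer is right with probability p."""
    right, wrong = CBA_MARKS[level]
    return p * right + (1 - p) * wrong


def best_level(p):
    """Confidence level with the highest expected mark for probability p."""
    return max(CBA_MARKS, key=lambda level: expected_mark(level, p))


for p in (0.5, 0.6, 0.7, 0.75, 0.85, 0.95):
    marks = ", ".join(f"C={c}: {expected_mark(c, p):+.2f}" for c in CBA_MARKS)
    print(f"p={p:.2f}  {marks}  best: C={best_level(p)}")
```

Under these marks the expected values for C=1 and C=2 cross at p = 2/3 and those for C=2 and C=3 cross at p = 0.8, which agrees with the 67 and 80 per cent boundaries quoted above.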
Gardner-Medwin (2006) argues that a crucial feature of confidence-based marking
systems is for them to have a motivating nature. He expressed concern that many of
the marking systems that concentrate on rewarding highly those with high confidence,
such as Davies (2005), reward only those students who are bold or perceptive enough
to see that it is never advantageous to register low confidence. One of the main
challenges for students when using the CBA for the first time is the realisation that they
can be rewarded for low confidence in a correct answer, reinforcing that honest
expression of confidence is a highly valued communication attribute in all areas. It is
for this reason that he promotes the use of the negative marking scheme but emphasises
the need for it to be motivating, as if it is not motivating it would be irrational for a
student to behave in a truly honest manner. He supports his argument by referring to the
work of others as presented in Figure 4-3.
Figure 4-3: Other Scoring Options used Including Scheme A from Hassmen &
Hunt (1994) and Schemes B-D from Davies (2005). (Gardner-Medwin, 2006)
It can be seen from these graphs that the optimal path varies substantially, depending on
the scoring method adopted. In the first option with no negative marking, it is never
rational to omit an answer and instructors often inform their students to leave no
question unanswered. The second option shows the equally balanced scoring of both
positive and negative which is encouraging for MCQ type questions but not for
True/False. In the case of True/False it promotes the omission of an answer when
confidence is less than 50 per cent, which can be detrimental to the student who has
some partial knowledge and is not prepared to register anything. However, Gardner-
Medwin (2006) acknowledges that the use of the negative marking is better than none.
The third option, Scheme A, attributed to Hassmen and Hunt (1994), has five levels of
confidence with negative marking used. Its greatest penalty for high confidence in a
wrong answer (-120) is greater than its equivalent for high confidence in a correct
answer (100) and also incorporates a safe zone for low confidence in an incorrect
answer (Hassmen & Hunt, 1994). Gardner-Medwin (2006) classifies this as properly
motivating but has concerns that the additional lower levels of confidence cannot
be rationally assigned for True/False questions at P < 0.35. Option four from Davies (2005) has 3
set levels of confidence that are equally rewarded for the negative and positive scoring
system, which is similar to the balanced scoring system preferred by this research.
Gardner-Medwin (2006) has voiced concerns with this approach, as he considers it
beneficial for the student to choose “no reply” for any confidence less than 50 per cent.
The final 2 options are again attributed to Davies (2005). Davies argues that students
who demonstrate high confidence in a correct answer should be greatly rewarded (4-5),
disproportional to registering high confidence in an incorrect answer (-2). His grading
system is the reverse of that of Gardner-Medwin (2006), who penalises heavily for high
confidence in incorrect answers (-6) compared to the score for high confidence in a
correct answer (3).
Gardner-Medwin (2006) also draws on Shannon’s (1948) theory of
information, investigating the relationship between the scores and the
appropriate information-theoretic measure of lack of knowledge for a True/False
question, “proportional to the log of the subjective probability assigned to the correct
truth value for a proposition”.
This research acknowledges the contribution of Gardner-Medwin’s (2006) work but
has reservations in adopting a scoring system that uses such severe penalties for high
confidence in an incorrect answer: a negative score of double the value of high confidence in a
correct answer. In rejecting the scoring technique of Gardner-Medwin and Gahan (2003) and
Gardner-Medwin (2006), this research does not intend to downgrade his
concern about students demonstrating high confidence in an incorrect answer. It is felt that
students could consider this severe penalty scoring unfair, even though
he softens the effect by offering the safe zone for low confidence in incorrect answers
(score of 0). Gardner-Medwin and Gahan (2003) and Gardner-Medwin (2006) argue
that students quickly ascertain a technique for using the system that optimises their
score, which has been criticised as another way of learning “how to do the test” rather
than learning the content in the test. However, his basis for scoring does address this
issue, as the scoring is “proper” and depends directly on the level of knowledge, so
even though the candidate can use an optimal method they must still base their
decisions on their knowledge.
The work of Gardner-Medwin and Gahan (2003) and Gardner-Medwin (2006)
is geared towards medical students, where he imposes a severe penalty for high
confidence in incorrect answers. Even though the above discussion acknowledges his
justification, both educationally and mathematically, it was felt that his approach was
too extreme and would not be well received by the students, being considered to be too
threatening to a population of students with average intelligence. In contrast, Davies
(2005) promoted a similar scoring system based on the reverse of Gardner-Medwin and
Gahan (2003) and Gardner-Medwin’s (2006), disproportionally over-rewarding for
correct responses with high confidence, which this research considers to be too lenient.
4.4 Comparison of an Incremental Balanced Scoring Method to
Previous Work
This section describes and justifies the scoring adopted for this research, building on
the previous work as discussed. While the functionality and application is often
different to those outlined above the underlying assumptions and arguments are
fundamentally the same.
This research considers an application of assessment that
incorporates confidence measurement as the determining factor in grading. As Pollard
(1985,1986,1993), Pollard and Clark (1989), Klinger (1997), Paul (1994), Gardner-
Medwin and Gahan (2003) and Gardner-Medwin (2006) have asserted, there is a need
for innovative assessment strategies that tackle the issues of guessing, reward partial
knowledge and encourage honesty regarding the state of a student’s knowledge. In
addition, a key requirement of this research was to design a scoring system that was
easy for the student to comprehend, placing the responsibility of learning into the hands
of the learner. It was felt that many of the previous systems, such as those proposed by
Pollard (1985,1986,1993), Pollard and Clark (1989), Paul (1994) and Klinger (1997)
required complex calculations by the student to ascertain the consequences of their
actions during the assessment. Another influential factor in the identification of an
appropriate scoring regime was to present the student with a relatively non-threatening
environment, in which the student could engage with the assessment exercise without
an over-bearing fear of the consequences of their actions, encouraging exploratory
behaviour with the system as part of the reinforcement of their learning.
Initial examination encouraged this research to investigate a simple linear approach to
the scoring, where the scoring system was analogous to a “betting” game. A student
was encouraged to place a wager (bet) on their answer depending on how confident
they were. The interface evolved to support this notion, presenting itself as a “game”
where the players (students) were presented with a question with four possible
solutions, in some cases with multiple possible correct answers.
The scoring technique preferred by this research is summarised in Table 4-7. For
convenience of discussing the scoring only increments of 10 are considered, although
the system permits the student to register any confidence measurement in an increment
of 1. The design of this application of assessment with confidence was to include a
granular registration of confidence, not as a method to increase the perceived
discrimination between the student’s state of knowledge but to increase the casual use
of the system as part of the appeal to the student, very much in accordance with the role of
intrinsic motivation in Hede’s (2002) Integrated Model of Multimedia Effects on
Learning as discussed in Chapter 3.
It can be seen from Table 4-7 that a student with high confidence in an answer would
gain the most by registering (betting) at a high level. They also quickly determine that
they would be equally penalised if the answer were incorrect. Similarly, they have the
opportunity to collect smaller scores for answers for which they have partial knowledge
and minimise their losses if incorrect. This adopted balanced scoring technique
discourages guessing, a primary objective of this research, while offering some rewards
for demonstrating partial knowledge.
Registered Confidence for each option    Score if Correct    Score if Incorrect
(Increments of 10) [p_i]                 [s_i]               [-s_i]
100%                                     1.0                 -1.0
90%                                      0.9                 -0.9
80%                                      0.8                 -0.8
70%                                      0.7                 -0.7
60%                                      0.6                 -0.6
50%                                      0.5                 -0.5
40%                                      0.4                 -0.4
30%                                      0.3                 -0.3
20%                                      0.2                 -0.2
10%                                      0.1                 -0.1
0%                                       0.0                  0.0
Table 4-7: Balanced Scoring Registered Confidence for Correct and Incorrect
Answers.
Figure 4-4 presents two graphs representing the balanced scoring method for some of
the situations outlined in Table 4-7, furthermore indicating the optimal path as used by
Gardner-Medwin (2006) to support his argument for the CBA scoring.
While Gardner-Medwin (2006) considers this balanced scoring system to be “proper”, he considers it
not to be “motivating”, as the optimal path contains the “no reply” option if unsure of
the answer, encouraging a student with any doubt of the answer (less than the 50/50
chance) not to respond for fear of penalty. This is a concern discussed in Section 4.5 as
part of the evaluation of the suitability of the scoring technique to the field of
application.
[Figure panels: Diagram 1, Balanced Scoring Method Option; Diagram 2, Overlay of Optimal Path]
Figure 4-4: MCQCM Scoring with Optimal Path.
When discussing scoring options it is helpful to investigate the expected values, in this
case the Expected Profit, since probability theory applies to what is essentially a
wagering situation. The Expected (Score) for N trials is based on the following formula:
Expected (Score) = N (p_i)(s_i) + N (1 - p_i)(-s_i)
where p_i is the registered confidence (or probability) for that instance, i ranges from 0 to 100
(in increments of 10) and s_i is the calculated score, with a maximum possible score of s = 1.
NB: N (p_i)(s_i) calculates the expected winnings while N (1 - p_i)(-s_i) calculates the
expected losses.
Hence a student who registers a confidence of 70 per cent could expect to yield the
following expected score for 100 trials.
Expected (Score for p = 0.7) = 100(0.7)(0.7) + 100(1 - 0.7)(-0.7)
= 100(0.49) - 100(0.21)
= 28
or 0.28 for the average score.
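The calculation above can be reproduced with a few lines of code. This is an illustrative sketch only: the function name is hypothetical, and it assumes, as the worked example does, that the registered confidence is also the probability of being correct, so that the stake s equals p.

```python
def expected_balanced_score(p, trials=1):
    """Expected total score over `trials` answers under balanced scoring.

    Assumes the registered confidence p is also the probability of being
    correct, so the stake s equals p: win +s with probability p,
    lose -s with probability 1 - p.
    """
    s = p
    return trials * p * s + trials * (1 - p) * (-s)


print(round(expected_balanced_score(0.7, trials=100), 2))  # 28.0
print(round(expected_balanced_score(0.7), 2))              # 0.28
```

Under this assumption the expression simplifies to p(2p - 1) per trial, which is zero at 50 per cent confidence and negative below it.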
An interesting comparison is the Expected Profit for Gardner-Medwin’s (2006) CBA,
based on the same formula with a different value for the score (0.66, with the CBA scores
converted to the range of -2 to 1). In this case a student with a confidence level of 70 per cent
would be at the C=2 level, scoring 0.66 for correct and -0.66 for incorrect. It can be noted
that even though Gardner-Medwin (2006) only uses 3 levels of confidence, C=3, C=2,
C=1 the E (score) calculations will be in increments of 10 per cent, as for the MCQCM
calculations. It is justifiably assumed that a student using a system will record the
appropriate C level of confidence but will have a designated numerical level of
confidence when using the application for any given question. For example a registered
level of C = 3 could have a student’s operational level of 90 per cent or 80 per cent, and
so on.
Expected (Score) = N (p_i)(s_i) + N (1 - p_i)(-s_i)
Hence Expected (Score for p = 0.7) = 100(0.7)(0.66) + 100(1 - 0.7)(-0.66)
= 100(0.462) - 100(0.198)
= 26.4
or 0.264 for the average score.
The Expected values for all probabilities in increments of 10 per cent are shown in
Table 4-8.
The comparison is best demonstrated by the graph shown in Figure 4-5, where the
Expected Score, or E(Gain) for both systems is closely aligned, with the exception of
the E(Score) for the balanced scoring method, which generates negative values for the
lower levels of recorded confidence. This comparable variation is due to Gardner-
Medwin’s (2006) safe zone, where a candidate is not penalised for admitting very little
confidence, attributed to his perceived motivating approach to scoring.
Student Confidence    MCQCM Expected(Score)    Gardner-Medwin’s CBA Expected(Score)
100%                   1.0                      1.0
90%                    0.72                     0.7
80%                    0.48                     0.4
70%                    0.28                     0.27
60%                    0.12                     0.2
50%                    0.00                     0.17
40%                   -0.08                     0.13
30%                   -0.12                     0.1
20%                   -0.12                     0.07
10%                   -0.08                     0.03
0%                     0                        0
Table 4-8: The Average Expected Scores from MCQCM and CBA.
Figure 4-5: Graph Comparing the MCQCM and CBA Expected Scores.
It is observed that the expected gains from the higher levels of confidence registration
are not noticeably different for the two systems, even though Gardner-Medwin’s
(2006) students receive a severe penalty if their answer is wrong. It is comforting to see
that the MCQCM’s E(Score) sits well in comparison to Gardner-Medwin’s (2006)
CBA score.
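The two columns of expected scores can be regenerated from the scoring rules. The sketch below rests on two assumptions that are not stated as code anywhere in the source: the CBA marks of Table 4-6 are divided by 3 to bring them into the range -2 to 1, and a student picks the CBA level from the 67 and 80 per cent boundaries. The function names are illustrative.

```python
def mcqcm_expected(p):
    """Balanced scoring: the stake equals the registered confidence p."""
    return p * p - (1 - p) * p


def cba_expected(p):
    """CBA marks (3/-6, 2/-2, 1/0) divided by 3 (assumed rescaling).

    The level is chosen from the boundaries in Table 4-6
    (80 per cent for C=3, 67 per cent for C=2).
    """
    if p >= 0.8:
        right, wrong = 3, -6
    elif p >= 0.67:
        right, wrong = 2, -2
    else:
        right, wrong = 1, 0
    return (p * right + (1 - p) * wrong) / 3


for pct in range(100, -1, -10):
    p = pct / 100
    print(f"{pct:>3}%  {mcqcm_expected(p):6.2f}  {cba_expected(p):6.2f}")
```

Run as written, this gives values that agree with the tabulated expected scores to two decimal places. It also makes the role of the safe zone visible: below 50 per cent the balanced column turns negative while the CBA column stays at or above zero.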
Investigating the Expected score or gain when a student guesses during a test is
common practice when considering scoring methods. The MCQCM permits one or
more correct answers depending on the choice of the instructors. During this research it
was observed that instructors tended to produce a mixture of single and multiple correct
answers. In all cases the wording of the questions identified them as being single or
multiple correct answer questions, which assisted the student. Consequently the
MCQCM assessment exercises are clusters of true/false answer questions with one
stem, which generates numerous combinations of possible outcomes. While it is
unrealistic to cover all of them here a few fundamental sample responses will be
considered for comparison to the traditional approach to MCQ scoring.
Firstly, the standard Multiple-choice Question (MCQ) with four options, one correct
answer and no penalties for incorrect answers has the E(Score) calculated by
E(X) = 0.25(1) +0.75(0) =0.25
This means that there is a 1 in 4 chance of the student picking the correct option, for which
they are awarded the score of 1, and a 3 in 4 chance that they will select an incorrect
answer and receive 0. This value is accepted by many instructors when implementing
MCQ tests. As previously stated, this study does not accept this proposition and this
research’s major objective is to eliminate the noise caused by this activity.
A simple anti-guessing strategy with a correct answer being awarded a score of 1 and
an incorrect answer a score of -1/(n-1), where n is the number of options, has a
decidedly modified result:
E(X) = 0.25(1) + 0.75(-1/3) = 0
This system is a suitable deterrent to guessing as it adjusts the outcome significantly.
An instructor using this type of scoring option would not encourage their students to
guess during the test unless they were reasonably confident.
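The two guessing calculations can be compared side by side. This is a small sketch with a hypothetical helper name; it uses exact fractions so that the zero expectation under the -1/(n-1) penalty is not obscured by rounding.

```python
from fractions import Fraction


def expected_guess(n_options, penalty):
    """Expected mark for a blind guess on a single-answer MCQ.

    The guess earns +1 with probability 1/n and `penalty` otherwise.
    """
    p_right = Fraction(1, n_options)
    return p_right * 1 + (1 - p_right) * penalty


n = 4
print(expected_guess(n, penalty=0))                    # 1/4
print(expected_guess(n, penalty=Fraction(-1, n - 1)))  # 0
```

The same penalty of -1/(n-1) gives a zero expectation for any number of options, which is why it is commonly described as a correction for guessing.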
Paul (1994), Pollard (1985,1986,1993), Pollard and Clark (1989), Davies (2005),
Gardner-Medwin and Gahan (2003) and Gardner-Medwin (2006) all accept this
approach to various degrees. While it is not their individually preferred option, it is
acknowledged by all that it at least addresses the issue of guessing, which is preferable
to ignoring the concern. The problem with the simple anti-guessing balanced negative
scoring described above is that it does not encourage the student to express their true
state of knowledge, hence is not motivating.
The balanced scoring method with increments of confidence introduces another
dimension into the arena, in that the concept of wagering adds another layer in the
operation. A student has the option of using an educated guess, tapping into the partial
knowledge that they have, hopefully minimising the impact of an incorrect choice but
equally important creating the opportunity to secure some marks. The balance of the
reward with the penalty minimises the fear of registering a nominal value to reflect
their confidence in their choice.
The variation to the expected gain for the standard MCQ question as considered above
for a student who has a medium level of confidence, say 60 per cent in an answer, “I
think that it is this one”, for the one option would be
E(X) = 0.25(.60) +0.75(-.60) = -0.30
To add to this they can also use a combination of confidence to offset the negative
component if required as the incremented balanced scoring method is a cluster of
True/False questions that permit the student to register their confidence for each given
option.
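The effect of backing a blind pick with a wager can be sketched as follows. The function name is hypothetical; the stake is the registered confidence expressed as a fraction, and the probability of being correct is supplied separately so that a guess (p = 0.25 with four options) can be distinguished from an informed answer.

```python
def expected_wager(p_correct, stake):
    """Expected balanced score for one option.

    The answer is right with probability p_correct and is backed
    with a stake between 0 and 1 (the registered confidence).
    """
    return p_correct * stake + (1 - p_correct) * (-stake)


# A blind pick among four options (p = 0.25) backed at 60 per cent
print(round(expected_wager(0.25, 0.60), 2))  # -0.3
```

Whenever the true probability of being correct is below one half, any positive stake has a negative expectation, so overstating confidence on a guess is discouraged.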
A simple example to consider is as follows.
In this case a student is relatively confident (80%) in the first option but also considers
option 2 to have merit (at say 55%), the student not as confident as with option one.
Option 3 they consider to be incorrect with a high level of confidence (100%) and
option 4 they have a reasonably high level of confidence (70%) of being incorrect but
cannot completely dismiss it.
If option 2 is correct and the others are incorrect then the score would be
Score = -0.8 + 0.55 + 1.0 + 0.7 = 1.45 out of a possible 4 (total of 1 per option)
= .3625
In comparison, the score for a guess for the standard balanced negative score would
require them to only nominate option 1 as their most preferred answer giving a score of
-1/3.
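The per-option scoring in this example can be written out as a short sketch. The data structure and function name are illustrative assumptions, not the MCQCM implementation: each option records the student's true/false call, the registered confidence and the answer key, and the score is the signed confidence.

```python
def option_score(student_says_true, confidence, is_true):
    """Balanced score for one option.

    Returns +confidence if the student's true/false call matches
    the answer key, and -confidence otherwise.
    """
    return confidence if student_says_true == is_true else -confidence


# (student's call, registered confidence, answer key) for options 1 to 4
responses = [
    (True, 0.80, False),   # fairly sure option 1 is correct, but it is not
    (True, 0.55, True),    # moderately sure option 2 is correct, and it is
    (False, 1.00, False),  # certain option 3 is incorrect
    (False, 0.70, False),  # fairly sure option 4 is incorrect
]

total = sum(option_score(*r) for r in responses)
print(round(total, 2), round(total / len(responses), 4))  # 1.45 0.3625
```

Because every option contributes to the total, confident and correct rejections of the distracters offset part of the loss from the misplaced confidence in option 1.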
The expected value for a student with these levels of confidence in their answers would
be:
E(X) = 0.8(0.8) + 0.2(-0.8) + 0.55(0.55) + 0.45(-0.55) + 1(1) + 0(-1) + 0.7(0.7) + 0.3(-0.7)
= 0.48 + 0.055 + 1 + 0.28 = 1.815
The score for the question is out of a total of 4, 1 for each option, giving an expected return
score of
E(X) = 1.815/4 = 0.454
Given that the student was only moderately confident in the correct answer (option 2)
and having various levels of confidence in 2 incorrect answers, option 1 and option 4
the grades should be less than a pass. This is a pleasing outcome as a more
representative result for the student’s level of knowledge.
This simple demonstration shows that the incremental balanced scoring method is a
proper scoring system, as Paul (1994) and Gardner-Medwin (2006) promote, as it
rewards the participant proportionally to their level of stated knowledge, whilst
permitting and encouraging the demonstration of partial knowledge.
4.5 Choice and Justification of Scoring Method for this Research
The decision to use the proportionally incremental balanced scoring technique was
made for the following reasons.
Firstly, it was deemed important that the scoring calculations remain simple and in the
direct control of the student. Their actions should result in a consequential score, which
would not confuse or surprise the student. In this case, the concept of laying down a
bet, as a means of supporting your choice, was both playful and properly rewarding, as
Paul (1994) and Gardner-Medwin (2006) purport. The use of logarithmic functions and
disproportional penalising can confuse the candidate and create too much pressure,
with adverse effects on the final outcome. As Sim, Read and Holifield
(2008) advocate, the student has the most to lose when sitting the test, hence it is only
fair that they have the control. The comparative analysis of the expected values of the
incremental balanced scoring method against other more complicated methods demonstrates
that the expected outcomes are not significantly different. In addition the author of this
research has concerns about the notion that the measurement of knowledge has the
same traits and attributes as the measurement of information. Paul (1994), Gardner-
Medwin (2006) and Klinger (1997) leverage heavily off the work of Brown and
Shufford (1973) to strengthen their argument, which is based on Shannon’s Information
Theory (1948). Gardner-Medwin (2006) considers the compliance of his scoring to
Shannon’s Information Theory comforting but of the least importance when
considering the constraints that it should adhere to. The author of this research feels the
same, as the relationship between knowledge and information is complex.
Secondly, even though the incremental balanced negative scoring is based on a proper
scoring system that addresses the area of guessing and recognises partial knowledge, it
has been criticised for not being truly motivating. This criticism is valid in that the
optimal path analysis (see Figure 4-4) encourages the option of choosing not to answer
for low confidence, yielding no penalty. However, students also appreciate the fact
that no gain can be made by abstaining from the activity, which was apparent from the
implementation phase and is discussed further in Chapters 5, 7 and 8. Hence
students are not likely to refrain from committing to answers.
Finally, the notion of unfair consequences must be considered, as Sim, Read and
Holifield (2008) argue strongly that any CAA package must be seen to be fair in its
application, not providing the student any grounds for appeal. This notion will be
further discussed in Section 6.4.3. The student’s perception of a scoring regime and its
purported fairness is a deciding factor of the choice. The harsh penalising of Gardner-
Medwin (2006), for high confidence (C=3) for an incorrect answer of double the
negative value (-6), although legitimately argued and supported by mathematical
modeling and probability theory, is often deemed too severe by both the instructors and
the students. The scoring system by Hassmen & Hunt (1994) shown in Figure 4-3
might be less threatening. Paul (1994) also produced an alternative method where the
penalty score was a multiple of negative 1.5 of the positive score for a student’s
response demonstrating high confidence in an incorrect answer, but did not promote it
as heavily in his paper. In recent discussions with Paul he explained his position on the
published choice with a detailed synopsis of the justification. He considered the choice
of constants is arbitrary and felt that many educators are most familiar and comfortable
with 0 to 100 scoring. Likewise, many educators are uncomfortable with awarding
positive scores for ignorance (represented by the center region) so the second example
addresses this concern by establishing a score of 0 for that case and 100 for certainty
(which results in the -150 for "completely misinformed").
This conforms with the incremental balanced scoring approach of scoring 0 for no
knowledge, with additional elements similar to Gardner-Medwin’s (2006) severity of
penalty for “completely misinformed”. Gardner-Medwin (2006) recently stated in a
conversation with the author that we fail as teachers if we mark a lucky guess as if it
was knowledge and we also fail if we mark confident errors as if they were no worse
than acknowledged ignorance.
On the adoption of penalising students with negative scores for high confidence in
incorrect answers Paul (1994) further argues that in limited field trials students seemed
quite robust and were able to use the system effectively regardless of the specific
values, unlike teachers and professors who appear to have the most difficulty adapting
to this type of knowledge assessment in the context of their existing administrative and
logistical environs. This observation is shared by Gardner-Medwin (2006), who
declared during a recent Computer Assisted Assessment post conference focus group
his level of frustration in the slow uptake of innovative scoring systems, as there is a
reticence to apply penalties, which in his opinion is an irresponsible approach in
educating students.
Paul, through an email discussion, concedes that his alternative scoring regimes for
various n and k values have merit, and he intends to pursue further research
and trials to ascertain the most appropriate method for different applications, both
summative and formative. However, his primary focus was to develop a supportive
self assessment tool that adopted the balanced approach to encourage students to
participate in a non-threatening environment, as is the major objective of this research.
4.6 Summary
This chapter has summarised research demonstrating the benefits of an incremental
balanced scoring mechanism that proportionally rewards and penalises students for
correct and incorrect answers, in order to have a richer understanding of the student’s
knowledge, to the benefit of both the student and the instructor.
The incremental balanced scoring method adopted in this study is based on the
arguments and opinions cited in this chapter. It pays homage to the extensive work of
Pollard (1985,1986,1993), Pollard and Clark (1989), Paul (1994), Klinger (1997), Davies
(2005), Gardner-Medwin and Gahan (2003), Gardner-Medwin (2006) and the
contribution of others. It especially leverages off Paul’s (1994) declared objective of
using a scoring technique that provides a supportive and non-threatening environment,
given that the initial objective of this research was to develop an assessment strategy to
be used at the discretion of the student and the instructor, depending on its application.
This research does not exclude consideration of the more severe scoring mechanism
proposed by Gardner-Medwin (2006) supporting the recent proposition tabled at an
assessment focus group, that it is irresponsible for an educator to positively reinforce
high confidence in an incorrect answer.
Considering the reticence of instructors to adopt the severe negative penalty option, as
Paul refers to previously, this research uses a scoring system that is more palatable than
that of Gardner-Medwin (2006).
Chapter 5 will present the first iteration of an MCQ assessment tool developed for this
research, that incorporates the incremental balanced scoring method as demonstrated
and justified in this chapter. It will describe the first prototype and the initial pilot
programs designed to ascertain the value of the assessment with confidence
measurement as perceived by both the instructors and students.
CHAPTER 5 DEVELOPMENT OF THE MULTIPLE-CHOICE QUESTIONS WITH CONFIDENCE MEASUREMENT (MCQCM) PROTOTYPE AND PILOT PROGRAM
Chapter 4 discussed possible scoring options, nominating the balanced scoring method
based on the incremental levels of confidence measurement for implementation. This
Chapter introduces the tool that was developed to answer the questions posed in
Chapter 2, that is the Multiple-choice Questions with Confidence Measurement
(MCQCM) assessment tool. The initial Visual Basic design of the MCQCM was part of
a previous study for a masters degree. This chapter documents its preliminary development stages
through an iterative evolutionary process, starting with the implementation of a
fundamental working prototype to a group of students and instructors, analysing their
responses to the system and their perceptions of its contribution to the learning and
instructional process. It culminates with some functionality and design
recommendations to improve the MCQCM for extended application.
5.1 The MCQCM
It is appropriate at this time to introduce the Multiple-choice Questions with
Confidence Measurement (MCQCM) assessment tool prototype in order to establish a
general understanding of the fundamental assessment principles upon which the
MCQCM is built. The MCQCM version introduced here is the result of an evolutionary
design process of development, as described in the research methodology in Chapter 2,
incorporating the HCI user centered design iterative approach, eventually culminating
in the fully operable Internet based MCQCM to be described in Chapter 6.
The general structure of the MCQCM self-assessment tool is based on the traditional
Multiple-choice Questions format as outlined by Frary (1985), Kehoe (1995), Rodriguez
(2005) and Tarrant et al. (2009), consisting of a stem with a number of options.
It was imperative that the designed system would be easy to
use without placing too much cognitive demand on the user, ensuring that their efforts
were placed on the question rather than the interface. It was also considered important
that the scoring technique was simple, as discussed in Chapter 4, placing the user in
control of the results of their actions. The resulting self-assessment tool would be
required to be developed adhering to good usability design principles (Sharp et al.,
2007), engaging the user while exercising sound navigational properties for delivery
across the Internet, intranet or stand alone. There was the additional requirement that
the system be able to capture and record the scores of the students as they participated
in the exercise for formative and summative assessment.
The MCQCM system is to permit more than one correct answer. This encourages the
student to consider all options separately and not to identify what they consider to be
the single correct answer and ignore the rest or to use a process of elimination. The
system’s feedback to the students is required to be a simple reflection of the student’s
present understanding of the concept being considered in each question. The
advantages considered by this approach compared to the traditional MCQ format are as
follows:
• To permit the instructor to word the options to closely examine the areas of
study, eliminating the need to use easily recognisable distracters.
• To force the student to consider all options carefully, increasing their exposure
to associated areas within the topic.
• The score achieved is to reflect an honest position of the student in their
knowledge of the subject.
• To provide formative feedback to allow both students and instructors to redirect
attention where required during the learning process.
5.2 Design of the Rudimentary MCQCM Prototype
At this early stage of development a rudimentary prototype of the MCQCM was used,
far more simplistic than the Web-based operational version to be described in Chapter
6. The initial version of the MCQCM, see Figure 5-1, was developed to cater for these
initial trials.
Figure 5-1: The MCQCM Prototype Developed to Run the Initial Trials.
The MCQCM initial Visual Basic prototype was designed to reflect the student’s level
of understanding of topics as precisely as possible. As this development was part of a
previous study for a masters degree only a synopsis will be covered here. The student is
required to clearly state their level of confidence for each of the answers offered for a
question, knowing that they would be proportionally penalised for an incorrect choice
and proportionally rewarded for a correct choice.
The scoring method was adopted after extensive consideration of previous research as
discussed in Chapter 4. A conclusion of this investigation into previous work was that
in order to enable a richer understanding of the student’s knowledge a scoring method
that proportionally rewarded and penalised a student for correct and incorrect answers
was the preferred choice. The main argument for this decision was the aim of developing a fairer
system for the student, under their control, while offering comparable expected outcomes
to other scoring regimes. It was felt that the MCQCM should offer a self-assessment
service that is honest, informative, and directional whilst still being palatable to the
student. The resulting score for a question is calculated dependent directly on a
student’s registered level of confidence for each option, using both positive and
negative values. This is briefly explained in Table 5-1 below with some simple
examples.
Confidence registered for an option        Example of score calculation from a
                                           registered confidence

A high level of confidence for a           A confidence level of 90 per cent for a
correct answer for an option yields a      correct answer yields a score of
high positive score.                       positive 9/10, i.e. +9/10.

A high level of confidence for an          A confidence level of 90 per cent for
incorrect answer for an option yields a    an incorrect answer yields a score of
negative score of equal value.             negative 9/10, i.e. -9/10.

A low level of confidence for a correct    A confidence level of 20 per cent for a
answer for an option yields a low          correct answer yields a score of
positive score.                            positive 2/10, i.e. +2/10.

A low level of confidence for an           A confidence level of 20 per cent for
incorrect answer for an option yields a    an incorrect answer yields a score of
negative score of equal value.             negative 2/10, i.e. -2/10.
Table 5-1: Rules and Example of a Score for a Given Scenario.
The individual scores for each option, each allocated out of 10, are tallied to give a
score for the question; with four options per question, this results in a score out of 40.
Each option score is displayed as a value from 0 to 10, or the negative equivalent. As an
example, Table 5-2 demonstrates the resulting scores for a student’s answer to a question
with a single correct answer (Option B).
| Option | Instructor’s Choice | Student’s Choice | Correct or Incorrect | Confidence | Score |
|---|---|---|---|---|---|
| A: i-- | False | True | Incorrect | 65 | -6.5 |
| B: i++ | True | True | Correct | 90 | 9 |
| C: i=1 | False | False | Correct | 100 | 10 |
| D: i=i++1 | False | False | Correct | 92 | 9.2 |
Table 5-2: Resulting Score for Options Given the Student’s Choice and Their
Registered Level of Confidence.
The resulting final score for this question is calculated by the addition of the scores for
each option: Option 1 + Option 2 + Option 3 + Option 4 = Total
-6.5 + 9 + 10 + 9.2 = 21.7/40.
In this case the student has incorrectly nominated Option 1 (A) as correct with a 65 per
cent level of confidence. However, they have also correctly identified Option 2 (B) as the
correct answer with a 90 per cent level of confidence and have further correctly identified
the incorrect answers (Options 3 and 4, i.e. C and D) with high levels of confidence.
The final test score is calculated by summing each question’s score, as is the normal
practice.
The MCQCM permits the instructor to nominate one or more correct answers if
desired, significantly increasing the difficulty for the student of identifying the correct
and incorrect answers for every question. An example of a question with multiple
correct answers is demonstrated in Table 5-3 below, where both B and C are correct
answers.
| Option | Instructor’s Choice | Student’s Choice | Correct or Incorrect | Confidence | Score |
|---|---|---|---|---|---|
| A: i-- | False | True | Incorrect | 60 | -6.0 |
| B: i++ | True | True | Correct | 80 | 8 |
| C: i=i+1 | True | True | Correct | 92 | 9.2 |
| D: i=i++1 | False | False | Correct | 100 | 10 |
Table 5-3: Example of a Question, which has 2 Correct Answers B and C.
In this case the student incorrectly identified A to be True with 60 per cent confidence
and correctly identified B to be True with 80 per cent confidence, C to be True with 92
per cent confidence and D to be False with 100 per cent confidence.
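The option-level scoring rule described in this section can be sketched in a few lines of
Python. This is only an illustrative sketch and not the Visual Basic prototype itself: the
names OptionResponse, score_option and score_question are hypothetical, and it assumes
that a confidence of c per cent contributes +c/10 when the student’s choice matches the
instructor’s and -c/10 otherwise, as set out in Table 5-1.

```python
from dataclasses import dataclass


@dataclass
class OptionResponse:
    """One option of a question, as marked by the instructor and the student."""
    label: str
    instructor_choice: bool  # True if the instructor marked the option as correct
    student_choice: bool     # True if the student marked the option as correct
    confidence: int          # registered confidence, 0 to 100 per cent


def score_option(resp: OptionResponse) -> float:
    """Return +confidence/10 if the student agrees with the instructor, else -confidence/10."""
    magnitude = resp.confidence / 10
    return magnitude if resp.student_choice == resp.instructor_choice else -magnitude


def score_question(responses: list[OptionResponse]) -> float:
    """Sum the option scores, rounded to one decimal place to hide floating-point noise."""
    return round(sum(score_option(r) for r in responses), 1)


# Responses taken from Table 5-3 (correct answers B and C).
table_5_3 = [
    OptionResponse("A", False, True, 60),
    OptionResponse("B", True, True, 80),
    OptionResponse("C", True, True, 92),
    OptionResponse("D", False, False, 100),
]

for r in table_5_3:
    print(r.label, score_option(r))
print("Question score:", score_question(table_5_3), "out of", 10 * len(table_5_3))
```

For the responses in Table 5-3 this gives option scores of -6.0, 8.0, 9.2 and 10.0, and a
question score of 21.2 out of 40.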
Critical to the success of the student’s MCQCM experience was the requirement to
train them in understanding the scoring method. This was achieved by supplying online
interactive demonstrations using the scoring calculator to be used in class. This device
was specifically created to assist the students in understanding the scoring mechanism
of the MCQCM. It permitted the student to simulate the possible scoring scenarios to
see the resulting changes in the scores.
The possible score for a question where a student correctly identifies 3 of the 4 options
is as shown in the score calculator in Figure 5-2, based on the response from the student
demonstrated in Table 5-3.
Figure 5-2: Scoring Calculator for MCQCM Table 5-3.
The final score and feedback were given to the student after they completed the test. In
this rudimentary prototype the scores for each question were reproduced on the screen
so the student could see the correct answers with their choices lined up beside them,
similar to the configuration shown in the scoring calculator in Figure 5-2.
5.3 Pilot Studies
Two initial pilot runs are documented here. The first involved 6 students and 3
instructors. The second pilot program conducted was a larger exercise, involving 93
participating students and 8 instructors. These activities produced some interesting and
encouraging qualitative and quantitative data demonstrating that the rudimentary
MCQCM tool had promise as a positive contributor to both the instructor, who holds the
primary leadership role, and the student as they journey together along the learning path.
In addition, these initial findings encouraged further development and studies, as
documented in Chapters 7 and 8.
5.3.1 Aims of Pilot Studies
It was considered beneficial to involve various classifications of stakeholders as part of
the needs analysis and low fidelity user evaluation. The identified participants
consisted of a number of students, instructors, the designer and programmers, as the
designing of the operational interactive system was reliant on them for direction at this
early stage of development.
The activities were designed to elicit answers and discussion on the following
questions, to assist in answering research questions 1 and 3 as outlined in Section
2.2.
1. Was the system easy to operate?
2. Did the feedback display produce comprehensible information in order to be
valuable in directing the student along their learning path?
3. Is a scoring system that penalised for incorrect choices and rewarded for correct
choices in a linear proportionality easy to comprehend?
4. Would the participant actively use the sliding bar to register their level of
confidence freely and would they perceive the system as being either too
complicated or too threatening?
5. Would a self-testing program of this design favour a particular learning style?
6. Would students consider the proposed system might be more favourable to the
extraverted individual and disadvantage the introverted user?
7. Would students consider the proposed system to be gender biased?
5.3.2 First Pilot Study
The main objective of the initial small pilot study was to elicit as much user feedback as
possible from the students’ and instructors’ experience of using the basic MCQCM
prototype.
To optimise the effectiveness of the pilot run, Visual Basic was used to construct the
interface, creating individual data tables for each participant. This permitted the
responses from each of the participants to be captured and later displayed to students
and instructors as a means of reflecting on their progress and experiences.
A small group of 6 students (3 males and 3 females) and 3 instructors (2 males and 1
female) were asked to participate. After completing the test the students were required
to answer a series of questions in the presence of the designer. The subject instructors
were also interviewed after the students completed the tests to ascertain their opinions
(Appendix A).
To ensure the richness of the information collected the participants were invited to
consider the system over a period of days and encouraged to give additional feedback
after ongoing reflection.
The 6 students were required to complete 5 test questions concentrating on a particular
content area, which was to be formally tested within their classes. Hence the students
perceived it as advantageous to their study and were consequently keen to participate.
To assist in the initial exposure to the system the students all participated in an
introductory session that demonstrated the package in a non-threatening manner. The
students were initially required to respond to a general question set in their social
environment where the question addressed a local, major, well-publicized, sporting
event. Immediately following this, the students were required to complete the further 5
questions designed to ascertain their knowledge of the nominated content area.
Student Observations
The initial introductory demonstration session permitted the students to participate in a
social context without threat, a practice encouraged by Paul (1994) and Gardner-Medwin
(2006). This proved to be a successful exercise as they responded to the system in a
relaxed manner. However, it was observed that it did not eliminate their fears or
apprehension completely during the actual assessment exercise.
The students were then asked to complete the more formal part of the pilot program,
which was a series of 5 test questions constructed around the content area of their
studies.
All of the students completed the five test questions without any real operational
concerns. It was observed that during this formal part of the pilot program there was
still initial apprehension in using the system for the first time. The students approached
it with some suspicion and concern. From their verbal protocol during testing, it was
observed that they were not altogether comfortable with the interface, as they were not
familiar with it under test conditions. Students also identified the additional anxiety of
being closely observed during a test, and expressed concerns about being required to
identify not only what they considered to be the correct option but also what they
considered to be the incorrect answer. To add to their initial apprehension they
demonstrated hesitation registering their level of confidence in all of their choices.
In response to the questions posed in Section 5.3.1, the students in general found the
MCQCM easy to use, and felt comfortable with the scoring technique and the
operational aspects. They did, however, show concern about the sliding bar being used
to register both confidence and choice of answer.
All of the students requested more opportunities to use the test as they considered it to
be greatly beneficial in confirming their knowledge in some areas and highlighting their
inadequate knowledge in others.
All of the students registered that they understood the feedback, as it clearly stated their
responses.
The simple scoring system was well received. All of the students claimed to
understand the method of calculating the score and consequently would react depending
on their level of confidence to maximise their result.
It was observed that the students tended to minimise the use of the slide bar for the first
few questions and increased the usage for the remaining, where usage increased as the
student became more relaxed.
The pilot program was not broad enough to give any valid feedback pertaining to the
bias towards particular learning styles and gender. The students could neither
demonstrate nor comment on these issues during their short exposure. This is an area
for consideration at a later stage and with a larger cohort of students.
Table 5-4 provides a summary of the observations of both the students and instructors.
Instructor Observations
As the instructors’ exposure to the system at this early stage was only brief, their
contributions were minimal, based on their observations of the prototype being used by
the students and on a series of representative summary results presented for their
consideration. Their comments are also included in Table 5-4, alongside those of the
students.
It can be observed that the feedback on the MCQCM prototype at this early stage was
promising, the prototype being generally well received by the participants in the pilot
program. All of the students appreciated the opportunity to use the testing facility and
they all considered it to be beneficial to their preparation for the oncoming test. They
considered that the method of scoring encouraged risk-taking and might also permit
students to manipulate the system to their advantage.
At this early stage some students registered a concern that the process of decision-
making could confuse the participant. This was not overly apparent during the pilot
program but became a significant issue during later extended trials.
| Initial Questions | Student Observations and Discussions | Instructor Observations and Discussions |
|---|---|---|
| Was the system easily operable? | Hesitation in using the slide bar, slight confusion in the sliding mechanism to register confidence and choice of answer | Appeared to be easily operable |
| Was the feedback display produced comprehensible in order to be valuable in directing the student along their learning path? | All students felt that the feedback was clear | Appeared to be clear to the student and the proposed reports to the instructor would be beneficial |
| Do you think that a scoring system that penalised for incorrect choices and rewarded for correct choices in a linear proportionality is easily comprehended? | The simplicity of the system was understood; the students did realise that not answering in some cases would be beneficial | Appeared to be understood by the students, some hesitation in offering it as a legitimate scoring mechanism |
| Would the participant actively use the sliding bar to register their level of confidence freely and their general perception of the system being either too complicated or threatening? | Initially little use of the sliding bar, extended use as they progressed through. High level of initial apprehension alleviated as the test progressed | Concerns that it might be too threatening having to identify the correct option and also the incorrect ones. Concerns about unfair consequences and possible appeals |
| Would a self-testing program of this design favour a particular learning style? | No real concern registered, no participant felt disadvantaged when using the system | Could be more favourable to the student who prefers to learn by experiential methods, trying out different ways etc. |
| Do you consider the proposed system might be more favourable to the extraverted individual and disadvantage the introverted user? | No real concern registered, no participant felt disadvantaged when using the system | The experience of the instructors was that this would suit some students more than others. They thought that the over-confident would overstate their confidence while the more timid would understate. |
| Would the system have gender bias towards males? Do the instructors in their personal experiences observe that males have a tendency to overstate their ability while females often understate? | Did not register any opinion | Some of the instructors also thought that it might assist female students, as it gives them the opportunity to show levels of knowledge without fear of embarrassment |
Table 5-4: Pilot Program Student and Instructor Observations.
Also apparent are the concerns registered by the instructors, including the possible
favouritism towards the extraverted student and the fear of appeals against perceived
unfair penalties.
As a direct result of these observations and the discussion above, a second, more
extensive pilot program was carried out in an attempt to obtain a deeper understanding
of some of the cognitive processing issues flagged in this initial, smaller pilot study.
5.3.3 Second Pilot Study
The encouraging results of the small pilot study initiated a further, more comprehensive
pilot study. The primary objective of these studies was initially to develop and evaluate
the MCQCM as an innovative formative assessment tool, to determine if it is beneficial
to the student and the instructor, being the key identified stakeholders. These studies
were designed to ensure that the stakeholders were given the opportunity to interact
with the system at an operational level, producing both qualitative and quantitative data
for analysis and interpretation. This part of the research was approached in two closely
related stages: Stage 1, student evaluation, and Stage 2, instructor evaluation.
Stage 1, the initial and major section of the experiment, was based on a series of trials
with two individual cohorts of students using the system as part of their learning
experience. All data was recorded either directly to a database or indirectly via the
subject review questionnaire. It was considered advantageous to collect the data by both
means, as the encompassing technology is ideal for collecting the raw data and the
handwritten questionnaire format permitted the students to respond away from the
computer environment, giving the opportunity for reflection and further thought.
Stage 2 of the experiment investigated the instructors’ evaluation of the system,
attempting to gauge their perceived value of the MCQCM as both a formative and
summative assessment tool. This second, instructor-focused experiment was directly
dependent on the students’ experience, as the recorded data it generated was analysed
and presented to the instructors for their opinions.
5.3.3.1 Stage 1: Method and Results for Student Focused Experiment
Two groups of students, referred to as Cohort 1 and Cohort 2, participated in the
experiment. The initial cohort of 50 students (Cohort 1) consisted of undergraduates
enrolled in the Technical and Further Education (TAFE) Computer Science
course. The second cohort of 43 postgraduate students (Cohort 2) was enrolled in the
Higher Education (HE) Graduate Diploma of Information Technology. The modules
being tested were core subjects of the two courses: the TAFE Introduction to C++
and the HE Database 1 (Entity Relationship Modeling Design and Structured Query
Language). It was considered that testing the MCQCM at various levels of the
educational spectrum would yield richer data in relation to the usability and perceived
usefulness of the system.
5.3.3.2 Outline for Cohort 1
The undergraduate TAFE students participated in the self-assessment exercise as part of
their preparation for a scheduled summative assessment task. The students were
encouraged to complete the assessment without peer consultation and were informed
that the test results would be anonymous. Each student was assigned a unique number
that referenced their scores, and responses to the post session questionnaire, giving
them complete anonymity.
The MCQ test consisted of 5 questions addressing the fundamental concepts of
programming in C++. In general the format was a stem that referred to a particular
desired programming outcome, with the options containing program segments proposed
to achieve that outcome. Of the 10 questions, 5 provided more than one correct answer
in the four options given. Of the 50 participating students, all responded to the
post-test survey, as it was completed in class as a part of the standard testing review
process.
5.3.3.3 Outline for Cohort 2
The second cohort of 43 HE postgraduate students used a slightly modified Web-based
version as part of their normal revision program in preparation for the final exam.
Unlike the undergraduate students, these students accessed the self-assessment test
from their preferred study environment, either in their homes, in the laboratories, at
work, or in the library. The structure of the questions was the same, with a stem and 4
options from which to choose, and with more than 50 per cent of the questions containing
more than one correct answer. Of the 43 participating students, 20 responded to the
optional post-test survey, which was presented on the screen as part of the MCQCM at
the final stage of the test and was also available to them via the Internet at a later time.
5.3.3.4 The Post Test Questionnaire
The student questionnaire consisted of 3 general background questions regarding the
age, sex and computer experience of the participants. The questionnaire (Appendix A)
contained a further 9 questions relating directly to the MCQCM self-assessment tool,
addressing the following issues:
• Do the students accept the system as both a summative and formative
assessment tool?
• To what degree would they use the MCQCM via the Internet?
• Does the resulting feedback from the MCQCM have a direct influence on their
learning path?
• Do they feel well informed about their level of understanding of the subject and
the areas in need of revision after using the MCQCM?
• What is their opinion of the benefits and perceived problems with the MCQCM
system?
These questions were to contribute to the research questions 1 and 3 as outlined in
Section 2.2.
5.3.3.5 Data Collection for Stage 1: Student Focus
There were two distinct components of data collected by the MCQCM system from the
students. The first collected set of data was the actual score of the participants recorded
during the test and referenced by the unique identification number assigned to each
student. At the end of the test the recorded data was regenerated, presenting on the
screen a graphical display of the student’s scores for each question and a total score for
the test.
The second collection of data was both the quantitative and qualitative data from the
questionnaires completed by each student.
5.3.3.6 Analysis of Collected Data from Stage 1: Student Focus
This section of the analysis considers the demographics of the participants gathered
from the general background questions.
Cohort 1, of 50 students tested, consisted of 44 males and 6 females, and Cohort 2, of
43 students, consisted of 31 males and 12 females, as demonstrated in Figure 5-3. This
gender imbalance can be attributed to the Computer Science course presently attracting
a substantially greater number of male participants.
Figure 5-3: Age and Gender Distributions for Both Cohorts of Students.
It is observed from Figure 5-3 that the greater proportion of Cohort 1
undergraduate students were in the age range of 18-25. This can be attributed to the fact
that the main feeder for this course is the secondary education sector. In contrast,
the greater proportion of Cohort 2 postgraduate students were in the age group of 30 and
above, as it is a requirement for a student to have completed an undergraduate
qualification to be accepted into this particular course.
The level of computer experience was recorded as either being none, casual or
proficient, with 64 per cent of the students classifying themselves to be proficient and
the remaining 36 per cent classified themselves as being casual. There were no students
that registered their experience as being “None”. It would be expected that students of
Computer Science would classify themselves as being proficient; however, it is
understandable that their perception will differ from student to student.
[Figure 5-3 charts: “Gender Distribution of Cohorts” (Cohort 1: 44 males, 6 females; Cohort 2: 31 males, 12 females) and “Cohort's Age Distribution” (Cohort 1 and Cohort 2, shown as percentages).]
5.3.3.7 Summary of Students’ Questions About MCQCM
In answer to the questions outlined in Section 5.3.3.4, all of the undergraduate Cohort 1
students registered an appreciation of the system, at various levels, as a valuable part of
their learning process, with 54 per cent registering the system to be approaching
extremely helpful. Similarly, 95 per cent of Cohort 2, the postgraduates, considered
the self-assessment tool valuable, with 20 per cent registering it as extremely helpful.
Furthermore, a pleasing 100 per cent of Cohort 1, the undergraduate students, and 98
per cent of Cohort 2, i.e. all but one of its students, stated that they would use the
system to various degrees during their studies if it were available.
The students from both cohorts registered a desire for the system to be delivered via the
Internet, stating that provision of instant, private feedback and support in a self-paced
learning environment was of benefit. The students also stated that the Internet delivery
option created a freedom with flexible delivery, permitting the utilisation of community
houses, libraries and other educational facilitators in the community.
It was observed that 96 per cent of students from both cohorts registered that the
feedback provided them with more information about their understanding of the area
being tested, stating that the system appeared to honestly demonstrate their acquired
knowledge at any time during the learning path.
A high 90 per cent of students from both cohorts registered that they knew which areas
they should be revising after completing the test, feeling that the system assisted in
identifying what they needed to learn and revise.
Ninety-five per cent of the students from both cohorts considered that the system would
influence their path of learning during their studies if it were available, and 48 per cent
of the undergraduate and 25 per cent of the postgraduate students stated that the feedback
from the system would have a substantial to significant influence on their learning path.
Additionally, 95 per cent of the students from both cohorts recorded that they
considered the feedback was better than that of the traditional multiple-choice format,
but still considered the traditional format to be a valuable tool for self-testing.
Some supporting documented benefits stated by the students were as follows.
• This system could “increase the students’ level of confidence showing how much
they are right or wrong and giving more specific information on their level of
understanding of particular concepts”.
• This system could “assist in the elimination of guessing”.
• This system caters for the “maybe” option where the student is not too sure of the
correct choice.
5.3.3.8 Student Observations According to Age Groups
Further analysis of the data with respect to the age group produced the following
interesting observations. These age classifications are irrespective of Cohorts 1 and 2.
18-25 Yrs Age Group:
It was observed that all of the 18 to 25 year old students rated the program highly and
showed a strong trend towards using the system regularly. While the issue of whether
the system would influence their direction of study was not of great significance, the
students indicated that they tended to favour this system above the traditional MCQ
method.
Additionally these students felt that the system identified what they would need to learn
in a self-paced, quick and easy format, providing proficient revision of the subject area.
Some of the concerns expressed by this age group were:
• The system would benefit from an explanation area in the feedback to fully explain
the reason for the correct answer.
• The elimination of guessing was still not complete.
They did not register concern with issues such as access to computers, partly because
computers are readily available to them. They also requested that this style of
self-assessment test be part of their daily study routine. They appreciated the quick
response of instantaneous feedback; however, they would like explanations with their
feedback, with references to resources.
26-30 Yrs Age Group:
There are some interesting trends observed from the data generated specific to this
subgroup of students. All of these students appeared to appreciate the system and show an
enthusiasm towards using it on a regular basis. There is strong evidence that the
students felt that the system would influence the direction of their study but not to a
great degree.
This group also strongly registered the benefit of being able to pursue the “maybe”
option as part of their learning strategy, as well as acknowledging the convenience of
having a self-assessment tool available to them via the Web as part of their home
revision strategy.
However, this group of students expressed some concerns regarding access to
computers, as, unlike for the younger students, computers are not so readily
available to them.
The students also voiced some concerns about the requirement of using the slide bar as
a means of registering confidence.
Most of the students in this age group are generally employed while completing their
course. Consequently any system that permits them to evaluate their understanding of a
topic at a time and place that is convenient to them is considered to be of benefit.
30+ Age Group:
It is very difficult to draw any conclusions or trends from these students with such a
small population, but there is some information worth noting. All of these
students rated the system highly and acknowledged that they would use it regularly.
Some students in this group considered that it would influence their direction of
learning greatly and preferred it to the traditional MCQ format.
Consistent with the observations of the previous age groups, they also declared that they
appreciated the system being readily available via the Web and providing instant
feedback; however, they too had concerns regarding access to computers.
These older students often seek employment during their study and voiced that
Internet-based self-assessment systems generally permit them to evaluate their
understanding of a topic at a time that is convenient to them. Similar to the younger
students, they also requested more explanations of the answers, with additional guidance
in their direction of study.
5.3.3.9 Analysis of Recorded Scores for Students Observations
The MCQCM prototype had the ability to record the students’ overall scores for each
question, ranging between –40 and 40.
The resulting graphs of the frequency of the scores for the Cohort 1 undergraduate and
Cohort 2 postgraduate students are shown in Figure 5-4 and Figure 5-5 respectively.
Figure 5-4: Frequency of the Undergraduates, Cohort 1, Scores for Each
Question.
Figure 5-5: Frequency of the Postgraduates, Cohort 2, Scores for Each Question.
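[Chart for Figures 5-4 and 5-5: percentage of students (vertical axis) against score range (horizontal axis: -40 to -31, -30 to -21, -20 to -11, -10 to -1, 0, 1 to 10, 11 to 20, 21 to 30, 31 to 40), with one series for each of Quest 1 to Quest 5.]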
Figure 5-4 and Figure 5-5 indicate that the scores tended to be skewed towards the
higher marks, with the greater percentage of students receiving an accumulative grade
for each question in the positive. This result could be interpreted as being very
supportive of the student and a valuable method of building their confidence, while also
identifying their weaknesses. Some of the students stated that “the good thing about the
system was that it permitted an allocation of marks to areas that they were not too sure
about” and consequently boosted the total score accordingly.
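The grouping of question scores into the ranges used on the horizontal axis of Figures
5-4 and 5-5 could be reproduced along the following lines. This is a minimal sketch under
stated assumptions rather than the method used in the study: the function names are
hypothetical, the example scores are invented for illustration, and fractional scores are
assumed to be placed by rounding their magnitude up (so that 0.5 falls in the 1 to 10
range).

```python
import math
from collections import Counter

# Score ranges as labelled on the horizontal axis of Figures 5-4 and 5-5.
BIN_LABELS = ["-40 to -31", "-30 to -21", "-20 to -11", "-10 to -1", "0",
              "1 to 10", "11 to 20", "21 to 30", "31 to 40"]


def bin_label(score: float) -> str:
    """Map a question score in [-40, 40] to its range label.

    Assumption: fractional scores are placed by rounding their magnitude up,
    so 0.5 falls in "1 to 10" and -10.2 falls in "-20 to -11".
    """
    if not -40 <= score <= 40:
        raise ValueError("question scores lie between -40 and 40")
    if score == 0:
        return "0"
    magnitude = math.ceil(abs(score))   # whole number from 1 to 40
    decade = (magnitude - 1) // 10      # 0, 1, 2 or 3
    low, high = decade * 10 + 1, decade * 10 + 10
    return f"{low} to {high}" if score > 0 else f"-{high} to -{low}"


def percentage_distribution(scores: list[float]) -> dict[str, float]:
    """Percentage of students whose score falls in each range."""
    counts = Counter(bin_label(s) for s in scores)
    return {label: 100 * counts[label] / len(scores) for label in BIN_LABELS}


# Hypothetical scores for one question (not data from the study).
example_scores = [21.2, 21.7, 40, -6.5, 0, 9.2, 31, 10]
for label, pct in percentage_distribution(example_scores).items():
    print(f"{label:>10}: {pct:.1f}%")
```

A distribution skewed towards the higher marks, as reported for both cohorts, would show
most of the percentage mass in the positive ranges.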
It was also observed that the Cohort 1 undergraduate students used the slide bar 41 per
cent of the time. Although this is not a high percentage, some students mentioned that
the use of the slide bar was foreign to them and felt that they would use the option more
with extended use. In contrast, the Cohort 2 postgraduate students used the sliding bar
only 12 per cent of the time, preferring to set the confidence at 100 per cent. The
tendency for the younger undergraduate students to use the sliding bar more often than
the postgraduates could be attributed to their long-term exposure to technology and
to their being less inhibited when confronted with the device. However, both cohorts also
stated that they would use the slide bar more after prolonged exposure. In addition, it
was observed that many of the students registered concern with using the slide bar as a
means of registering both their choice and their confidence in that choice. This issue
greatly influenced the redesign of the MCQCM, as discussed in further detail in
Chapter 6.
5.3.3.10 Stage 2: Method and Results for Instructor Focused Experiment
It was considered to be extremely advantageous to gather information regarding the
instructors’ perceptions and opinions of the system at this early stage of development.
The instructors directly involved with the Cohort 1 undergraduate students were
invited to respond to the system.
5.3.3.11 Outline of Method for Instructors Focused Experiment
A combined total of seven instructors were interviewed for the first and second cohorts
of students. Three of the instructors were strongly associated with the Information
Technology area while the remaining four were from the Electronics, Physics and
Mathematics areas. All participants held formal qualifications in their teaching
discipline areas as well as a formal qualification in Education, with at least three years’
teaching experience.
Initially the instructors familiarised themselves with the MCQCM tool. During this
introductory exercise they were encouraged to ask questions for clarification to ensure
that they understood the operating processes involved and the scoring system. After the
demonstration they were shown a series of graphs generated from the recorded data
from the student experiment, as previously displayed in Figures 5-4 and 5-5, the first
displaying the frequency of the overall scores for each question.
The instructors then viewed a series of graphs showing distribution of the individual
scores for each of the questions and the graphical presentation feedback screen the
students received at the test conclusion.
At the completion of the MCQCM demonstration and results display the participating
staff members were asked to complete a questionnaire (Appendix A) that addressed the
following questions to contribute to answering the research questions 1, 2 and 3 as
outlined in Section 2.2.
• Do they consider this to be a useful tool?
• Would they use this tool for the duration of the subject?
• Would they construct the answers differently for this type of MCQ format to
enhance the feedback?
• Would the resulting feedback influence their instructional path?
• Would they use this tool for summative assessment?
• What concerns do they have about using the system for self-assessment and/or
grading a student?
• Could using this type of scoring mechanism offer a more refined set of results,
permitting the instructor to differentiate between the grades of students?
5.3.3.12 Analysis of Collected Data from Instructors
The instructors were required to complete a questionnaire after viewing a
demonstration of the system in operation with a summary of the captured data. The
series of questions enquired about the perceived advantages and disadvantages of using
the tool for both summative and formative assessment.
All of the instructors responded that they considered the MCQCM’s feedback would
influence the direction of their teaching to the benefit of the students as it clearly
identifies the areas of concern.
In addition, they all confirmed that using the MCQCM would influence their question
construction to maximise the benefits, permitting the students to display knowledge,
possibly producing a more granular feedback. Additionally, they recognised the value
of varying the question format to increase the effectiveness of the tool. However, all of
the instructors voiced concern regarding the additional workload required to construct
questions in this format.
The instructors considered the MCQCM worth pursuing in general, but some had
reservations about using it for summative assessment. Those with reservations considered
that this type of system would only be beneficial if the students had a clear understanding
of the scoring mechanism, and that students would require sufficient training for it to be
fully effective and influential.
The MCQCM was considered to be advantageous in assisting to produce a more
discerning grading, supporting the decisional process in the dilemma of borderline
grades where the scores are positioned on the boundaries of pass to fail or higher levels.
The MCQCM was particularly appreciated for the application in the area of vocational
training (TAFE), where the objective is to award the student with a competent or not
quite competent grade (Training Packages for Competency Based Assessment).
There were serious concerns about the system’s tendency to favour the more self-confident,
extroverted individual and to disadvantage the less confident, introverted individual.
The staff acknowledged that they would consider using this tool both on student
demand and at the instructor’s discretion, with a preference to using it as a class
activity.
All of the instructors considered the tool to be of greatest value as a formative self-
assessment as part of the student’s revision. They did however express concerns about a
full implementation suggesting that further trials occur as the tool is refined to ensure
that it does what it purports to do.
5.4 Discussion
In general the MCQCM prototype application was well received by both the students
and instructors who participated in the pilot program. The students appreciated the
opportunity to use the self-assessment MCQCM and considered it beneficial in their
preparation for the upcoming assessment task.
The students considered the MCQCM to be easy to operate and the method of scoring
appeared to encourage risk-taking, resulting in the students manipulating the MCQCM
to their advantage. The students increasingly used the sliding bar with more exposure
to register their confidence and confirmed that the feedback and consequential score
was both comprehensible and helpful for further study direction.
The initial introductory session required the participants to use the tool in a non-
threatening environment, which proved to be a successful exercise as they responded to
the MCQCM in a relaxed manner. It was important to the success of further work that
this introductory exercise be used as it assisted in establishing the MCQCM as an
acceptable assessment tool.
This research identified a number of concerns to be addressed during the ongoing
MCQCM development. It was apparent from their verbal protocol during testing that
the students were not comfortable with the interface, owing to their lack of familiarity and,
most importantly, the failure of the prototype to adhere to the HCI design principles of
good navigation, consistency of visual presentation and error management. The
students also expressed concerns about being required to identify not only what they
considered to be the correct options but also what they considered to be incorrect. Some
students demonstrated hesitation registering their level of confidence in all of the
options and tended to only do so for the correct options.
On the positive side, the simple scoring system was well received by both the
instructors and the students. During the interview process all of the students claimed to
understand the method of calculating the score and said they would consequently respond
according to their level of confidence to maximise their result. Furthermore, all of the students
stated during the interview that they understood the feedback, as it clearly reflected
their responses. They also agreed that the feedback would assist them in deciding
their study path to improve their understanding of the topic.
It was observed, and confirmed during later interviews, that the students tended to
minimise the use of the slide bar for the first few questions. However, as the students
became more relaxed with using the MCQCM, the use of varying confidence levels
selected on the slide bar increased.
All of the students requested more opportunities to use the MCQCM as they considered
it to be greatly beneficial in confirming their knowledge and highlighting their
weaknesses in the topic covered.
A major concern identified by the students was the use of the sliding bar to register
both their confidence and choice of the correct answer in one action (see Figure 5-1).
This operational component of the MCQCM needed to be addressed and is further
discussed in Chapter 6 which focuses on the redesigning of the MCQCM.
The MCQCM demonstrated the attributes of a good formative assessment tool which
should encourage students to evaluate their understanding of a topic before it is too late.
In addition, it must be considered by the students as a non-threatening, non-
discriminatory support to their learning, with the resulting feedback benefiting both the
student and the instructor.
This preliminary part of the research using the MCQCM prototype produced some
encouraging evidence to support the use of the MCQCM and the utilisation of
confidence measurement as a formative assessment strategy. Throughout the pilot study
the students and instructors demonstrated an appreciation of the MCQCM as a means
of revision, especially with the advantage of it being readily available via the Internet.
The results appear to support the use of the MCQCM as an effective formative
assessment tool for the student on a regular basis, permitting them to independently
self-evaluate their state of knowledge at any stage of the educational program. The shift
of control of the learning program to the student was well received by both the students
and the instructors. However, concerns about the system possibly favouring the more
confident, extroverted student were noted as warranting further investigation.
5.5 Further Development of the MCQCM
The success of this initial exercise encouraged further investigative work in this field to
further develop and refine the MCQCM into a user-friendly, HCI compliant, Web-
based format capable of delivery via the Internet. This new version would then be
available to a broader range of students and instructors within the faculty, encouraging
the integration of the MCQCM as an extension of the learning program over the full
duration of the semester.
At this stage of the research it was decided that the mode of application of the MCQCM
should also be investigated, having some groups using the system at the instructor’s
discretion while others would have it readily available to them on demand. Anecdotal
evidence suggests that traditional MCQs may not be a reliable tool for
measuring students’ knowledge, raising questions about the validity of using them as a
summative assessment tool. It would therefore be beneficial to investigate the suitability of the
MCQCM for summative assessment, observing whether its method of scoring
produces a more granular set of results than that of the more traditional assessment
strategies.
The ongoing concern, voiced by some of the teaching profession, of disadvantaging
certain groups in our community due to our choice of assessment could be evaluated by
investigating the application of the MCQCM to accommodate various learning styles
and personality traits, such as the extroverted versus introverted.
The aforementioned concern of using the sliding bar as a means of registering both
confidence and choice was deemed to be a critical flaw in the design of the MCQCM
requiring immediate attention.
5.6 Summary
This chapter has reported on the findings of two independent pilot runs that were
designed to elicit both the students’ and instructors’ perceptions of using the MCQCM.
This chapter further identified some critical areas of concern as a result of the pilot
trials with the MCQCM. While there was a general satisfaction with the use of the
system, many of the students felt that there were problems with its general usability,
particularly the use of the sliding bar as the primary method for registering both
confidence and choice.
The resulting findings have been thoroughly analysed in preparation for the next stage
of this research, the redesigning of the MCQCM. Chapter 6 documents the redesign of
the MCQCM in keeping with Human Computer Interaction design principles and best
practices.
This chapter answered the research question formulated to evaluate the instructors’ and
students’ perceptions of assessment with confidence measurement for formative
assessment: Does Assessment with Confidence Measurement produce more meaningful
feedback when used for formative assessment?
In particular it addressed the research sub questions:
Q1A: What are the student’s and instructor’s attitudes and perceptions of assessment
with confidence when used for formative assessment?
Q1C: Does the use of assessment with confidence measurement provide additional
valuable feedback to the instructor when used for formative assessment?
In addition it also contributes to addressing the third research question:
Research Question 3:
What are the design requirements for developing an interactive assessment with
confidence measurement to ensure that instructors and students are able to achieve
maximum benefit from the system?
At this early stage the qualitative data identified usability areas of concern to be
addressed before broader implementation. One of these was the cognitive overload
caused by the single action of sliding the bar both to mark an answer as either
correct or incorrect and to register the level of confidence. Other
areas of concern identified were the navigational component of the system, the
error prevention and error recovery strategies, and the method of displaying graphics with
limited screen space. These concerns and others are addressed in Chapter 6.
CHAPTER 6 DESIGNING AND REFINING THE MCQCM FOR DELIVERY VIA THE WEB
In Chapter 5 the initial pilot studies of a rudimentary standalone prototype were
discussed. These pilot studies were designed to ascertain whether the assessment with
confidence measurement strategy was worth further investigation. The positive
response from these initial trials was encouraging, leading to further activities
requiring the development and refinement of a more sophisticated version of the
MCQCM for implementation across a number of subjects. This chapter addresses the
issue of the confusion in registering confidence with choice by introducing Bandura’s
(1977) work on self-efficacy and its recent applications in the same domain (Moos &
Azevedo, 2009).
A contributing factor to the design of the MCQCM is game design and this chapter
identifies those components that have an important input into the interactive
educational environment, aligning the attributes of the MCQCM to them. In particular
it considers the design and usability elements that contribute to the game play
experience for educational interactive systems.
This chapter concentrates on the design, development and refinement process to
produce the Web-based version of the MCQCM tool at an acceptable operational level.
The chapter includes the documentation of the evaluation of the revamped MCQCM
against a set of customised usability heuristics (Sim, Read, & Cockton, 2009) designed
to gauge the usability of interactive assessment tools that is critical to the success of
educational interactive system implementation. This activity culminates in the extension
of these computer assessment heuristics applicable to the development of interactive
assessment with confidence measurement systems. Finally, this chapter addresses the
challenge of displaying large areas of information in a
limited workspace (Leung, 1995), as diagrams and programming scripts often
require large display areas.
6.1 Games Taxonomy
The strong association between computer games and educational interactive systems is
not a mere coincidence; it is the result of careful design.
The student of today is surrounded by technology (Prensky, 2003) and, understandably,
educational interactive tools often leverage the games phenomenon, borrowing
many of their themes and functionalities from it (Adams & Rollings, 2007; Baird &
Fisher, 2006).
The uptake of multimedia applications in the educational arena was swift and extensive
due to many educators quickly identifying the benefits to be gained by using this
medium. Many innovative approaches and applications are founded on the evolving
interactive games paradigm, which engulfed the world soon after the introduction of
desktop computers. It is appropriate at this time to discuss some of the more relevant
components of fundamental game theory as this research has a reliance on the
interactive games topology, using the gaming betting metaphor, with its contribution to
the intrinsic motivation as mentioned in Hede’s (2002) model (Section 3.11), pivotal to
the success of the assessment with confidence measurement experience.
As discussed in Section 1.4, assessment strategies are required to meet a set of criteria
to be considered to add value to the learning experience. Similarly interactive
assessment tools also need to meet a set of game play criteria to be beneficial to the
student and the instructor. These criteria include the ability to challenge the participant
to achieve a set of predefined goals adhering to a set of rules within an environment
that encourages risk-taking with a perceived sense of fairness. In order for this to be
understood there is a need to investigate the relationship between games and interactive
educational tools and fundamental games topology. The following section considers the
relevance of game theory to education, then identifies the fundamental set of criteria to
determine if an interactive educational game can be considered of sound design and
practice.
6.2 Game Theory Relevance to Educational Games
Prensky (2003) states that this new generation of students is exposed to multimedia
imagery thousands of times a day, to the point of being saturated in digital media, and
argues, following the observation of Malcolm Gladwell (cited by Prensky, 2003), that we can
educate children if we can hold their attention. It is for this reason that many
work-related interfaces now emulate game-type interfaces to encourage engagement
with these play-oriented individuals.
Amory (2007) promotes the Game Object Model (GOM), which marries Educational
Theory with Game Design in order to facilitate the production of advanced learning
environments, supporting the relationship between learning, playing and story.
Constructivist Educational Theory (Kaufman, 2003) relies on development and
deep understanding that is actively built up by learners through their learning
experiences. The critical attributes of constructivist education are the ability to explore,
have social discourse and to play (Amory & Seagman, 2003). Quinn (2005) and Rieber
(1996) state that game play is a strategic part of learning and performs important roles
in psychological, social and intellectual development, being a voluntary activity that is
intrinsically motivating.
The process of figuring out the rules of a dynamic representation is known as inductive
discovery (Prensky, 2003). Today’s students see computer skills as a second language,
or even stronger, it is their native tongue (Baird & Fisher, 2006; Prensky, 2003). Part of
their vernacular is based on the phenomenon of being prone to be active rather than
passive in their educational approach when given the opportunity. The student
interacting with a multimedia application is fearless, as they assume that “software is
supposed to teach you how to use it” (Prensky, 2003). Students often approach
problem solving as they do games, rapidly and in an exploratory manner to achieve
positive outcomes. An important component of good game-play is the requirement that
during the early cognitive stage, a good instructor will call attention to the cues, giving
diagnostic knowledge of results and shaping the behaviour of the participant by
affirming positive results with appropriate feedback (Bradshaw, 2007).
6.2.1 Fundamental Game Theory Criteria
Adams and Rollings (2007) define a game as having distinctive elements as part of
its structure. These elements distinguish a game from a toy or puzzle. “Play” is the
act of self-entertainment usually connected with toys, puzzles and games. It is the
inclusion of rules and goals that determines the type of play in which we engage. A
game without rules and goals is a casual experience to be completely interpreted by the
participant. Adams and Rollings (2007) define game play in terms of the challenges and
the actions underpinning the experience.
The inclusion of both rules and goals increases the formal structure of the experience
and distinguishes the activity as a game. Adams and Rollings (2007) further define
game play as a combination of two concepts:
• the challenges that a player must face to arrive at the object of the game
• the actions that the player is permitted to take to address those challenges.
They consider challenges and actions to lie at the heart of games design, as the
challenges and actions are created and combined together to enhance the experience.
For a game to be successfully designed one cannot exist without the other, in that you
cannot set challenges without appropriate action to surmount them, and you cannot
have actions without relevant challenges for them to address.
6.2.2 The Goals and Rules of a Game
As previously stated, a game must have a goal or a number of goals. Goalless play does
not comply with the definition of game play as even the less demanding games have a
goal. Salen and Zimmerman (2003) require a game to have a quantitative outcome, by
which the measure of success can be attributed. No matter what the goal is, it must not
be trivial, as the challenge laid before the participant is reliant on the defined goal
(Salen & Zimmerman, 2003). This is particularly important to the interactive
assessment with confidence, as games of chance are dependent on the learning and
understanding of odds to optimise the scoring benefit. The reliance on odds alone
(tossing of a coin) does not necessarily constitute a game as there needs to be
participation of the player as part of the challenge. The termination point of a game
occurs when the goal has been addressed to the best of the ability of the player. In many
cases this is usually when the victory condition has been met, that is when the challenge
is over. It is at this time that the game experience crosses from the pretend
environment into the real world, as the results can be of material benefit and meritorious
achievement; in the case of education this can be attaining formal grades for an
assessment task.
The game rules are the instructions, restrictions and definitions that make up the agreed
conditions of play. Some rules are explicit, being clearly stated up front, while others
are implicit, unwritten and taken for granted. The rules establish a contextual
framework by which the game is played out giving permission for various actions and
denial of others.
6.2.3 Game Fairness
There is a general expectation that all games are fair. The interpretation of fairness is
greatly influenced by society, the individual and other contextual factors. The concept
of fairness is external to the game as the players sit within their cultural settings
defining the rules of their existence while interacting with a mutually exclusive
imaginary environment, not necessarily governed by these external rules. Rules can be
categorised as either Mutable (changeable) or Immutable (non changeable). The greater
the proportion of immutable rules the fairer the game must be. Interactive assessment
with confidence has the vast majority of its rules as immutable and is highly reliant on
being perceived as being fair. Symmetric games are those that have the same rules for
all players. This is a rudimentary requirement for a game to be perceived as being fair.
For the majority of educational assessment there is an obligation for the structure to be
symmetric to be perceived as fair, as any non-conformity would instantaneously deem
it as unfair.
6.2.4 Games Risk and Rewards
Risks and rewards have been a source of entertainment having their roots in the age-old
practice of gambling. While assessment with confidence measurement does not openly
encourage or condone the activity to a high level, it does align itself with this form of
soft entertainment. We often think of it as risking money to possibly gain money. It is
this risk and reward that underlies most competitive games, including games that pit
the player against the system. It does not require money to be a part of it, as any game
where the participant risks losing the chance to gain rewards as offered has a gambling
aspect. Risk is directly proportional to uncertainty, and risk increases as the uncertainty
increases. Adams and Rollings (2007) observed that players have varying attitudes
towards risk-taking, as some take the aggressive stand, the inherently risky approach of
overstating their confidence to maximise gain while others prefer the more defensive
approach, understating their confidence to minimise the risk of losing marks. They
further hold that, in game design, risk must always be accompanied by reward: the
greater the risk the greater the reward, otherwise there is no incentive to take the risk.
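The relationship between risk and reward described above can be illustrated numerically. The following Python sketch is an illustration only: it assumes a hypothetical symmetric rule in which a stake of c marks (between 0 and 100) returns +c for a correct answer and -c for an incorrect one. This is not necessarily the scoring method adopted by the MCQCM (see Section 4.3), and the function names, the stakes and the probability of 0.7 are all assumptions made for the example.

```python
import math


def expected_score(stake: float, p_correct: float) -> float:
    """Average outcome under the ASSUMED symmetric rule:
    +stake with probability p_correct, -stake otherwise."""
    if not 0.0 <= stake <= 100.0:
        raise ValueError("stake must lie between 0 and 100")
    if not 0.0 <= p_correct <= 1.0:
        raise ValueError("p_correct must lie between 0 and 1")
    return stake * (2.0 * p_correct - 1.0)


def score_spread(stake: float, p_correct: float) -> float:
    """Standard deviation of the outcome, used here as a simple measure of risk."""
    return 2.0 * stake * math.sqrt(p_correct * (1.0 - p_correct))


if __name__ == "__main__":
    p = 0.7  # hypothetical chance that the student's answer is correct
    stances = [("defensive", 40.0), ("matched", 70.0), ("aggressive", 100.0)]
    for label, stake in stances:
        print(f"{label:>10}: stake={stake:5.1f} "
              f"expected={expected_score(stake, p):6.2f} "
              f"spread={score_spread(stake, p):6.2f}")
```

Under this assumed rule, and for a student who is more likely to be right than wrong, the aggressive stance carries both the largest expected gain and the widest spread of outcomes, while the defensive stance limits both.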
6.2.5 Learning the Game Play
Learning in this context is not necessarily the learning and understanding of the
educational material but the learning of the game and how it should be used to its
greatest benefit. Game players do learn ways of maximising the benefits, understanding
and predicting the sequence of events to rise to the highest levels. Learning how to play
the game to maximise the outcomes relies on two contributing factors, enjoyment and
mastery, and it has been observed that participants like learning when at least one of the
two is met (Adams and Rollings, 2007).
6.2.6 The Influence of Skill, Stress and Absolute Difficulty on Games
To understand the absolute difficulty of a game, one must consider the skill and stress
factors. The intrinsic skill is the level of skill of the participant. Stress is the emotional
state of the participant during the experience, often brought on by external factors
such as the fear of failure. Some challenges have an intrinsic stress level incorporated,
such as reactionary games, while others require some constraints (e.g. of time) to
achieve a significant level of stress. In addition, a consciousness of the consequences of
the outcomes, such as the formal grading of the performance, also carries an inherent stress
level, often accentuated by the application of a time constraint. The absolute difficulty is
the combination of the intrinsic skill and the stressfulness experienced during the
activities. When designing a game, consideration must be given to the absolute
difficulty, balancing the stress level against the intrinsic skill required. If
one dominates the other, adjustment might be required to bring the combination back to
an acceptable level of absolute difficulty.
6.3 MCQCM Adherence to Game Play Topology
The desire to emulate games in education greatly influences the design and
functionality of CAA applications and has done so for the MCQCM. The MCQCM
relies heavily on the metaphor of placing a bet to optimise the gain, referred to in
Section 4.3, where there is detailed discussion of the adopted scoring method, based
on the probability theory of gaming. This underlying theory is constructed
around probability and the associated wagering. The user interacts with the MCQCM in
a game environment where they are challenged to achieve the best possible score while
risking a loss for incorrect answers. The MCQCM becomes implicitly motivational through
its close resemblance to a challenging game.
It is appropriate that the fundamental game structure and hierarchy, as discussed above,
be referenced here to understand the construction and application of the MCQCM in
this domain, recognising its adherence to some of the aforementioned game play
topology.
6.3.1 MCQCM Adherence to Playability Guidelines and Heuristics
Bradshaw (2007, p. 128) produced a series of playability principles, one of particular
importance to this research is “the need for a visual or tactile response to their actions
to be able to compare how well they are doing in relation to their desired outcome
….how their actions have progressed them in the attainment of their goal”. The
MCQCM was designed with this important guideline in mind, as it permits the student to
interact with the system freely when sliding the bar in a game-type environment. This
system of direct manipulation instantaneously displays, in numerical form, the possible gain
or loss that would result as a direct consequence of their actions. This reinforces
students’ actions as they progress towards their final goal: to achieve the best score
possible with their state of knowledge, while also acquiring contributing marks for
partial knowledge and limiting the penalties for answers where they have no real
knowledge.
Desurvire, Caplan and Toth’s (2004) set of game play heuristics is of interest to the
MCQCM application, having influenced the design of the MCQCM for the following reasons.
As part of their work to evaluate game playing, they
identified the need for the participant to be quickly involved through the use of tutorials
and lower level experiences. The MCQCM does
this effectively by offering a series of training activities based on general knowledge
questions, not subject content specific, in which the students use the system in an open
forum as an entertainment activity. The resulting scores are then displayed and students
are encouraged to participate in further non-threatening games based on general
knowledge.
Furthermore, Desurvire, Caplan and Toth (2004) recommend that the participant
should not be continually penalised for the same failure, giving them the
opportunity to eventually attain a positive outcome. In fact, they propose that the first
experience should be easy and return immediate positive feedback, which is a major
requirement of question setting for the MCQCM. It is important that the game applies
pressure while not frustrating the player, with a variation of the level of difficulty to
further engage them. They stress the need for the player to always be able to identify
their score and for the system to provide a consistent, mapped and learnable response. The
controls, in this case the sliding bar, should be intuitive and mapped in a natural,
obvious manner. The MCQCM’s interactive device, the sliding bar, permits the
participant to engage with it at a comfortably proportioned mapping, as recommended.
6.3.2 MCQCM’s Hierarchy of Challenges and Actions
As discussed, games require a hierarchy of challenges for the student to progress
through (Adams and Rollings, 2007), ranging from the simple to the more extreme. The
MCQCM conforms to this in the instance when the final submission occurs after the
student has addressed all of the questions and in turn all of the optional answers for
each question. The lowest level of challenges, the consideration of the options for each
question,
As previously stated, actions are not restricted to the challenges; that is, not all actions are
a direct result of a challenge. This can be demonstrated by the observation of the
sliding bar of the MCQCM being slid freely and endlessly at the whim of the student
without any ramifications or repercussions until the final test submission. Similarly, the
student can freely navigate forward and back, jumping from question to question if they
please, changing the levels of confidence as many times as they deem necessary. This
action does not necessarily have any bearing on the final result. It is the process of
finishing and submission that the rules dictate is the final action before evaluation.
6.3.3 MCQCM Learnability
The recommended learnability of a game is of a high priority in the design of the
MCQCM. The process of direct manipulation as described above allows for the
learnability of the system by extended use and practice. The resulting change to the
displayed score if correct reinforces the actions of the student as they develop the skills
to interact with the MCQCM.
6.3.4 Fairness of the MCQCM
The notion of fairness must be prominent in the designing of an interactive assessment,
often reflected in the choice of scoring. The balanced scoring adopted by the MCQCM
generally satisfies this requirement with the fundamental principle of proportional
rewards and penalties for the level of knowledge. A lack of this proportionality is one of the
criticisms levelled at the scoring mechanisms of Paul (1994) and Gardner-Medwin (2006), as
their strategic approach is dominated by the promotion of choosing high levels of confidence if
certain and low if not, simply sticking to a predefined recipe. Balanced scoring of
equally positive and negative grades promotes moving across the scoring zone where
the loss or gain is proportional to the registration of confidence as discussed in Chapter
4 concentrating on the validity of scoring with penalty.
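The principle of proportional rewards and penalties can be illustrated with a minimal sketch. It assumes a simple symmetric linear rule, in which the mark gained for a correct response equals the mark lost for an incorrect response at the same confidence. The function name `balanced_score` and the `max_mark` ceiling are hypothetical and are used for illustration only; they are not the scoring formula defined in Chapter 4.

```python
def balanced_score(is_correct: bool, confidence: float, max_mark: float = 10.0) -> float:
    """Return a mark proportional to the registered confidence.

    confidence is a percentage in [0, 100]. max_mark is a hypothetical
    per-option ceiling chosen for illustration, not a value from the thesis.
    The gain for a correct response mirrors the loss for an incorrect one.
    """
    if not 0.0 <= confidence <= 100.0:
        raise ValueError("confidence must be between 0 and 100")
    magnitude = max_mark * confidence / 100.0
    return magnitude if is_correct else -magnitude


if __name__ == "__main__":
    # Same confidence, opposite outcomes: equal gain and loss.
    print(balanced_score(True, 80))
    print(balanced_score(False, 80))
```

Under such a rule there is no penalty-free zone: lowering the registered confidence reduces the possible loss and the possible gain by the same amount.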
During focus groups held at the Computer Assisted Assessment Conference, criticism
was leveled at disproportionately penalising marking mechanisms that provide non-penalty
areas for low confidence, such as the scoring strategies of Gardner-Medwin (2006) and
Paul (1994), because participants become well trained in using the tool to maximise gain and
minimise loss. Such systems are accused of effectively training the student in
methods of maximising grades rather than honestly appraising their level of knowledge.
6.3.5 MCQCM Stress Levels and Overall Level of Difficulty
The control of the MCQCM operational stress level was an important part in the
designing and implementation of the MCQCM. Early iterations of the MCQCM
incorporated a time clock in the right hand top corner to increase the pressure on the
student as part of the assessment strategy. It also kept the test time in check to ensure
that the students completed the exercise within the given time. This was abandoned for
two reasons. It was found that the additional stress was unacceptable as the students
found the change in the level of interactivity, being immersed in the lower level
challenges, too much alone without the extra stress caused by the timer. The absolute
difficulty was out of balance and needed to be reset, relieving the stress level by
eliminating the time constraint. Secondly, the primary objective for the use of the
MCQCM is as a formative assessment tool available to the user at a time convenient to
them under their rules of engagement. The imposed time restriction in this case was in
complete contradiction to the primary objective.
While the designers of the MCQCM can minimise the stress levels from external
sources they cannot eliminate them all as the individuals will bring with them their own
operational stress levels.
6.3.6 Summary of MCQCM Adherence to Game Play Topology
The above discussion formalises aspects of the MCQCM in the game theory area as the
fundamental design strategies incorporated are founded on the game topology. The
MCQCM conforms to many of the requirements as it has game elements designed to
entertain whilst promoting the learning and self-assessment.
The MCQCM is defined by game theory as it offers both challenges and actions. The
goal is defined by optimising the scoring benefit and recognition of knowledge, partial
knowledge and incorrect knowledge. The rules of the MCQCM are immutable and
symmetric offering a fair game for all players. The MCQCM offers the risk of
displaying lack of knowledge and achieving negative grades, which offers the longer-term
reward of enabling a directed study path to improve knowledge. The ability to
manipulate the tools in the MCQCM enables affordance of the outcome in assigned
marks without penalty, thus encouraging learnability before commitment. The balance of
stress and difficulty has been trialed and moderated to ensure the student is focused on
the question and not the timing of the system.
These factors all demonstrate the synergies of the MCQCM to games theory and how
the MCQCM has met with the best practices of games design to encourage students to
engage with the system to demonstrate their level of knowledge, be it complete, partial
or incorrect.
6.4 Addressing Design and Usability Issues of MCQCM
Sharp, Rogers and Preece (2007) define interaction design as the designing of
interactive systems to support communication and interaction of people in their
everyday and working lives. They emphasise the need for systems to be developed
from the user’s viewpoint, stating that many developed systems that work from an
engineering perspective do so at the expense of how the system will be used in the real
world. The MCQCM is no exception to this area as the role it plays in the student’s
world could be critical.
As outlined in previous discussion in Section 2.4.1 addressing the problem solving and
research frameworks, the Web-based MCQCM system was designed and developed
adhering to the HCI guidelines for interactive systems (Sharp, Rogers & Preece, 2007).
A major contributing factor to successful interactive system design is the mindfulness
of the cognitive load that the system imposes upon the user. Shneiderman and Plaisant
(2005) consider understanding the cognitive and perceptual abilities of the users as a
vital foundation, underpinning interactive system design (B. Shneiderman & Plaisant,
2005). Consequently, any identified components of functionality of an interactive
system that unjustifiably increase the cognitive load should be addressed immediately
to alleviate undue stress or confusion, clarifying the functionality. In this case the
identified major flaw of the MCQCM was the reliance on the confidence-sliding bar as
the only mechanism to identify if an answer was correct or incorrect as well as the
student’s confidence in that answer. This poorly designed functionality of the MCQCM
in some cases produced a cognitive overload situation with the users, resulting in
confusion and inferior achievement.
Improvement in the design and the consequential usability of an interactive system is
dependent on good HCI practices. In most cases it is reliant on the designer going well
beyond the vague notion of “user friendly”, by having a more complete and thorough
understanding of the broader community (B. Shneiderman & Plaisant, 2005). To
achieve this Shneiderman and Plaisant (2005) identify goals for good design:
Standardisation, Consistency and Portability of data. Their reference to
Standardisation, the need for common user interface components across various
platforms, and Portability: the ability to convert data to be shared across the various
display options had a primary influence in the design of the MCQCM. The Consistency
of the action sequence, layout, terms, unit, colors and so on must be considered for the
duration of the design process. It is this area of consistency that extensive work
occurred in the redesigning of the MCQCM, as the non-cluttered layout of the
interactive screens, consistent positioning of the icons and the clarity of the feedback
displays are critical to the usability of the MCQCM.
A sound navigational aspect is at the heart of good usability. Shneiderman and Plaisant
(2005) identify the need to have knowledge of the overview with the ability to clearly
pursue details as required. The interaction of goal seeking behaviour can be
summarised with the following four elements of navigation:
• Knowing where you are.
• Knowing what you can do.
• Knowing where you are going - or what will happen.
• Knowing where you have been – or what you have done.
Awareness of these navigational elements (Dix, Finlay, Abowd, & Beale, 2004) will
directly assist in the designing of interactive systems that leave the user in no doubt of
the present, previous and proposed positioning. The progress status of a student doing a
computer based assessment exercise is of the utmost importance.
Error prevention, error messages and assistance in handling errors play an important role
in good usability design. Users are reliant on clear direction when faced with error
messages, as a lack of such direction can lead to fatal errors in operation. Error messages often
have a tendency to overwhelm the participant in a harsh, sometimes threatening manner
(Sharp, Rogers & Preece, 2007) that can have an adverse effect on the user’s experience,
hence they have to be well thought out to minimise the negatives and maximise their
effectiveness. Sharp, Rogers and Preece (2007) further claim that a poorly designed
interface can often leave the user feeling inadequate, insulting them and having them
feel stupid. The permitting of the user to rectify an error is a critical component of a
well-designed interactive system.
Equally of importance is the method by which graphics and scripts are displayed in
interactive systems. The issue of limited screen space places a serious constraint on
visual communication (Leung, 1995), often resulting in an interference with what was
meant to be and what actually is conveyed. This limitation often leads to a requirement
to navigate around the presented information space or the simultaneous viewing of
information in the same workspace. The main concern is locating the desired
information in the workspace without getting lost. To achieve this there is a need to
have a global as well as a local view of the information space for task switching.
Bannon, Cypher, Greenspan, and Monty (1983) suggest that there are a number of
areas to consider in interface design with regards to task switching; one is the reduction
of the user’s cognitive load (Bannon, Cypher, Greenspan, & Monty, 1983). Leung
(1995) implemented an innovative solution to address the issue of visual display
constraints by adopting a bi-focal approach, in which the user can view targeted
information without losing sight of the broader information space.
Interactive systems are often designed by leveraging off existing artifacts of the real
world with which users have previous experience and whose operation is familiar to them. An
object offering high affordance permits the user to quickly assimilate with the new
environment. The MCQCM does so with game-play topology, as the experience is
closely related to the gambling phenomena. Interactive games are heavily reliant on
presentation elements contributing to the playfulness of the experience.
Accordingly they must be designed to conform to the game play guidelines as
stipulated by Adams and Rollings (2006). Game Theory offers high affordance for the
MCQCM as previously discussed.
6.4.1 Addressing the Cognitive Load of the MCQCM
The first identified MCQCM operational area of concern that needs to be addressed is
the cognitive process of decision-making, in particular questioning if the nomination of
an answer as being either “True” or “False” is the dominant factor in the participant’s
mind, or if the expression of the level of confidence is the dictating action.
The area of concern appears with the following question. Is the choice of the option
being either ‘true’ or ‘false’ dominant in the participant’s mind? Having the confidence
sliding bar as the primary source of identifying if the option is ‘true’ or ‘false’ could be
confusing to the student. Normally the student would prefer firstly to identify if the
option is ‘True’ or ‘False’ before registering his/her level of confidence. As the
confidence-sliding bar is used to perform two specific functions of selecting the answer
and registering the level of confidence associated with it, there may be
misinterpretations in using the sliding bar. For example, in stating that you are 80 per
cent sure the option is ‘False’, is this the same as stating that you are 20 per cent sure
the option is ‘True’? (See Figure 6-1) This question needed to be investigated in order
to develop a tool of maximum benefit to the students.
(a) 80% sure the option is ‘false’
(b) 20% sure the option is ‘true’
Figure 6-1: Slide Rule to Register Confidence.
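The ambiguity illustrated in Figure 6-1 can be made concrete with a small sketch. It contrasts two hypothetical readings of the same slider positions: a probabilistic reading, where only the implied belief that the option is true matters, and a "level then strength" reading, where the chosen answer and the confidence in it are kept separate. The function name and the tuple representation are assumptions made for illustration and are not part of the MCQCM implementation.

```python
from typing import Tuple

# A response is represented here as (level, strength):
# level is the True/False choice, strength is the percentage confidence in it.
Response = Tuple[bool, int]


def belief_true_pct(response: Response) -> int:
    """Probabilistic reading: collapse a response to 'percent sure it is true'."""
    level, strength = response
    return strength if level else 100 - strength


eighty_false: Response = (False, 80)  # panel (a): 80% sure the option is false
twenty_true: Response = (True, 20)    # panel (b): 20% sure the option is true

# Under the probabilistic reading the two responses are indistinguishable ...
same_belief = belief_true_pct(eighty_false) == belief_true_pct(twenty_true)

# ... but as (level, strength) pairs the student has committed to opposite answers.
same_commitment = eighty_false == twenty_true

if __name__ == "__main__":
    print(same_belief, same_commitment)
```

A single slider records only the collapsed value, so the tool cannot tell which of the two statements the student intended.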
6.4.1.1 The Problem of using Confidence Measurement to identify correct answers
In order to address this problem we must refer to one of the most influential concepts
formulated in modern psychology, being Albert Bandura’s (1983) notion of Self-
efficacy Expectations. Bandura’s (1983) work focuses on the belief in our capabilities
to successfully perform a given task or behaviour, which in turn influences behavioural
choices, performance and persistence. A key component of Bandura’s (1983)
formulated self-efficacy concept is that self-efficacy can be increased through
performance accomplishments. Bandura (1983) considers this to be the major
influence on behaviours and behavioural change, stating that low self-efficacy
expectations within a domain can lead to avoidance, while an increase in self-efficacy
will result in an increase in the frequency of the approach (Bandura, 1983). He also
postulates that intervention can increase the self-efficacy expectations and specifies
four sources by which self-efficacy expectations can be modified. Two of these are of
particular interest in this area of study. The first is that experiences of performing
successfully will be beneficial. The second is that the awareness of physiological arousal,
such as anxiety with the behaviour or task, is seen as a co-effect of self-efficacy
expectations, where an increase in self-efficacy should result in a decrease of anxiety.
Importantly for this study, the reverse also applies: a decrease in self-efficacy leads to an
increase in anxiety.
Betz and Hackett (1981) extensively used the concept of self-efficacy expectations by
applying it to career psychology and counseling. In their study they implemented the
questionnaire format that retains Bandura’s (1983) original notion of the level
(“yes/no”) with the strength (confidence) of self-efficacy (Betz & Hackett, 1981). The
technique they developed required the participant to commit to an answer first. Once
committed the individual is required to clearly state their degree of confidence in that
answer. Fullarton (1993) also used the same method of testing when investigating
gender effects on confidence in mathematics. Her technique was to ask the student to
identify the correct answer and then to register their level of confidence in the choice
(Fullarton, 1993). This is the very crux of the situation under investigation. It was
considered that adopting the above technique, asking the student first to commit to
an answer before stating their level of confidence, would eliminate the possible confusion in
the process. As Bandura (1983) identified, the idea of stating a “level” (answer) to be
followed by a “strength” (confidence) of self-efficacy gives a clear, unconfused picture
of the student’s response.
6.4.1.2 The Design Solution to the Problem of Using Confidence Measurement to Identify Correct Answers
The problem outlined in Section 6.4, which required the student to register both choice
and confidence in the one activity, demanded immediate action.
The resulting modified MCQCM design is still based on the traditional MCQ format.
The initial questions are presented with the stem and the options displayed with a True
or False button at the end of each option only. The student is still required to consider
all of the options as there could be one or more correct answers, which requires the
student to identify not only what they consider to be correct options but also what
options they deem as incorrect. The new design of the MCQCM question screen is
shown in Figure 6-2.
Figure 6-2: First Fundamental Version of the Web-based MCQCM.
The student is required to commit to an answer, or what Bandura (1983) refers to as a
“level”. In this case it is either True or False.
Figure 6-3: The Appearance of the Confidence Sliding Bar.
The sliding bar for registering the degree of confidence only appears for each option
after the student has committed to either True or False (See Figure 6-3). This controlled
environment ensures that the student is led through the testing procedure with the
minimum of confusion, decreasing the cognitive load.
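The two-step interaction described above can be sketched as a small state model. It assumes that each option holds an uncommitted state until True or False is chosen, and that confidence can only be registered once that choice exists. The class name `OptionResponse` and its methods are hypothetical and do not reflect the actual Web-based implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OptionResponse:
    """Response to one option: a committed level, then a strength."""

    level: Optional[bool] = None  # True/False commitment; None until chosen
    confidence: int = 0           # percentage; meaningful only once level is set

    @property
    def slider_visible(self) -> bool:
        # The confidence slider appears only after True or False is committed.
        return self.level is not None

    def commit(self, level: bool) -> None:
        """Step 1: register the answer (Bandura's 'level')."""
        self.level = level

    def set_confidence(self, pct: int) -> None:
        """Step 2: register the confidence (Bandura's 'strength')."""
        if not self.slider_visible:
            raise RuntimeError("choose True or False before registering confidence")
        if not 0 <= pct <= 100:
            raise ValueError("confidence must be between 0 and 100")
        self.confidence = pct


if __name__ == "__main__":
    response = OptionResponse()
    response.commit(False)
    response.set_confidence(80)
    print(response)
```

Separating the two steps in this way means a stored response is always an unambiguous (level, strength) pair.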
Once the major cause of confusion, a single action serving both to choose the
answer (Level) and to register confidence in it (Strength), had been addressed,
it was necessary to evaluate the usability of the system.
6.4.2 HCI Evaluation of the MCQCM
HCI uses various methods of usability evaluation, two of these being User Testing,
where users are directly involved in the testing (Sharp, Rogers & Preece, 2007), and
Evaluation by Inspection, usually carried out by experts in the field evaluating the system against a
list of industry standard heuristics (Te'eni, Carey, & Zhang, 2007). Both methods have
their own strengths and weaknesses. It is difficult to involve users in real life
summative assessment situations in the classrooms due to the complexity of the
environment, as the nature of usability testing often requires the participant to actively
communicate during the process via verbal protocol (speaking out loud), placing extra
stress upon them and in direct conflict with the rules of individual assessment. Any
additional stress could affect the concentration of the participant, influencing both their
final result and their perception of the experience. For these reasons it is often a
preference to use the Inspection method for the evaluation of a computer aided
assessment (CAA) system, where no interference with the student during testing
occurs.
6.4.3 Heuristics Testing for Computer Aided Assessment (CAA)
Sim, Read and Holifield (2008) in their work have produced a series of heuristics
specifically designed to assist in the usability evaluation of CAA tools (Sim, Read, &
Holifield, 2008). The works of Nielsen (1994a, 1994b) in developing a general set of
heuristics have been heralded as a major contribution to the HCI field and are extensively
employed by HCI practitioners (Nielsen, 1994a, 1994b; Nielsen & Molich, 1990). As
outlined by Nielsen (1994a, 1994b), a heuristic evaluation (HE) consists of a number
of experts (3 to 6) evaluating an interface against a list of heuristics and producing a report
with severity ratings. These ratings are designed first to identify whether a
problem might exist and then to gauge its potential impact. The five severity ratings
devised by Nielsen (1994a, 1994b), which depend on the frequency with which the
problem occurs, the impact of the problem if it occurs and the persistence of the
problem, are as follows:
0 = I don't agree that this is a usability problem at all
1 = Cosmetic problem only: need not be fixed unless extra time is available on project
2 = Minor usability problem: fixing this should be given low priority
3 = Major usability problem: important to fix, so should be given high priority
4 = Usability catastrophe: imperative to fix this before product can be released
In recent years domain-specific heuristics have been developed, as demonstrated
by Paddison and Englefield (2004) in the formulation of accessibility heuristics and
by Desurvire, Caplan and Toth (2004) in heuristics for game playing (Paddison &
Englefield, 2004).
Likewise, earlier work by Sim, Read and Holifield (2006) highlights the concern that
Nielsen’s (1994a, 1994b) severity ratings are too generic for CAA applications, being
unable to distinguish between a Major Usability Problem and a Usability
Catastrophe (Sim, Read, & Holifield, 2006). They identified a need for CAA domain
specific severity ratings that deal with unacceptable consequences when the user
interacts with a CAA application. Sim, Read and Holifield (2008) suggest the following
variation of severity ratings suitable for the CAA application evaluation.
0= I don’t think that this is a usability problem
1= Possible effect, could cause some users to perform less well than would have
performed otherwise
2= Minor effect, would probably affect one or more questions in the test for most users
3= Major effect, would probably affect many questions in the test for most users
4= Catastrophe: all work lost
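The CAA severity scale above lends itself to a simple data representation. The sketch below encodes the five ratings as an enumeration and shows one way an evaluator's findings might be filtered and ordered so that the most severe problems are redesigned first. The enumeration names, the `prioritise` function and the example findings are hypothetical and serve only to illustrate how such ratings could be used.

```python
from enum import IntEnum
from typing import List, Tuple


class CaaSeverity(IntEnum):
    """CAA severity ratings as listed above (Sim, Read and Holifield, 2008)."""

    NOT_A_PROBLEM = 0    # not considered a usability problem
    POSSIBLE_EFFECT = 1  # some users may perform less well
    MINOR_EFFECT = 2     # one or more questions affected for most users
    MAJOR_EFFECT = 3     # many questions affected for most users
    CATASTROPHE = 4      # all work lost


Finding = Tuple[str, CaaSeverity]


def prioritise(findings: List[Finding],
               threshold: CaaSeverity = CaaSeverity.MAJOR_EFFECT) -> List[Finding]:
    """Keep findings rated at or above the threshold, most severe first."""
    kept = [f for f in findings if f[1] >= threshold]
    return sorted(kept, key=lambda f: f[1], reverse=True)


if __name__ == "__main__":
    # Invented findings, for illustration only.
    example = [
        ("progress not visible", CaaSeverity.MAJOR_EFFECT),
        ("icon spacing uneven", CaaSeverity.POSSIBLE_EFFECT),
        ("answers lost on timeout", CaaSeverity.CATASTROPHE),
    ]
    for description, severity in prioritise(example):
        print(int(severity), description)
```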
Sim, Read and Holifield (2008) consider the role of the user as a major consideration in
the designing of an interactive assessment system, stating that ultimately the students
have the most to lose. They emphasise that the user experience, level of comfort and
feeling of control when engaging with an interactive assessment tool can greatly
influence their performance. They also recognise the need to understand what is of
importance to the stakeholders. Further, they believe the traditional usability goals of
efficiency, effectiveness and satisfaction are not extensive enough, as the goals
pertaining to computer based assessment are different from those of the casual user of a
generic interactive system.
Sim, Read and Holifield (2008) also identify the legal obligation an educational
institution has to their students to supply assessment regimes that are deemed to be fair
to all and without discrimination or bias. Assessment tools of poor usability design
could place the institution in a vulnerable position if needed to defend an assessment
appeal as a result of a poor test score being attributed to substandard usability of an
interactive assessment tool. Such appeals could be attributed to loss of test time through
ineffective navigation, the inability to deselect an answer after further reflection and
other negative experiences. It is for these reasons that Sim, Read and Cockton (2006)
embarked on a series of experiments culminating in a corpus of usability problems
directly associated with the CAA environment and consequently to develop a set of
heuristics specifically designed to evaluate the usability of a CAA application (Sim,
Horton, & Strong, 2004; Sim et al., 2009; Sim et al., 2006, 2008). Their constructed
heuristics are listed in Table 6-1.
Heuristic
Number     Description
1          Use clear language and grammar within questions and ensure the score is clearly displayed
2          Ensure progress through the test is visible and understandable
3          Answering questions should be intuitive
4          Easy reversal of actions
5          Inform users of any unanswered questions before finishing
6          Ensure appropriate interface design characteristics
7          Visual layout - adequate spacing and visibility of questions
8          Ensure appropriate feedback
9          Moving between questions and terminating the exam should be intuitive
10         Minimise time delays
11         Minimise external influences to the user
Table 6-1: List of Sim et al. (2006) heuristics for CAA.
Although the heuristics for application to the CAA environment listed above were not
available as they were in development, the ‘work in progress’ versions of them were
used as a means of expert evaluation during the MCQCM design process.
6.5 MCQCM Heuristic Evaluation Method
Two expert evaluators assessed the MCQCM against an early version of the CAA
heuristics developed by Sim, Read and Cockton (2006). The evaluation was
undertaken in a usability laboratory to minimise distractions and isolate the variables.
The evaluators registered their concerns with the MCQCM together with severity
ratings, which enabled the subsequent redesign in line with the CAA heuristic
guidelines.
6.5.1 MCQCM Redesign Resulting from Usability Heuristics
The main interactive components of the MCQCM application will be presented here
with supporting discussion addressing some of the identified concerns with the
MCQCM that registered higher levels of severity. Not all of the functionality of the
MCQCM can be displayed here, as the resulting final design of the MCQCM for
implementation is quite extensive. For this reason the full operational extent of the
MCQCM is demonstrated in the appendices, but it is necessary to display some of the
screen displays here highlighting the main features of the MCQCM, in particular the
screens primarily designed to interact with the student.
To facilitate this section of the research a series of screen displays with explanations
about their design in reference to the Sim, Read and Holifield (2008) HCI Heuristics
for CAA applications are presented in the following section.
6.5.2 Grid Layout of Question Screen
Figure 6-4 is a typical example of the question screen that the student is faced with for
the duration of the test. In this case the student demonstrates that even though they are
quite confident that the answer is option 2, TCP/IP, they think that it might also be the
first option HTTP, although in this choice they are not as confident.
Figure 6-4: Grid Layout of MCQCM.
It was observed by the evaluators that there was difficulty in defining the working areas
of the screen enabling clear delineation of questions, responses and navigation. This led
to the adoption of a grid formation for the display.
In Figure 6-4 it can be observed that the screen offers a clear, balanced layout using
distinctive areas in a grid formation, in keeping with Nielsen’s (1994) layout guidelines
and with the following Sim, Read and Holifield (2008) heuristics: heuristic 7, Visual
layout - adequate spacing and visibility of questions (an interpretation of Nielsen’s (1994)
heuristics); heuristic 1, Use clear language and grammar within questions and ensure the
score is clearly displayed; heuristic 3, Answering questions should be intuitive; and
heuristic 11, Minimise external influences to the user. Furthermore, it can be seen in
Figure 6-5 that the main question page is divided into its functional areas as identified by:
1 Header with question number being attempted and the total number of questions,
2 Stem of the question and the button to register the completion of the test
3 Answer options
4 Bandura’s (1983) “Level” of either True or False through tick boxes
5 Bandura’s (1983) “Strength of commitment” through slide bars
6 Numerical level of confidence to be submitted
7 Navigation feature including the progress bar, list of all questions and the one
highlighted being attempted, acceptance button to register attempt on question
Figure 6-5: Grid Layout of the Functional Areas.
6.5.3 Visibility of Student Progress During the MCQCM Test
Sim, Read and Holifield’s (2008) heuristic 2 states: Ensure progress through the test is
visible and understandable. This issue was noted during the pilot studies, as the students
asked to know at any time which question they were presently attempting and, importantly,
the total number of questions in the test.
Figure 6-6 shows the supportive elements that address these navigational concerns.
Figure 6-6: Question Display Showing 3 Navigational Supports.
It can be seen in Figure 6-6 that the redesigned MCQCM achieves good visibility of the
student’s test progress status and navigation by supplying a visual display of their
progress with a progress bar that increases in size as the student completes questions, as
well as a clear statement at the top of the display showing the question number they are
attempting and the total number of questions in the test. In addition it can be seen that
above the progress bar there is the list of the total number of questions with the
completed processed questions being shown in red while all other, as yet not attempted
questions, are in blue. This hyperlinked navigational component also permits the
students to move back and forward to any question as they please. All previous versions
of the MCQCM restricted the student to a linear approach, where they could only
attempt the question once and move to the next question in the row. The additional
feature of being able to move to any question at the discretion of the student, for
re-answering or reassessing, is in keeping with Sim, Read and Holifield’s (2008)
heuristic 4, Easy reversal of actions; heuristic 9, Moving between questions is intuitive;
and heuristic 10, Minimise time delays.
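The three navigational supports described above can be derived from a small amount of state. The sketch below assumes the system knows the total number of questions, the question currently displayed and the set of completed questions. The function name, the header wording and the dictionary keys are hypothetical; the red and blue colours follow the description of the question list.

```python
from typing import Dict, Set


def navigation_state(total: int, current: int, completed: Set[int]) -> Dict[str, object]:
    """Build the header text, progress fraction and question-link colours."""
    if total <= 0 or not 1 <= current <= total:
        raise ValueError("current must lie between 1 and total")
    return {
        # Statement at the top of the display (wording is illustrative).
        "header": f"Question {current} of {total}",
        # Fraction of the progress bar to fill as questions are completed.
        "progress": len(completed) / total,
        # Completed questions are shown in red, unattempted ones in blue.
        "link_colours": [
            "red" if q in completed else "blue" for q in range(1, total + 1)
        ],
    }


if __name__ == "__main__":
    print(navigation_state(total=4, current=2, completed={1, 3}))
```

Because every question link remains available, the same state supports non-linear movement through the test without any additional bookkeeping.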
6.5.4 Minimisation of Errors and Error Prevention
A critical component of a well designed Web-based interactive system is the need to
support the user by minimising the number of errors (error prevention), as stated by
Nielsen (1994) and later reiterated by Sim, Read and Holifield (2008) heuristic 5:
Informing students of unanswered questions; heuristic 4: Easy reversal of actions and
heuristic 10: Minimise time delays.
The pilot tests and the heuristic evaluation of the MCQCM highlighted the need for the
students to be able to move freely from question to question, and most importantly have
the option of changing their response before test submission. In addition, during the
pilot studies the students requested the flexibility to submit each individual question
response formally using the “accept” button while retaining the right to change that
submitted answer before the final test submission and consequential assessment. These
functionalities are part of the operation of the MCQCM and are demonstrated in Figure
6-7.
(A): (B)
Figure 6-7: Support for User to Minimise Errors.
Area A in Figure 6-7 demonstrates the ability for the student to “accept” a question as
work in progress, while still having permission to revisit it before the final submission.
Hence, at any time a student can have a number of questions completed ready for final
submission or held as work in progress. However, this built in flexibility can often
leave the student confused about which questions they have answered and which have
been overlooked. To address this an error prevention dialogue box has been used (area
B in Figure 6-7), where the student is informed of any questions that they have not
attempted before final test submission. At this time they can either return to the quiz
environment to complete any missed questions or proceed to the grading and solutions
display.
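The error prevention check performed before final submission can be sketched as follows. The sketch assumes that the system tracks which question numbers have been accepted; the function names and the warning text are hypothetical and do not reproduce the wording of the actual dialogue box.

```python
from typing import List, Optional, Set


def unattempted_questions(total: int, accepted: Set[int]) -> List[int]:
    """Return the question numbers that have not yet been accepted."""
    return [q for q in range(1, total + 1) if q not in accepted]


def submission_warning(total: int, accepted: Set[int]) -> Optional[str]:
    """Return a warning listing unattempted questions, or None if there are none."""
    missing = unattempted_questions(total, accepted)
    if not missing:
        return None
    listed = ", ".join(str(q) for q in missing)
    return f"You have not attempted question(s): {listed}"


if __name__ == "__main__":
    print(submission_warning(total=5, accepted={1, 2, 4}))
```

When a warning is returned, the student can either go back to the listed questions or proceed to grading, matching the two choices offered by the dialogue box.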
When all questions are completed to the student’s satisfaction they confirm their test
completion by pressing the “OK” button as demonstrated in Figure 6-8.
Figure 6-8: Final Dialogue Box to Support the User in Error Prevention.
6.5.5 Clear and Informative Feedback
The expert evaluation raised a concern regarding Sim, Read and Holifield’s (2008) heuristic
8, Ensure appropriate feedback, as being of utmost importance, and, in keeping with the
games design element, heuristic 6, Ensure appropriate interface design characteristics.
The consequential solution, shown in Figure 6-9, addresses the concern by informing the
student of the following:
• The correct answer for each option
• Their answer with their registered confidence for each option
• The consequential score calculated for each option in the question
• Their overall score for that question
Figure 6-9: Feedback Screens: (A) Display for all Questions with Hyperlink to (B) Display of Individual Questions.
Figure 6-9 (A) is the first screen that the student sees which summarises the results for
the complete test, again using colours that offer high affordance, green to signify
questions where the student has demonstrated good knowledge (score between 20 and
40), blue for questions that need some attention (score between 0 and 19) and red for
questions where the student has shown inappropriate levels of confidence for incorrect
answers (score from -1 to -40). To assist the learner they can hyperlink to the display
for each individual question as shown in Figure 6-9 (B).
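The colour coding of the summary screen follows directly from the score bands stated above. The sketch below assumes whole-number question scores in the range -40 to 40; the function name `feedback_colour` is hypothetical.

```python
def feedback_colour(score: int) -> str:
    """Map a question score to the summary colour described above.

    green: good knowledge (20 to 40)
    blue:  needs some attention (0 to 19)
    red:   confident but incorrect (-1 to -40)
    """
    if not -40 <= score <= 40:
        raise ValueError("score must be between -40 and 40")
    if score >= 20:
        return "green"
    if score >= 0:
        return "blue"
    return "red"


if __name__ == "__main__":
    for s in (40, 20, 19, 0, -1, -40):
        print(s, feedback_colour(s))
```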
The feedback screen shown in Figure 6-9 (B) uses familiar icons (ticks and stars)
and again offers high affordance to increase the clarity of the status of the student’s
knowledge: red for incorrect, green for correct and blue as the overall grade expressing
a comfortable, but not excellent, level of achievement. Further examples of the feedback
screen are provided in Appendix C.
6.5.6 Summary of the Redesigning of the MCQCM Adhering to HCI Guidelines
The refinement of the MCQCM tool adhering closely to the HCI and CAA design
guidelines definitely improved its usability, functionality and effectiveness, by
eliminating any confusing elements.
Controlling the method of the participant’s responses by forcing them to first register
their choice of true or false and only then permitting them to declare their degree of
confidence has ensured that the operational thought process is of a minimal cognitive
load. This permits the student to concentrate on the tasks at hand, and not be
preoccupied with the indecisions and hesitations that could prohibit their interaction
with the system.
As can be seen from the discussion contained here the final operational version of the
Web-based MCQCM has been refined and constructed adhering to the fundamental
design guidelines outlined above, incorporating consistent clear screen layout, sound
navigation, error prevention and error handling to minimise diversions and optimise the
outcomes.
6.5.7 Heuristics for MCQ with Confidence Measurement
The previous discussion demonstrated the adherence of the MCQCM to Sim, Read and
Holifield’s (2008) rework of Nielsen’s (1994) heuristics for interactive systems. Further
to the guidelines for CAA as derived from Nielsen’s (1994) HCI heuristics for
interactive systems, it is proposed here that an interactive MCQ with confidence
measurement system requires refinement of Sim, Read and Holifield’s (2008) heuristics
given the need to ensure learnability. It is also recognised here that there is a need for
immediacy in response to the user’s activity, for overcoming the visual impediment of
limited screen real estate, and for motivation through entertainment to encourage interaction.
Table 6-2 sets Sim, Read and Holifield’s (2008) customised heuristics for
computer aided assessment systems alongside a set of guidelines for MCQ assessment with
confidence measurement.
| Sim et al. Heuristics for Computer Aided Assessment | Guidelines for MCQs with Confidence Measurement | Problems to Overcome |
| --- | --- | --- |
| 1: Use clear language and grammar within questions and ensure the score is clearly displayed. | Use clear language and grammar within questions and immediately display the registered level of confidence and consequential score. | Learnability; Easy reversal of actions |
| 2: Ensure progress through the test is visible and understandable. | Ensure progress through the test by providing progress bars with total number of questions answered and yet to be answered. | Navigation; Time allocation where required |
| 3: Answering questions should be intuitive. | Answering questions should be intuitive, with possible identification of the number of possible correct answers supplied. | Multiple responses allowed; this is not always the case with other methods of MCQ and must be made clear. |
| 4: Easy reversal of actions. | Easy reversal of actions by permitting the student to return to any question for re-answering before final submission. | Learnability; Easy reversal of actions |
| 5: Inform users of any unanswered questions before finishing. | Inform users of unanswered questions before finishing by providing alert messages identifying those not answered. | Error prevention |
| 6: Ensure appropriate interface design characteristics. | Ensure appropriate interface design characteristics using suitable game playing metaphors with appropriate challenges and fairness. | Satisfaction; Motivation to “play” with the interface |
| 7: Visual layout - adequate spacing and visibility of questions. | Visual layout - adequate spacing and visibility of questions using bifocal display techniques for display of information in restricted space. | Restricted screen real estate available to large or graphical questions |
| 8: Ensure appropriate feedback. | Feedback to be graphically pleasing, clearly identifying incorrect choices with registered levels of confidence. | Learnability; Affordance; Easy reversal of actions |
| 9: Moving between questions and terminating the exam should be intuitive. | Ability to move to questions in a non-linear manner and clear action for final submission. | Easy reversal of actions; Navigation; Error prevention |
| 10: Minimise time delays. | Immediate process of score calculation provided by Web-based solution. | Easy reversal of actions; Affordance, able to see the immediate consequence of actions |
| 11: Minimise external influences to the user. | Develop presentation screens that use visual or audio stimuli only if critical to the question. | Reduce cognitive load |
Table 6-2: List of Sim et al. (2006) Heuristics with Elaborated Heuristics for MCQ
with Confidence Measurement and Problems Addressed by Revised Heuristics.
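Guidelines 2 and 5 in Table 6-2 (visible progress and an alert for unanswered questions) can be illustrated with a short Python sketch. This is an assumption-laden illustration rather than the MCQCM implementation: the function names, the dictionary of answers and the wording of the alert message are all hypothetical.

```python
def progress_summary(question_ids, answers):
    """Count answered questions and list those still unanswered.

    `answers` maps a question id to the student's response; a missing
    key or a value of None is treated as unanswered (an assumption).
    """
    unanswered = [q for q in question_ids if answers.get(q) is None]
    return {
        "answered": len(question_ids) - len(unanswered),
        "remaining": len(unanswered),
        "unanswered": unanswered,
    }


def check_before_submission(question_ids, answers):
    """Return (ok, message); ok is False while any question is unanswered."""
    summary = progress_summary(question_ids, answers)
    if summary["unanswered"]:
        listed = ", ".join(str(q) for q in summary["unanswered"])
        return False, "Unanswered questions: " + listed
    return True, "All questions answered."
```

The `answered` and `remaining` counts are the values a progress bar would display, while the message returned by `check_before_submission` corresponds to the alert shown before final submission.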
6.6 MCQCM’s Method of Handling Graphical Components
Often the content area being tested is reliant on the interpretation of graphics and
scripts, particularly in the IT discipline area where this research was undertaken. It is
usual testing practice for the student to be shown a diagram, such as an entity
relationship model demonstrating a particular scenario, or a series of programming
segments, such as Structured Query Language (SQL) scripts. A set of questions is then
asked about these diagrams or scripts, in which the student is required to identify
various components of the diagram, express the relationship between the entities, or
recognise an error or identify the correct script when given the output. The issue with
many of the computer based MCQ testing tools is the way they handle these graphics and
script requirements where screen real estate is at a premium. In the traditional MCQ
Web-based assessment package format it is usual practice for the graphics, or script, to
be revealed at the end of the question for reference. This often produces a question
greater in length than the screen, causing a number of usability issues.
6.6.1 Previous Investigative Work on the Graphics Component of Interactive
Assessment
Previous work (Farrell & Leung, 2004a) investigated the use of the
Blackboard computer aided assessment package for a large group of students, which
used the “more than one page” display method for the questions containing graphics
and SQL scripts. The analysis of the results of that exercise was influential in the design
of the MCQCM graphics capability component. A brief explanation of the comparison
of student preferences for Blackboard or paper-based questions
with graphics is given here, followed by a summary of the findings. The results of
this evaluation assisted in the further refinement of the MCQCM’s handling of graphics
where the available screen real estate is limited.
A total of 465 students, consisting of 404 Introductory Database (DB1) students and 61 Data
Communication (DC) students, were surveyed as part of the subject review process and asked to
comment on their assessment experiences. Both cohorts of students used the
Blackboard MCQ assessment package as a summative assessment activity contributing
to their final grade. The Introductory DB1 subject test relied heavily on SQL script
oriented questions whereas the DC subject did not, having all of the questions in the
traditional MCQ format.
The students were asked for their opinion in comparing Blackboard for MCQs with
the traditional paper-based equivalent.
The majority of the DC students, 64 per cent, preferred the use of the Blackboard
MCQ to a paper-based equivalent as they felt that they were able to complete the
exercises without real concerns. This was in complete contrast to the results of the DB1
students, where 74 per cent voiced concerns about using an online test to
compare SQL scripts. A non-parametric statistical analysis was
applied to the data, using the Chi-squared test for a significant difference between the two
groups on the question relating to their satisfaction with using Blackboard versus
paper-based MCQs. The responses proved to be significantly different for the two
cohorts (χ²(7) ≈ 41.465, p < .001).
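For readers unfamiliar with the procedure, the Chi-squared statistic for a contingency table of cohort by response category can be computed as in the following Python sketch. The per-category survey counts are not reproduced in this section, so the counts in the usage note are purely hypothetical placeholders and will not reproduce the reported statistic. For a test of independence between two cohorts, 7 degrees of freedom would correspond to eight response categories, since (2 - 1)(8 - 1) = 7. The function name `chi_squared` is an assumption.

```python
def chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom.

    `table` is a list of rows of observed counts (one row per cohort,
    one column per response category). Every row and column total is
    assumed to be non-zero so that each expected count is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof
```

As a usage example with hypothetical counts, `chi_squared([[10, 20], [20, 10]])` gives a statistic of 20/3 (about 6.67) with 1 degree of freedom. The p-value would then be read from the Chi-squared distribution with the returned degrees of freedom.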
On further investigation it was observed that the main reason for the discontent was the
requirement in the DB1 test to scroll up and down the screen to observe and compare
the scripts and diagrams when answering the questions. The inability to view both the
questions and graphics on the same screen appeared to interfere severely with the
students’ concentration. Correspondingly, many of the students complained about their
grades, claiming that the testing mode was not appropriate for the type of questions
being asked. Farrell and Leung (2004a) concurred with this, producing the following
evidence supporting the students’ claims.
Many of the DB1 students complained about eyestrain and high anxiety. They also felt
that their ability to concentrate on the questions was compromised by the continual
scrolling, having a negative effect on the final outcome. In addition, the scrolling made
it extremely difficult to review answered questions before submission, again being
detrimental to the final grade.
Interestingly, it was observed that 90 per cent of both cohorts of students registered that
they appreciated the speed and automation of an online test, even though many of the
introductory DB1 students were dissatisfied with this particular test. They
acknowledged the value of automated online tests and their contribution to the
learning experience.
Students also felt that an index to the questions should be provided so that any
particular question is easily accessible. This is in agreement with previous sections,
where the MCQCM design is influenced by Sim, Read and Holifield’s (2008) heuristics
on navigation.
The DB1 students differed significantly from the DC students in their
preference for paper-based questions, citing the following issues:
• Need for better maneuverability between questions.
• Need to check all answers before submission.
• Ability to concentrate on a single question at a time.
The issue of the requirement to scroll when comparing SQL scripts in DB1 was by far
the most concerning. This observation is of particular concern to this research as
systems that do not cater for the graphics in an appropriate way have a detrimental
effect on the final outcome and are limited in their application. This highlights the need
for a match between the content being tested and the CAA chosen. In this case it was
apparent that the DC content fitted well into the constraints of the CAA whilst the DB1
test did not.
It was concluded that while CAAs offer great opportunities, it is important that the
content being tested should be well matched with the CAA of choice. This is evident
for the DB1 script comparison exercise, where the students needed to view the
alternative scripts and would have benefited from being able to highlight components
for closer scrutiny. In addition, Farrell and Leung (2002a, 2002b) identified the need
for early exposure to the CAA, perhaps as a formative assessment task, as the potential
for CAA assessments can only be maximised with good planning and implementation
(Farrell & Leung, 2002a, 2002b).
6.6.2 MCQCM’s Graphics Solution
In light of the discussion above it is proposed that an inappropriate choice of CAA for
graphic-reliant tests can create great concern. With this in mind there was a need for the
MCQCM to manage the graphics and script components in a way that does not interfere
with the progress of the students, so that the use of the MCQCM is not limited by its
use of screen real estate.
6.6.2.1 The Reliance on the Visual Communication Channel
Leung (1995, p. 158) in his work on the application of bifocal displays considers the
visual channel in computer interactivity a “far more effective means of communication,
as the high bandwidth nature of this channel facilitates speedy information retrieval and
comprehension”. He further acknowledges that visual communication is the main output
channel used, as effective human-computer interaction is reliant on the presentation of
information enabling the eye and brain to work together to comprehend what the
presenter wants them to see. Leung (1995) considers the early development of infant
hand-eye coordination in their play and interaction with their environment has prepared
them well to engage in increasingly complex activities, with the designers of interactive
computer systems exploiting such skills, particularly in games. Shneiderman (1982)
introduced the concept of “Direct Manipulation” of objects and actions of interest in the
visible interface (Hartmann, Abdulla, Mittal, & Klemmer, 2007; B. Shneiderman,
1982) providing rapid reversible incremental actions, replacing the need for complex
command language syntax. McCormick (1988) states that an estimated 50 per cent of
the brain’s neurons are involved with vision; hence visualisation in computer
interaction puts that neurological mechanism to work, which can over-load
cognition and reduce the capacity for mental processing of other more pertinent
issues such as the question at hand (McCormick, 1988). Marcus (1984) identifies three
“faces” of the computer: the Outerface, the final commutated display; the Interface, the frames
of command and control for the user to interact with the system; and the Innerface, the
frames of command and control for the computer experts to interact. He argues that
computer graphics should be used appropriately in all of these faces (Marcus, 1984).
Leung (1995) expresses the concern that humans interacting with large
amounts of data on a small screen often need to switch tasks to achieve higher-level
goals and are often limited by the screen’s size. Additionally, when the user interacts
with a large information space there are often difficulties locating and comprehending
the data. Leung (1995, p. 125) states that “visual techniques have an important role to
play to overcome the presentation and navigation problems associated with the human
interaction of large information spaces”.
6.6.2.2 Bifocal Display Methods for Large Information Spaces
Spence and Apperley (1982) first proposed the bifocal display with Leung (1995)
further refining and implementing it as an effective means of presenting large amounts
of information on the standard screen, as a response to the need for a method of
handling accessible information (Spence & Apperley, 1982). Their bifocal display
technique is the concurrent presentation of localised detail while still preserving the
global context. In application, it permits the entire space to be seen with a portion
shown in full detail, although the surrounding non-detailed areas are “demagnified”.
This is in contrast to the non-distortional presentation techniques (Leung, 1995) that rely
on scrolling, paging and the split screen approach. Paging, scrolling and the split
screen are three non-distortion techniques commonly used. Scrolling permits the
detailed viewing of sections of the graphical display while hiding the rest from view.
Paging displays a section in detail in a new window or area again hiding the remaining,
surrounding graphics. Both of these techniques are identified in the work of Farrell and
Leung (2005) to be unacceptable when used in isolation in a CAA application (Farrell
& Leung, 2005).
6.6.2.3 The MCQCM Visual Display Technique
The MCQCM has combined the technique of bifocal display with the split screen, as
well as incorporating paging options.
In light of the investigative work outlined above in Section 6.6.1, handling of the
graphics component of the MCQCM needed special consideration. As a result the
display technique of Spence and Apperley (1982) and the later bifocal method of Leung
(1995) were adopted, with some additional modifications and variation.
When constructing an assessment system with extended application in the Information
Technology field it is necessary to cater for script comparison, model interpretation and
other various questions reliant on graphical presentation. Hence, it was decided that the
MCQCM would incorporate a graphical presentation method that will minimise the
issues of single screen presentation.
To achieve this, the MCQCM presents its graphics in a dynamic, unique way. It is
difficult to demonstrate this here in a static presentation; however, a series of screen
shots with appropriate explanations is provided. The MCQCM presents each
question fully on one screen irrespective of the content. For a question with script or
graphics it divides the screen in two, with the top half showing a more compact version of
the question and the lower half showing the script or graphic for the student’s
consideration, as can be seen in Figure 6-10.
Figure 6-10: MCQCM Dual Screen Display.
In this example the student was required to view a diagram
directly related to the question.
The configuration shown in Figure 6-10 permits the student to view the diagram while
still being able to view the question. The text of the question is small but in most cases
legible. The student then has two options to view the diagram in more detail.
The first is by clicking on the “Maximise” button on the top left corner of the graphics
area. As a result of this action the graphics area expands to fill the screen, as shown in
Figure 6-11.
It should be noted that the graphic, in this case a database Network Diagram, has the
question repeated underneath it in text. Consequently the student can view both the
question and the diagram together, even though the original question for answering
(shown in Figure 6-10) is not on the screen. Once finished viewing, the student can
return to the shared MCQCM question and diagram split screen display by clicking on
“Minimise” in the top left hand corner of the screen.
Figure 6-11: MCQCM Diagram as a Full Screen.
Alternatively, the student can choose to increase the viewing area for the diagram by
placing the cursor on the line dividing the two displays and dragging it upwards towards the
question area. This decreases the question display area and increases the diagram
display area. The response is immediate as the cursor moves up and down. Hence the
action permits the student to move quickly from question view to diagram view without
any interruption. The screen shots in Figure 6-12 demonstrate this technique,
sliding the bar from a midway position to a larger graphic display.
Figure 6-12 depicts two screen shots demonstrating the instantaneous sliding movement
of the MCQCM permitting the student to view various size images of either the
question or the diagram immediately. The diagram on the left is a result of sliding the
dividing bar upwards towards the question.
Figure 6-12: Demonstration of MCQCM Display of Varied Screen Sizes.
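The sliding divider can be thought of as a single value that partitions the available screen height between the question pane and the graphics pane. The Python sketch below illustrates that idea only; it is not taken from the MCQCM source, and the function name `pane_heights`, the pixel units and the `min_pane` parameter are assumptions introduced for illustration.

```python
def pane_heights(screen_height: int, divider_y: int, min_pane: int = 0):
    """Return (question_height, graphic_height) for a horizontal divider.

    `divider_y` is the divider's distance from the top of the screen.
    It is clamped so that neither pane is smaller than `min_pane`
    (a hypothetical setting, assumed to be at most half the screen).
    """
    clamped = max(min_pane, min(divider_y, screen_height - min_pane))
    return clamped, screen_height - clamped
```

Under these assumptions, dragging the bar upwards lowers `divider_y`, shrinking the question pane and enlarging the diagram, while a divider position of zero corresponds to the full-screen “Maximise” view of the graphic.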
The student would systematically answer the question by either alternating between the
graphics and answer screen or by sequentially sliding the bar up for graphics viewing
and down for registering their answer. This simple, but effective, approach received
high praise from the students during the implementation, as it seemed to eradicate the
issues of scrolling and removing the question from vision, as presented in the previous
study. Students appreciated that it permitted quick navigation at ease without any
interruption during the test. It also added to the effect of placing the control of the
system into the student’s hands, a necessary feature discussed in previous chapters.
6.7 Summary
The evolutionary design of the MCQCM has been presented in this chapter, taking the
MCQCM from a rudimentary prototype to a fully functional Web-based solution for
implementation in the classroom. The heuristics of Sim, Read and Holifield (2008)
have been refined and extended to suit the requirements of MCQ with confidence
assessment interactive design. In doing so, the MCQCM has been refined in accordance with the
HCI guidelines as outlined, adhering to the customised computer aided assessment
heuristics of Sim, Read and Holifield (2008). The challenge of dealing with complex
diagrams and scripting has also been addressed by incorporating aspects of Leung’s
(1995) bifocal display options for large workspaces. The chapter extensively discussed
game play topology and its influence on the MCQCM and this research, and leveraged
Bandura’s (1983) theory of self-efficacy in order to decrease the cognitive load.
The research question addressed in this chapter is
Research Question 3:
What are the design requirements for developing an interactive assessment with
confidence measurement to ensure that instructors and students are able to achieve
maximum benefit from the system?
This is achieved by the application of sound usability heuristic evaluation techniques as
developed by Sim, Read and Holifield (2008) in reworking Nielsen’s (1994) heuristics for
interactive systems.
Chapter 7 reports on the field studies where the MCQCM was used for formative
assessment exercises as part of the delivery program, supporting the students in their
self-assessment and reflection.
CHAPTER 7 COMPARISON OF THE MCQCM TO A TRADITIONAL CAA PACKAGE FOR FORMATIVE ASSESSMENT
Chapter 5 discussed the results of the two pilot studies developed and initiated to
evaluate the functionality and usability of the MCQCM, gauging student perceptions
of using it in a formative assessment task and identifying the design issues to be addressed.
Chapter 6 then applied the recommended changes to the design of the MCQCM.
Comparison of the MCQCM as a formative assessment tool to a traditional MCQ
format tool was required at this stage of the research. This chapter initially reports on
a small simulation exercise to ascertain if the redesigned MCQCM broadly represents
the level of knowledge of the individual before extending it to a large cohort of
students. It then reports on the findings of an investigative study in which a
comparative analysis is undertaken from the responses of a cohort of students using
both the Blackboard Multiple-choice Computer Aided Assessment (CAA) package and
the redesigned MCQCM as a tool for revision.
7.1 Trial
During post pilot program discussions in Chapter 5 with students and instructors the
question arose about the ability of the system to truly represent the state of knowledge
of the individual who partakes in the exercise. In particular the instructors expressed a
concern that unleashing the redesigned MCQCM on a large group of students as part of
their learning experience might be a bit presumptuous, as it was not field tested,
suggesting that small trials occur for the duration of the MCQCM development.
It was thought that the best method to establish if the MCQCM results were
representative of the students’ level of knowledge would be to run a simulation, where a
small number of students with already recognised levels of achievement were asked to
use the MCQCM as a formative assessment tool. To accommodate the simulation
exercise, six students of various levels of achievement were invited to participate at the
end of the semester, before the exam. Their abilities to date varied across the spectrum.
The students were given access to the system for a period of one week and encouraged
to complete any number of the given tests as many times as they wanted. All of the
results were recorded automatically and analysed at the end of the exercise.
It was pleasing to observe that in most cases the MCQCM results were consistent with
those achieved by the individuals using other traditional assessments. The question of
whether the spread of the MCQCM grades would be equivalent to that of the final
grades previously achieved seemed to be answered in the affirmative. The high distinction students all
achieved high MCQCM scores (90%+), while the middle range distinction and credit
students secured the equivalent for their results (74% to 63%) (Appendix B).
There was one set of results that required further investigation, as a high achieving
student’s MCQCM results were extremely poor (Appendix B). This outcome was
completely unexpected and was received with concern as it reflected poorly on the
MCQCM, immediately prompting a series of questions. Was the student confused
using the system? What happened for the student to do so badly? Does the scoring
system not truly reflect the level of knowledge? On further investigation it was revealed
that the student’s first attempt delivered the results that were expected of him: it was the
later attempts that were inconsistent with the expected knowledge.
After initial discussion it was decided that the best way to ascertain why this result was
recorded was to contact the student to see what occurred to produce a result in direct
contrast to the student’s proven ability. When approached the student explained the
reason for the discrepancy was that he enjoyed the interaction of the system and
deliberately played with the operation to see what the results would be. He emulated
different levels of knowledge to see how the system would react, enjoying the
opportunity to interact with it and “push it to the boundaries”.
Even though the discrepancy initially rang alarm bells, it ended up being a positive
result, as it reinforced the idea that the MCQCM, when used as a non-threatening,
formative assessment tool, had encouraged inquisitive, exploratory behaviour, engaging
and entertaining the student for a period of time.
7.2 Comparison of the MCQCM to a Traditional Computer Based
Formative Assessment Package
On the successful completion of the simulation the MCQCM was deemed appropriate
to be used as a formative assessment tool for a larger group of students. The activity
outlined below was then initiated.
7.2.1 Method
A cohort of 74 students was offered both the MCQCM and Blackboard MCQ systems
as part of their revision program during the semester. The two subjects that this report
focuses on are Database 1 (DB1) and Advanced Web Technologies (AWT). There were
41 DB1 and 33 AWT students.
The Blackboard test was the simple Web-based Multiple-choice Question (MCQ)
format of a stem followed by four simple text options. It does not use penalties for
incorrect answers. In contrast the MCQCM used the confidence measurement and
penalties. Both cohorts of students were offered these self-assessment tests online,
permitting them to complete the tests at their convenience either in the labs, at home or any
other location of their choice where they had Internet access.
At the end of the semester the students were asked to
complete a questionnaire on various aspects of the subject as part of the standard
subject review process. Included was a series of questions that focused specifically on
the students’ perceptions of the Blackboard CAA and MCQCM revision tests that they
completed. The data were collected and analysed. The analysis produced some
encouraging observations.
7.2.2 Results Analysis for Students
The MCQCM and MCQ results for the formative assessment exercise were recorded
for analysis to ascertain if there was general consistency between the scores. Figure 7-1
shows the AWT students’ scores clustered by the MCQ scores and Figure 7-2
shows the DB1 scores clustered by the MCQ scores. The MCQ scores are plotted in
ascending order with each student’s respective MCQCM score.
Figure 7-1: Graph of MCQ and MCQCM Scores for Cohort 1.
[Chart: Comparative Student Scores in MCQ Ascending Order for AWT; series MCQCM and MCQ; vertical axis in per cent, from -50 to 150.]
Figure 7-2: Graph of MCQ and MCQCM Scores for Cohort 2.
[Chart: Comparative Student Scores in MCQ Ascending Order for DB1; series MCQCM and MCQ; vertical axis in per cent, from -50 to 150.]
The graphs in Figures 7-1 and 7-2 demonstrate that for both groups a student who
achieves a good score for the MCQ generally achieves a similar score for the MCQCM.
Likewise, a student who does not do well with the MCQ generally also does not
score well with the MCQCM. Comparing the individual scores
for the MCQ and MCQCM, the proportions of students with higher
and lower MCQCM scores than MCQ scores are close to even, indicating a general consistency of
scores.
The subject evaluation survey contained 8 questions, 5 specifically designed to gauge
the usefulness and effectiveness of the tests, and 3 to compare the two testing methods.
In addition the participants were also asked to comment on both the positive and
negative aspects of the tool. The age demographics are presented in Table 7-1 showing
the proportion of undergraduates and postgraduates who were older than 25 years of
age.
| Demographics | Postgrads | PG’s >25 yrs | Undergrads | UG’s >25 yrs |
| --- | --- | --- | --- | --- |
| Students | 16% | 75% | 84% | 52% |
Table 7-1: Proportion of Postgraduate and Undergraduate Students and
Proportion of Each >25 Years of Age.
It was observed that there were no apparent differences between the responses of the
two cohorts, as well as no detectable difference when comparing the response from the
postgraduates and the undergraduates. This was the same for the two age groups. The
preferences were consistent across all cohorts and subgroups.
Analysis of all student responses.
The first five questions refer specifically to the students’ perception of the MCQCM
tool. The remaining three questions are specifically designed to compare the students’
perception of the Blackboard CAA with that of the MCQCM.
To assist the reader the responses have been grouped together in Table 7-2 beside each
of the survey questions.
| Question | Response (lowest) | Response (middle) | Response (highest) |
| --- | --- | --- | --- |
| Q1: How would you rate the MCQCM testing method as part of your learning process? | No Value: 9% | Some Value: 75% | Extremely Valuable: 16% |
| Q2: How often would you use MCQCM if available at any time? | Never: 5% | Sometimes: 50% | Regularly: 45% |
| Q3: To what level would the MCQCM influence your direction and path of your learning? | None: 13% | Some: 66% | Substantially: 21% |
| Q4: When viewing the MCQCM results display how clear were the scores? | Unclear: 13% | Clear: 60% | Extremely Clear: 27% |
| Q5: When looking at the MCQCM display how clearly could you identify the problem areas? | Unclear: 19% | Clear: 52% | Extremely Clear: 29% |
Table 7-2: Responses to the Questions of Student’s Perception of the MCQCM.
It can be observed that a significant number of the students considered the MCQCM as
a good self-assessment tool, with 95 per cent acknowledging that they would use it if
available and 87 per cent declaring that it would influence their learning path.
Importantly, 87 per cent considered the feedback display clear to extremely clear and 81
per cent felt that it identified the areas of concern. In addition, some of the students
asked that it be made available on a weekly basis linked in with the lectures. Students
commented on the ability of the system to display complex diagrams beyond the scope
of many traditional MCQs.
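The combined percentages quoted above follow from adding the two more favourable response categories for each question in Table 7-2. The short Python check below makes that arithmetic explicit. The percentages are those reported in Table 7-2; the dictionary layout, the shortened question labels and the function name `favourable_share` are assumptions made for illustration.

```python
# Percentages from Table 7-2, ordered from least to most favourable response.
TABLE_7_2 = {
    "Q2 frequency of use": {"Never": 5, "Sometimes": 50, "Regularly": 45},
    "Q3 influence on learning": {"None": 13, "Some": 66, "Substantially": 21},
    "Q4 clarity of scores": {"Unclear": 13, "Clear": 60, "Extremely Clear": 27},
    "Q5 problem areas": {"Unclear": 19, "Clear": 52, "Extremely Clear": 29},
}


def favourable_share(responses):
    """Sum every category except the first (least favourable) one."""
    return sum(list(responses.values())[1:])


if __name__ == "__main__":
    for question, responses in TABLE_7_2.items():
        print(question, favourable_share(responses))
```

This gives 95 per cent for Q2, 87 per cent for both Q3 and Q4, and 81 per cent for Q5, matching the figures quoted in the text.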
Of those students who declared that they would use the MCQCM tool during their
studies, some made further requests for it to be available for other subjects in
their studies. The more interesting observations occurred when comparing the two
methods of self-assessment made available. The responses to the questions
pertaining to the comparison of both systems are summarised in Table 7-3.
| Question | Response 1 | Response 2 | Response 3 |
| --- | --- | --- | --- |
| Q1: How would you rate the MCQCM feedback to the BB feedback? | Same: 17% | Better: 63% | Much Better: 20% |
| Q2: Which of the two, BB or MCQCM, was the best in directing you with your revision? | BB: 25% | Neither: 25% | MCQCM: 50% |
| Q3: Which of the two, BB or MCQCM, better informed you of your understanding of the topics? | BB: 33% | Neither: 33% | MCQCM: 33% |
Table 7-3: Responses of Student’s Perception of the MCQCM vs BB.
It is observed that the great majority of students (83%) rated the MCQCM
feedback as better than the Blackboard MCQ feedback. Preference for Blackboard, the MCQCM or
neither was equally distributed at 33 per cent each when students were asked which one informed
them better of their understanding of the topics. More students (50%) registered a
preference for the MCQCM than for Blackboard (25%) or neither (25%) with
regard to the direction of their revision. It was encouraging to observe that
the MCQCM rated well against the long-standing, established standard MCQ format. It
should be noted that the students have had previous exposure to the Blackboard CAA,
which could add to their comfort level when using the new MCQCM. Alternatively, no
familiarity with the MCQCM may increase the novelty factor.
7.2.3 Instructor’s Focus Group for Formative Assessment
To understand the viewpoint of the instructors in relation to the use of the MCQCM a
focus group was formed where the three instructors for the two student cohorts met
with the developers. As in previous studies the instructors were shown the MCQCM
displays of the students’ grades (cumulative and individual) along with displays of the
question screens and the student feedback. The instructors were asked to give their
feedback and opinion for each tool as well as encouraged to contribute additional
opinions and perceptions of the MCQCM.
The following observations and recommendations were recorded.
The instructors all considered the Blackboard and MCQCM formative assessment
results to be closely aligned, commenting on the more fine-grained set of grades for the
MCQCM. The instructors registered concern that the increased distribution of results
might not necessarily demonstrate a more discerning set of grades but in fact only
really represent the students’ propensity towards stating high or low confidence. They
elaborated on this by stating that they observe that some of their students often overstate
their levels of confidence while others equally understate them. The instructors extended
this point by highlighting that gender and cultural backgrounds may influence a student’s
willingness to register high levels of confidence.
When shown the MCQCM graphs of the student population scores that identified the
areas of overall poor performance the instructors immediately acknowledged the areas
of apparent misunderstanding and confirmed that they would be readdressing those
areas during the revision session. Furthermore, they stated they would adjust the
curriculum for the benefit of the next enrolled cohort of students.
With regards to the testing interface the instructors perceived the analogy to betting on
an answer for proportional rewards or losses would be engaging and challenging to the
students. They appreciated the visual metaphor of a game with the direct manipulation
interactivity, identifying the bifocal/split screen interaction for the graphics in particular
as an entertaining element. However, they did register concerns that some of the
students could find the betting metaphor inappropriate based on their religious or
cultural beliefs and that an alternative might be necessary. They all thought that the
ability to navigate through the questions in a non-linear manner was advantageous to
the student, as well as the progress bar highlighting the questions yet to be answered.
When the instructors were shown the MCQCM student feedback presentation screens
they were pleased with the use of the familiar games icons, such as the green stars for
correct answers and the red ones for the incorrect answers. They also appreciated the
initial feedback presentation screen where the overall results for all of the questions are
displayed, showing the questions where the student did very well, did adequately or
requires immediate attention. In addition the instructors thought that the hyperlinks from
this overall test results display to the individual question results displays were well
designed, offering personal feedback to the students.
Finally the instructors stated that the availability of both the MCQCM and the
Blackboard MCQ via the Internet is greatly beneficial to the conscientious student
wanting to improve their understanding of the subject material and that the opportunity
for self-assessment is taken advantage of by most of their students at least once during
the semester. However, they all stated that the effort required to construct MCQCM
questions is greater than that required for MCQs, as more consideration has to be given to
all of the answer options, since the strength of the MCQCM relies on providing multiple
correct options with no obvious distracters.
7.3 Concluding Observations of Comparison of MCQCM to
Traditional Computer Assessment
In general students appeared to appreciate the MCQCM tool as a valuable
self-assessment exercise. In particular, most of the students confirmed that the
feedback was better than that of the traditional MCQ format and that the MCQCM rated
at least as well as the MCQ in directing the students in their learning and informing
them of their understanding of the content. Given the time that students have been
exposed to Blackboard’s MCQs it is quite feasible that long-term exposure to the
MCQCM could result in an ongoing acceptance of it as a tool for revision.
Some students requested that we use the MCQCM for summative assessment, as they
considered it could be beneficial to be exposed to it during the semester as a formative
assessment tool in preparation for it to be part of the final exam.
The pleasing results of the first investigative study of using the MCQCM as a self-
assessment tool during the semester encouraged further studies in the field. The next
section of this research, Chapter 8, reports on a series of applications of the MCQCM
as a summative assessment tool, contributing to the final overall grade for the subjects.
7.4 Summary
This chapter has reported on the findings of implementing the MCQCM as a formative
assessment tool during the subject delivery as a means of student self-assessment and
instructor reflection. It has achieved this by allowing the students to use the MCQCM
with other CAA strategies for comparison of scores and students’ and instructors’
perceptions of the MCQCM. In doing so this chapter answers the research question
formulated in Chapter 2 pertaining to the application of assessment with confidence
measurement for formative assessment, being:
Research Question 1.
Does Assessment with Confidence Measurement produce more meaningful feedback
when used for formative assessment?
It achieves this by answering the following sub questions:
Q1A: What are the students’ and instructors’ attitudes and perceptions of assessment
with confidence when used for formative assessment?
Q1C: Does the use of assessment with confidence measurement provide additional
valuable feedback to the instructor when used for formative assessment?
The analysis of the simulation scores (quantitative data) in comparison to the
previously achieved grades for a select number of students answers the research sub
question:
Q1B: How do the students’ results compare to the results of a standard Multiple-choice
Question (MCQ) test when using assessment with confidence measurement for
formative assessment?
CHAPTER 8 USING THE WEB-BASED MCQCM FOR SUMMATIVE ASSESSMENT
Chapter 7 presented the results of using the MCQCM as a formative assessment tool.
This trial yielded encouraging outcomes. The pilot programs described in Chapter 5
and the consequential design changes in Chapter 6 greatly assisted in the construction
of a satisfactory Web-based self-assessment tool offering full flexibility, immediate
feedback and a seemingly more honest appraisal of the state of knowledge of the
participant. It was apparent that the MCQCM offered a non-threatening environment
that both the students and instructors considered to be beneficial. The question at this
time is whether the MCQCM is an acceptable assessment strategy for summative
assessment and whether it could possibly offer a more discerning set of results.
Importantly, it must first be ascertained if the MCQCM is a valid, legitimate
assessment option, offering a level of reliability equivalent to that of the more
traditional methods of assessment. This chapter considers the observations and results
of a series of exercises initiated to ascertain if the MCQCM could be used as a
summative assessment tool, from both the students’ and the instructors’ perspectives.
8.1 Initial Trials using MCQCM as a Summative Assessment Tool
To facilitate this trial the MCQCM was used as a primary revision tool for a group of
students throughout a semester followed by an MCQCM class test. The test was graded
both by the traditional method of one mark for a correct answer and, for comparison, by a
scheme in which the mark depends on the user’s registered confidence.
This study initially considers the validity of the MCQCM testing method, where
validity refers to “whether the question actually tests what it is purported to test”
(Schuwirth & Van Der Vleuten, 2006). Validity is assessed by comparing the correlations
between two methods of testing that are supposed to measure the same construct (Bacon,
2003), in this case the MCQCM results against the traditional MCQ test results. The
reliability of any testing method is defined as the accuracy with which a score on a test is
determined, or more precisely, a score that a student obtains should indicate the score
that this student would obtain in any other given, equally difficult test, in the same field
(Schuwirth & Van Der Vleuten, 2006).
8.1.1 Setting
A cohort of 52 Data Communication students, a mixture of undergraduate and
postgraduate students doing various programs, were required to sit a test that
contributed to their final grade during the semester. The test consisted of 10 MCQs
testing the students on the fundamentals of network design. The author of the test was
mindful of Bloom’s (1956) taxonomy of educational objectives when constructing the
questions to facilitate the assessment of various levels, in particular testing at the
application level. The students sat the test under supervision during the tutorials.
They were instructed that they would be graded in two ways: firstly using the
MCQCM technique, where the registered confidence for each response would be
included in the mark, and secondly using the traditional method. They were instructed
that the grade allocated to them for this assessment task would be the greater of the two.
This was done to alleviate the stress experienced by the students using a new grading
system and to give richer data when asking the students questions about their
perception of the testing style.
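The dual grading arrangement described above can be illustrated with a minimal sketch. The confidence rule used here is a hypothetical linear rule chosen only for illustration (full credit scaled by the registered confidence when correct, and the complement of the confidence when incorrect); it is an assumption and is not the MCQCM marking formula, which is not restated in this section. The names `Response`, `traditional_mark`, `confidence_mark` and `allocated_grade` are likewise illustrative.

```python
from dataclasses import dataclass


@dataclass
class Response:
    correct: bool      # True if the chosen option was the right one
    confidence: float  # registered confidence in that choice, 0.0 to 1.0


def traditional_mark(responses):
    """Traditional scheme: one mark per correct answer, as a percentage."""
    return 100.0 * sum(r.correct for r in responses) / len(responses)


def confidence_mark(responses):
    """HYPOTHETICAL linear confidence rule (an assumption, not the thesis formula):
    a correct answer earns its confidence, an incorrect one earns 1 - confidence."""
    credit = sum(r.confidence if r.correct else 1.0 - r.confidence
                 for r in responses)
    return 100.0 * credit / len(responses)


def allocated_grade(responses):
    """The student receives the greater of the two marks, as described in the text."""
    return max(traditional_mark(responses), confidence_mark(responses))


# Invented example: two correct answers, two incorrect, with varying confidence.
answers = [Response(True, 1.0), Response(True, 0.75),
           Response(False, 0.5), Response(False, 1.0)]
print(traditional_mark(answers), confidence_mark(answers), allocated_grade(answers))
```

Under this assumed rule a hesitant wrong answer still earns partial credit, which is why the confidence-based mark can exceed the traditional one for the same set of responses.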
8.1.2 Results
The proportion of undergraduate and postgraduate students was not noted but estimated
by the lecturer as being approximately half. The gender balance was not even as the
area is traditionally more popular with males, being in this case 84 per cent males and
16 per cent females.
This results section has been divided into two areas for clarification purposes. The first
considers the data generated from the scores. This data was gathered then statistically
analysed for correlation, validity and reliability and appropriate conclusions are drawn.
The second analyses the data gathered to gauge the students’ and instructors’ perception
of using the MCQCM as a summative assessment tool.
Results for Section 1: Grade Comparisons for Correlation and Validity
This section considers the grades for each of the marking systems followed by a
comparison of the results evaluating the convergence of validity, the correlation and
reliability.
The average results for the MCQCM and the traditional MCQ are summarised in Table
8-1.
Test type and difference      Average Grade    Standard Deviation
between the two
MCQCM                         67.60%           24.80%
MCQ                           60.58%           22.73%
MCQCM - MCQ                   7.02%            20.23%
Table 8-1: Average, Standard Deviation and Difference for Both Marking
Schemes.
On examination of the results presented in Table 8-1 it is noted that the average
grades and the standard deviations for the two marking schemes are reasonably close. It
is observed that the MCQCM has the greater average grade and standard deviation.
Instructors would be quite pleased with these outcomes at this stage, as the results
appear to be acceptably convergent.
When looking at the grades in more detail it is noticed that the difference between an
individual’s two test scores is in some cases quite large. Figure 8-1 graphs the two
grades for each individual clustered by the grades for the MCQ. It demonstrates the
spread of the results, showing the grade for the MCQCM marking scheme in some
cases being quite different from that of the MCQ scheme.
Figure 8-1: MCQ and MCQCM Scores for Each Student with the MCQ
Clustered.
It is further observed that there is a relatively even distribution of those who benefited
from the MCQCM marking scheme (42%) and those who benefited from the MCQ
marking scheme (39%), while the remaining 19 per cent achieved the same mark. In
fact, further investigation found that of those who obtained a high score in the
traditional MCQ marking scheme (>65%) only 32 per cent scored higher for the
MCQCM marking scheme with 52 per cent scoring lower and 16 per cent scoring the
same. Further, of the 10 students who achieved 90 per cent or more for the MCQ
grading scheme 6 of them scored less, 0 scored higher and 4 scored the same for the
MCQCM. This is an important observation, as the higher achieving students do not
necessarily score better using MCQCM. This suggests that the MCQCM might be a
better indicator of knowledge, in particular for those students who achieve higher
grades, but is in no way conclusive. However, it could also represent the level of
confidence they are prepared to register.
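The breakdown reported above (the share of students scoring higher, lower or the same under the MCQCM scheme, optionally restricted to the higher achievers) can be sketched as follows. The function name `scheme_breakdown`, its `min_mcq` parameter and the scores in the example are illustrative assumptions; the example scores are invented and merely arranged to reproduce the reported pattern of 6 lower, 0 higher and 4 the same among ten students at 90 per cent or more.

```python
from collections import Counter


def scheme_breakdown(mcq_scores, mcqcm_scores, min_mcq=None):
    """Percentage of students whose MCQCM mark is higher than, lower than, or
    the same as their MCQ mark. If min_mcq is given, only students whose MCQ
    mark is at least min_mcq (inclusive) are counted."""
    pairs = [(m, c) for m, c in zip(mcq_scores, mcqcm_scores)
             if min_mcq is None or m >= min_mcq]
    if not pairs:
        raise ValueError("no students match the selection")
    counts = Counter("higher" if c > m else "lower" if c < m else "same"
                     for m, c in pairs)
    return {k: 100.0 * counts[k] / len(pairs)
            for k in ("higher", "lower", "same")}


# Invented scores for ten high achievers (MCQ >= 90), for illustration only.
mcq = [90, 100, 90, 100, 90, 90, 100, 90, 100, 90]
mcqcm = [85, 95, 80, 90, 88, 70, 100, 90, 100, 90]
print(scheme_breakdown(mcq, mcqcm, min_mcq=90))
```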
The result for convergent validity, based on the correlation between the MCQCM and
MCQ scores, supports the hypothesis that there exists a correlation between the grade
for the MCQCM and the grade for the MCQ. It is apparent that there is a relatively
strong correlation between the two marking schemes, as shown in Table 8-2.
Correlations
                                 MCQCM    MCQ
MCQCM   Pearson Correlation      1        .629(**)
        Sig. (2-tailed)                   .000
        N                        52       52
** Correlation is significant at the 0.01 level (2-tailed).
Table 8-2: The Correlation for the Two Marking Schemes
This analysis confirms that there is convergence of validity for the MCQCM and MCQ,
with the correlation of .629 (p<.01). This result gains strength when considering the
calculated value of Cronbach’s Alpha reliability coefficient (.722, above the
recommended minimum of .70) for this set of results, demonstrating internal
consistency.
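For readers who wish to reproduce this kind of analysis on their own data, the two statistics can be sketched with the standard formulas: Pearson's product-moment correlation, and Cronbach's Alpha computed as k/(k-1) multiplied by (1 minus the sum of item variances over the variance of the totals). How the items were defined for the reported coefficient is not stated here, so treating each marking scheme as one item is an assumption, and the function names and sample scores are illustrative only.

```python
import math
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cronbach_alpha(items):
    """Cronbach's Alpha for k items; each item is a list with one score per student."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]   # per-student total
    item_variance = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_variance / variance(totals))


# Invented marks for five students under the two schemes (illustration only).
mcqcm = [70, 55, 90, 40, 65]
mcq = [60, 50, 100, 40, 70]
print(round(pearson_r(mcqcm, mcq), 3))
print(round(cronbach_alpha([mcqcm, mcq]), 3))   # ASSUMPTION: each scheme is one item
```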
While the correlation and the reliability coefficient support both convergence of validity
and reliability, offering validation of the usage of MCQCM as an alternative assessment
task, the result is by no means conclusive, as extensive further research is required to
truly validate the hypothesis. The observed possibility of the MCQCM offering a more
discerning grading system also warrants further investigation.
Results Section 2: Students’ Perception of using MCQCM for Summative
Assessment
The second component of the results concentrates on the students’ perception of
MCQCM as a summative assessment tool. This section deals with the responses of the
students during the support post-test surveys in the tutorials in an attempt to ascertain
how enthusiastic the students were towards the MCQCM (Appendix A). Most
importantly it attempts to evaluate their perception of how much control they felt with
regards to using the slide bar with the direct consequences of their actions being a
possible change in scores. It was considered important at this stage of the research that
there be an understanding of the level of students’ interactivity with the system. The
extent of use of the slide bar and the reason for using it needed to be
ascertained to ensure that the cognitive process behind the decision-making activity
was representative of the student’s current state of knowledge.
It was observed that 70 per cent of students declared that they gained from being able to
use the slide bar to show their confidence while the remaining 30 per cent did not
consider it to offer any advantages. This was a pleasing result, with a large majority of
those who acknowledged the gain supporting their choice by stating that the MCQCM
permitted the attainment of marks for partial knowledge, identified problem areas to
both the marker and the student for further study and offered a good comfort zone if
they were unsure of the answer. The 30 per cent of students who did not feel that they
gained from the use of the slide bar commented that the gain was
not worth the effort and that the slide bar was confusing and too difficult to use.
Further discussion found that 40 per cent of the group used the slide bar only as a
means of identifying the answer as being true or false while 60 per cent used it to
identify the answer as being true or false and register their confidence. This is an
interesting observation, as the proportion of students who acknowledged using the slide
bar only to register their True/False choice (40 per cent) is greater than the proportion
who did not feel that they gained from the slide bar (30 per cent). Considering the
supportive student dialogue it is apparent that some of the students felt that even though
they appreciated the option of using the slide bar they did not use it under test
conditions, as it was too taxing in the situation of summative assessment as opposed to
formative assessment.
Apart from a small group (6%) who used the slide bar the same for both the practice
formative assessment and the formal summative assessment, 47 per cent used the slide
bar less and an equal proportion used it more. The students who used it less justified this
by saying that it was too difficult to use it extensively under test conditions due to the
extra cognitive load and that it increased their stress level. They considered it to be a
distraction, not considering the gain worth the effort. In one case a student stated that
they do not like to gamble and would rather just choose an option outright. The students
who used
it more during the summative assessment than for formative assessment stated that they
wanted to maximise their grade by minimising the loss of marks for answers they were
unsure of or alternatively increase their grade when sure of the answers.
Previous work of Farrell and Leung (2003) discusses the different learning approaches
of the individual and the advantages of offering assessment variations that consider the
personality traits of the users, in particular the introverted and extroverted users of the
system. A strong 72 per cent of the cohort registered that they are comfortable
registering 100 per cent for an answer if they are certain that it is correct, while the
remaining 26 per cent find it difficult to claim 100 per cent confidence even when they
are sure of the answer. This suggests that in a selected cohort of students it would be
expected that there would be some of them who would prefer to register a level of
confidence less than 100 per cent in a choice even if they are absolutely certain of the
answer.
A large 75 per cent of the group confirmed that they appreciated the opportunity to gain
some marks for partial knowledge. They further agreed that the system forces them to
think more carefully about their options. The remaining 26 per cent felt that it is
often a case of “you know it or you don’t” and that registering a confidence level is
simply not committing to an answer.
8.1.3 Discussions and Conclusions
In conclusion, the results of the grade comparison component of this study have
identified a convergence of validity between the two types of grading schemes being
investigated, Multiple-choice Questions with Confidence Measurement (MCQCM) and
the traditional Multiple-choice Question (MCQ) format, for the subject of Information
Technology. Consequently the MCQCM appears to be an acceptable option to be
included in the suite of assessment tools available to the instructor. Previous work
(Farrell and Leung, 2002) has demonstrated that the MCQCM delivers a richer
feedback and guidance to the students when used as a formative assessment tool. In
addition they documented the perceived advantages of using it in preparation for exams
from the students’ point of view.
It is pleasing to observe that the grades do correlate and there appears to be an
interesting interaction with the upper achievers where the difference in the grades for
the MCQCM and the MCQ alternative could offer a richer grading system. Although
the evidence is not overwhelming, it is an interesting observation that the higher
achievers in the group do not score as well for the MCQCM. This could either be because
the MCQCM forces the students to “show their hand”, giving a true indication of their
knowledge, or because it is really acting as a statement of their own personal confidence
in their choices. In light of this, it was recommended that the ongoing application of the
MCQCM be adopted to increase the data gathered, in order to investigate whether these
observed results occur again and, if so, to attempt to ascertain the reason.
The second set of results from the student survey revealed some interesting
observations. There was overall support from the majority of the students, which was
pleasing to the developer and instructor. The majority of students acknowledged the
benefits of the system, stating that they appreciated the opportunity to demonstrate
partial knowledge and optimise their grade by lessening the impact of an incorrect
choice and increasing the grade for a correct one. The confirmation that a proportion of
the cohort, 26 per cent in this case, do not have the confidence to register 100 per cent
for any answer, even when they know it is absolutely correct, should always be
considered in the analysis of future observations, as it indicates the possibility that a
particular group of students will never be able to maximise their grade by using this
system. Another important observation is that 47 per cent of the students used
the slide bar less during summative assessment than when using it for
formative assessment during the semester, as they considered it to be a distraction or
perceived that it did not offer enough return.
These encouraging results promoted the continuing utilisation of MCQCM as part of
the assessment tasks for the semester. The overall positive response from the students
towards MCQCM as both a formative and summative assessment tool increased the
enthusiasm of the designers and instructors who were keen to pursue its usage in the
classroom.
From the instructor’s position, an identified advantage of using the MCQCM as a
summative assessment tool was that it required the students to use it during the
semester for revision in preparation for the test. This apparent by-product of
introducing a new assessment strategy forces the students to actively revise the course
material as part of becoming familiar with the assessment strategy. Requiring
the students to know how the MCQCM operates in most cases
greatly increases the likelihood of their success in the subject.
8.2 Comparative Analysis of using the MCQCM as a Summative
Assessment tool to the Traditional Short Answer, MCQ and Long
Answer Assessment
The question facing educators today is what methods of assessment they should be
using and what would be the appropriate mix to maximise the feedback and evaluation
process. Schuwirth and Van Der Vleuten (2006) consider that a well designed assessment
strategy will incorporate various types of questions that are appropriate for the content
being assessed. The options presently available
to the instructors include multiple-choice questions (MCQ), short answer questions
(SA), longer problem solving questions (PS), case study reports, presentations and
other equally effective and proven choices. In the majority of cases the final grade is
calculated by combining each separate mark from assessment tasks completed during
the subject. The utilisation of multiple assessment methods recognises the need to allow
students to demonstrate their knowledge in various ways throughout their learning
experience.
As previously identified MCQs are highly regarded by instructors (Bacon, 2003) and
consequently used extensively, with worldwide experience in their construction
(Schuwirth & Van Der Vleuten, 2006). The Short Answer assessment format is as
popular as the MCQ alternative. Short answer assessment strategies can offer more
flexibility, with greater ability to test creativity and higher levels of Bloom’s (1956)
taxonomy of educational objectives, as outlined previously. However, short answers are
resource intensive when grading and are subject to poor reliability due to subjective
marking (Bacon, 2003).
The longer Problem Solving questions are often included in the final exam as they
permit the instructor to assess the highest of Bloom’s levels of taxonomy. The format
of these questions usually presents the student with a scenario which requires
the student to call upon many aspects of the subject material to analyse, synthesise and
evaluate, offering alternatives in some situations. These are clearly more difficult to
grade consistently as there is often not a prescribed correct solution but a number of
equally valid alternatives.
The encouraging results of the previous study initiated further investigative work,
where it was recommended that the grades of students
using MCQCM be compared with the grades from more traditional modes of
assessment: Multiple-choice Questions, Short Answers and Problem Solving (Scenario)
Questions.
8.2.1 Method of Comparative Study
Including MCQCM questions in the final end of semester exam facilitated this part of
the research. The exam also contained MCQ questions, Short Answer questions and
Longer Scenario questions. A total of 43 students sat the final exam, producing some
interesting results. The exam consisted of a section of 8 Multiple-choice Questions (MCQs)
followed by 8 MCQCMs, 8 Short Answer Questions and 2 Longer Problem Solving
questions. The MCQ and MCQCM sections carried 20 per cent each of the final exam
grade, the short answers section carried 33 per cent while the longer problem solving
section carried the remaining 47 per cent. The exam questions were constructed with an
awareness of Bloom’s (1956) taxonomy of educational objectives facilitating the
assessment of various levels from recall to application. On the completion of the exam
the results were collated with each question’s mark carefully recorded for analysis.
8.2.2 Results
The averages and standard deviations are displayed in Table 8-3.
Section            Average Grade    Standard Deviation
MCQ                73%              17.7%
MCQCM              67%              21.0%
Short Answers      85%              9.8%
Problem Solving    75%              14.5%
Table 8-3: Means and Standard Deviations for Each Section of the Exam.
It can be observed in Table 8-3 that the average class grades for the various sections of
the paper are close, as too are most of the standard deviations. The short answers
section has the greatest average grade and the smallest standard deviation. Instructors
would be quite pleased with these outcomes at this stage.
On further examination and analysis of the data it was found that in most cases there
appears to be a good relationship between the grades allocated for each of the sections
for the individual student. Again this is very pleasing for the instructor, as there appears
to be a good convergence for each of the assessment areas under consideration.
Educators rely on a reasonable convergence of the grades for each of the sections, as
any deviation from this is an area of concern. Failure to achieve this might indicate
poor question construction in a particular section. In this case there does not appear to
be any one area of concern.
At this stage analysis was applied to identify the statistical relationship between these
results. The correlation for the scores for each of the sections was used to test the
convergent validity, using Spearman’s Rank Order correlation test.
The comparative results are displayed in Table 8-4.
Correlations (Spearman's rho)
                                     MCQ         PS          MCQCM
PS      Correlation Coefficient      .235
        Sig. (2-tailed)              .129
        N                            43
MCQCM   Correlation Coefficient      .436(**)    .302(*)
        Sig. (2-tailed)              .003        .049
        N                            43          43
SA      Correlation Coefficient      .447(**)    .442(**)    .544(**)
        Sig. (2-tailed)              .003        .003        .000
        N                            43          43          43
** Correlation is significant at the 0.01 level (2-tailed).
* Correlation is significant at the 0.05 level (2-tailed).
Table 8-4: Correlation Table for the Sections of the Exam.
The following observations can now be discussed.
Firstly, let us consider the correlation between the MCQCM and the other sections of
the exam paper.
There is a reasonably strong correlation between the MCQCM and the SA section
(r=.544, n=43, p<.01).
MCQCM also has a medium correlation with MCQ (r=.436, n=43, p<.01) and with PS
(r=.302, n=43, p<.05).
These statistics confirm that there is a convergence of validity for the MCQCM and all
of the other sections of the exam. Additionally, these correlations gain strength when
considering the Cronbach’s Alpha reliability coefficient for the results, which at .7
equals the recommended minimum and demonstrates internal consistency.
Further, it is interesting to see that the grades for the MCQ section demonstrate a
medium correlation to SA (r=.447, n=43, p<.01) but only a small, non-significant
correlation to PS (r=.235, n=43, p=.129).
However, SA and PS have a stronger correlation (r=.442, n=43, p<.01).
8.2.3 Discussions and Conclusions
In conclusion, this study has identified a convergence of validity between MCQCM and
all of the other sections of the exam paper, with the strongest correlation being between
MCQCM and short answers. This observation is very encouraging as the MCQCM was
primarily designed as a formative assessment tool to support the learner along the
learning path (Farrell & Leung, 2002b).
Interestingly, the traditional MCQ section of the paper has medium correlation with the
short answers and smaller correlation to the problem solving section. Hence, whilst
there is convergence of validity between MCQ and short answers there is no significant
convergence of validity between the MCQ section and the problem solving section.
This means that a good performance in either the MCQ or problem solving section
would not necessarily predict a good performance in the other.
As a result of these initial observations MCQCM appears to be a valid assessment
option, producing grades that are as reliable as those of the more traditional methods of
assessment. However, MCQCM does not appear to offer any great advantage over the
rest of the methods of summative assessment. The question must then be asked: is the
MCQCM a worthwhile strategy for summative assessment?
This study encouraged the utilisation of the MCQCM as a summative testing option for
future investigation. It was proposed that the tool continue to be used both as a
formative assessment method for the duration of the semester and for summative
assessment, to be included as part of the final exam, permitting further investigation to
ascertain the students’ acceptance or rejection of MCQCM as a standard method for
summative assessment.
8.3 Comparative Analysis of using the MCQCM and Traditional
MCQ as a Summative Assessment Tool
As yet unpublished findings of Farrell and Leung provide more evidence of the merit of
the MCQCM as a summative assessment tool in another field study. In this instance a
cohort of 86 students were required to complete a mid-semester test which incorporated
both MCQCM and MCQ questions.
8.3.1 Method
One of the main challenges of having students use the MCQCM in a controlled
environment is that one cannot ethically advantage or disadvantage a particular group
of students during their studies by setting up a control group for students from the same
cohort. In order to do a comparative study it is recommended that this be emulated by
providing variations in the structure of the test. In this case a cohort of 85 students was
split into two groups of similar size. The first group’s test consisted of questions 1 to 10
in the MCQCM format and questions 11 to 20 in the traditional MCQ format. The
second group’s test reversed this arrangement, with questions 1 to 10
being MCQs and questions 11 to 20 being MCQCMs.
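The counterbalanced arrangement described above can be sketched as a simple allocation routine. How students were actually assigned to the two groups is not stated, so the random shuffle, the `seed` parameter and the names `FORMS` and `counterbalance` are assumptions made purely for illustration.

```python
import random

# The two test forms described in the text: the question formats are swapped.
FORMS = {
    "A": {"questions 1-10": "MCQCM", "questions 11-20": "MCQ"},
    "B": {"questions 1-10": "MCQ", "questions 11-20": "MCQCM"},
}


def counterbalance(student_ids, seed=0):
    """Split a cohort into two groups of similar size and give each group one form.
    ASSUMPTION: allocation is random; the thesis does not say how it was done."""
    ids = list(student_ids)
    random.Random(seed).shuffle(ids)
    half = (len(ids) + 1) // 2          # the first group takes the extra student
    return {sid: ("A" if i < half else "B") for i, sid in enumerate(ids)}


allocation = counterbalance(range(1, 86))            # a cohort of 85 students
group_sizes = {form: sum(1 for f in allocation.values() if f == form)
               for form in FORMS}
print(group_sizes)
```

Because every question appears in both formats across the cohort, differences between the formats can be examined without withholding either format from any student.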
8.3.2 Results
This combination of splitting the cohort of 85 students into two similarly sized groups
provided an interesting set of data for analysis, both for comparison of each group of
students and then across the groups. Figure 8-2 demonstrates the spread of the final
results for the students with the students being clustered around their MCQ result from
highest down to lowest. The corresponding individual’s MCQCM result is then plotted
against them for direct comparison.
Figure 8-2: The Student’s MCQ (clustered ascending order) and MCQCM Scores.
An interesting observation is the variation in the results for the upper and lower
achievers. Of the students whose MCQ result lies in the upper region (> 80%),
89 per cent achieved less for their corresponding MCQCM score, while 11 per cent
scored higher. In direct contrast, only 35 per cent of the students whose total MCQ
result was in the lower quartile (< 50%) achieved less for their corresponding MCQCM
score, while 65 per cent achieved a higher score. This is a telling observation, as it
reveals a trend worth investigating.
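The band comparison above can be reproduced from paired scores with a few lines of code. The sketch below is illustrative: the function name, the data layout and the sample scores are assumptions, and the scores are made up rather than taken from the study.

```python
def band_shift(pairs, lower=None, upper=None):
    """Share of students in an MCQ band whose MCQCM score is lower or higher.

    pairs: iterable of (mcq_percent, mcqcm_percent), one tuple per student.
    lower, upper: exclusive bounds on the MCQ score that define the band.
    """
    band = [(m, c) for m, c in pairs
            if (lower is None or m > lower) and (upper is None or m < upper)]
    if not band:
        return None
    n = len(band)
    below = sum(1 for m, c in band if c < m)
    above = sum(1 for m, c in band if c > m)
    return {"n": n, "lower_pct": 100 * below / n, "higher_pct": 100 * above / n}

# Made-up scores for illustration only; these are not the thesis data.
scores = [(90, 82), (85, 88), (95, 80), (40, 55), (45, 42), (30, 48), (70, 68)]
top = band_shift(scores, lower=80)      # MCQ result above 80 per cent
bottom = band_shift(scores, upper=50)   # MCQ result below 50 per cent
```

Students whose two scores are equal are counted in neither percentage, so the two figures need not sum to 100.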
The statistical analysis of the data, as seen in Table 8-5, further demonstrates that the
MCQCM results converge with the traditional MCQ results, which validates
its use for summative assessment.
As in previous field trials, the convergent validity between the results for the MCQ
and MCQCM sections of the test is acceptable (r = .761, n = 85, p < .01), reinforcing
the validation of the MCQCM as a reliable testing method. This is further supported by
Cronbach’s Alpha reliability coefficient, a measure of internal consistency, of .856,
comfortably above the recommended minimum of .7.
Correlations
                                   MCQ       MCQCM
MCQ      Pearson Correlation       1         .761**
         Sig. (2-tailed)                     .000
         N                         85        85
MCQCM    Pearson Correlation       .761**    1
         Sig. (2-tailed)           .000
         N                         85        85
**. Correlation is significant at the 0.01 level (2-tailed).
Table 8-5: Correlation of MCQ with MCQCM.
This statistical analysis across the groups confirmed the previous observations,
supporting the MCQCM as an acceptable testing option that is statistically as reliable
as the traditional MCQ option.
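For readers wishing to repeat this kind of analysis outside a statistics package, the two statistics reported above (Pearson's r and Cronbach's Alpha) can be computed as sketched below. The function names and the sample scores are assumptions made for illustration, and the scores are not the study data. The last helper is only a plausibility check: if Alpha is computed on two section scores of equal variance, it reduces to 2r / (1 + r), which for r = .761 gives roughly .864, close to the reported .856.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def sample_variance(v):
    m = sum(v) / len(v)
    return sum((a - m) ** 2 for a in v) / (len(v) - 1)

def cronbach_alpha(components):
    """components: list of k score lists, each holding one value per student."""
    k = len(components)
    totals = [sum(vals) for vals in zip(*components)]
    item_var = sum(sample_variance(c) for c in components)
    return k / (k - 1) * (1 - item_var / sample_variance(totals))

def alpha_two_equal_parts(r):
    """Alpha for two components of equal variance correlated at r."""
    return 2 * r / (1 + r)

# Made-up section scores out of 10; not the thesis data.
mcq = [9, 8, 7, 6, 5, 9, 4, 7]
mcqcm = [8, 8, 6, 7, 5, 7, 5, 6]
r = pearson_r(mcq, mcqcm)
alpha = cronbach_alpha([mcq, mcqcm])
check = alpha_two_equal_parts(0.761)
```

Whether the reported Alpha was in fact computed on the two section scores is not stated in the text, so the check should be read as indicative only.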
Further, the Chi-Squared test, shown in Table 8-6, confirms that these results
for the same groups of questions are of the same population, again reinforcing the
legitimacy of the application.
Test Statistics
               MCQ        MCQCM
Chi-Square     33.518a    54.412b
df             45         9
Asymp. Sig.    .896       .000
a. 46 cells (100.0%) have expected frequencies less than 5. The minimum expected cell frequency is 1.8.
b. 0 cells (.0%) have expected frequencies less than 5. The minimum expected cell frequency is 8.5.
Table 8-6: Chi-Square MCQ to MCQCM.
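The layout of Table 8-6 (a Chi-Square value, df and a minimum expected cell frequency for each column) is consistent with a one-sample chi-square test against equal expected frequencies. That reading is an assumption, since the text does not name the procedure. Under it, the statistic can be sketched as below; the function name and the sample scores are hypothetical.

```python
from collections import Counter

def chi_square_uniform(scores, categories=None):
    """One-sample chi-square statistic against equal expected frequencies.

    Returns (statistic, degrees of freedom, expected frequency per category).
    """
    counts = Counter(scores)
    cats = list(categories) if categories is not None else sorted(counts)
    expected = len(scores) / len(cats)
    stat = sum((counts.get(c, 0) - expected) ** 2 / expected for c in cats)
    return stat, len(cats) - 1, expected

# Made-up scores for illustration only; these are not the thesis data.
stat, df, expected = chi_square_uniform([1, 1, 2, 3, 3, 3], categories=[1, 2, 3])

# Consistency check on the figures in Table 8-6: with N = 85 students and
# df + 1 categories, the expected frequency per cell is N / (df + 1).
expected_mcq = 85 / (45 + 1)    # rounds to 1.8, as in note a
expected_mcqcm = 85 / (9 + 1)   # 8.5, as in note b
```

The check reproduces both minimum expected frequencies quoted in the table notes, which supports the reading above without confirming it.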
8.3.3 Discussions and Conclusions
The observations and analysis contained in Table 8-5 and Table 8-6 confirm that the
MCQCM has merit equivalent to that of the traditional MCQ assessment option, offering
good correlation and convergence. The results identify the MCQCM as an alternative
assessment strategy offering reliability, validity and convergence with standard
assessment policies.
8.4 Instructors’ Focus Group for Summative Assessment
As for formative assessment, a focus group was facilitated to ascertain the instructors’
attitudes and perceptions of the MCQCM for summative assessment. The four
instructors involved in the above MCQCM implementations were invited to give their
feedback to the designers, yielding the following observations.
All of the instructors complimented the visual presentation of both the MCQCM
question format and the student feedback screens. Consistent with the instructors’
feedback for the formative assessment applications, the instructors appreciated the
analogy to the games environment and the clarity of the feedback produced for the
students. They also complimented the MCQCM on its method of handling graphics and
on the graphs produced showing the areas of comprehension and misunderstanding.
In particular the instructors appreciated the increased distribution of the students’
scores, as they felt that the MCQCM could offer greater opportunity to produce a more
discerning set of results. However, they voiced concern that the results might only
reflect the propensity of the student towards overstating or understating their
confidence rather than truly identifying the level of knowledge.
The main concern of the instructors was the implementation of assessment that
penalises the students with negative scores. While they all acknowledged that the
approach was reasonable and that there is a need to highlight areas of knowledge where
a student registers high confidence in incorrect answers, they feared that the students
could perceive the practice as being unfair, resulting in students questioning the scoring
and appealing against what they see as unjust assessment strategies.
All of the instructors felt that the implementation of an innovative assessment tool has
the additional benefit of forcing the students to revise before the final assessment,
as they are required to practise with the MCQCM in preparation. As a result,
students who would not normally prepare may do so, improving their understanding
and hopefully catching those students who would normally slip through.
8.5 Discussion
The first trial discussed in this chapter was initiated to validate the MCQCM as a
summative assessment tool, ascertaining the acceptance of the MCQCM by the students
and the instructor as part of the assessment regime. As the survey results from this
exercise demonstrate (Appendix A), the majority of students considered the
MCQCM to offer greater opportunity to optimise their score, by lessening the impact of
an incorrect answer and increasing the reward for a correct one. Additionally, they
appreciated the chance to demonstrate partial knowledge in some areas, a feature that is
not possible in many traditional testing methods. The survey results also demonstrated
that, for the majority, the use of confidence measurement in assessment is a means of
identifying gaps in their learning and furthermore offers a comfort zone if unsure of
an answer. The statistical analysis of the results shows a good correlation, which should
be expected from an exercise where dual marking has been applied. The interesting
observation is in the upper quartile of student results, where the higher achievers do not
score as well when the MCQCM scoring system is applied, possibly offering a more
discerning set of results than one clustered around the 90 to 100 per cent area.
The decrease in the use of the slide bar for summative assessment tasks demonstrates
the change in behaviour when there is more to lose or gain. This issue highlights the
need for the MCQCM to be extensively used throughout the semester so that the
students can familiarise themselves with the system.
The second trial considered in this chapter is the comparison of the MCQCM results to
a number of traditional testing methods as part of the subject’s final exam. The
statistical analysis of the results of the MCQCM section of the exam against
each individual’s achievement in the other traditional methods of assessment
returned some positive observations. From this study it is apparent that the MCQCM is
as reliable a testing method as the similarly constructed traditional MCQ method. The
convergence of the MCQCM and MCQ results is strong, offering an acceptable
reliability. The main observation is that the MCQCM showed similar convergence to
all of the assessment methods tested, equivalent to the levels of convergence that
the traditional methods of MCQ, Short Answers and Long Answers showed to each
other. This validates the MCQCM as an equivalent predictor of a student’s knowledge.
Importantly, the feedback from the students endorsed the benefit of using the MCQCM
as a revision tool in preparation for the final assessment, exposing the students to
critical self-assessment of their knowledge, which they acknowledged as greatly
assisting them in preparation for the final exam.
The third trial outlined in this chapter, in which the MCQCM was used for
half of the test and the traditional MCQ for the remainder,
again demonstrated a convergence of results and acceptable reliability. It was pleasing
to see that the results statistically came from the same population, as confirmed by the
Chi-Squared test. However, the main contribution to this research can be
observed in the results of the upper and lower quartiles, where the MCQCM scores of
the students with higher MCQ scores are in the majority lower than their corresponding
MCQ scores and, equally important, the MCQCM scores of the students with lower MCQ
scores are in the majority higher.
8.6 Summary
This chapter has reported on the findings of implementing the MCQCM as a
summative assessment tool during the subject delivery as a means of contributing to the
grading of the student, formally recognising their level of achievement for the given
subject(s). It has achieved this by including the MCQCM into the summative
assessment suite along with other traditional assessment strategies. This permitted the
comparison of the results to ascertain if the MCQCM scores offered the reliability and
validity of the other assessment strategies used. In doing so this chapter answers the
research question formulated in Chapter 2 pertaining to the application of assessment
with confidence measurement for summative assessment purposes, being:
Research Question 2.
Does Assessment with Confidence Measurement offer equivalent Validity and
Reliability compared to traditional assessment strategies when used for
Summative assessment?
It achieves this by answering the following sub question designed to ascertain the
instructors’ and students’ perception of assessment with confidence measurement:
Q2A: What are the student’s and instructor’s attitudes and perceptions of assessment
with confidence measurement when used for summative assessment?
It further addresses the sub questions regarding the validity and reliability of
assessment with confidence measurement:
Q2B: How do the results compare in validity and reliability to the results of the
standard MCQ test when using assessment with confidence measurement for
summative assessment?
Q2C: How do the results when using assessment with confidence measurement for
summative assessment compare in Validity and Reliability to other traditional
methods of summative assessment?
The final chapter, Chapter 9, discusses at length the findings of this thesis. It also
recapitulates the work contained in previous chapters, including the literature review,
the research methodology framework, the design and redesign of the MCQCM, the
association of the MCQCM with the game play topology and the mathematical foundation
on which the MCQCM scoring method was based.
CHAPTER 9
SUMMARY, CONCLUSION AND FUTURE WORK
The findings of this research are summarised in this chapter, together with their
significance and contribution to the field. The chapter initially recapitulates the work
contained in the previous chapters, followed by discussion of the findings of the trials
in the field. It also addresses the research questions posed in Chapter 2. The chapter
closes with the conclusions drawn and the identified limitations, and proposes possible
further directions for investigation.
9.1 Summary of the Research
This research was initiated to address the concern voiced by many
educators that assessment which encourages guessing, and in most cases rewards a
student for it, falls short of its primary objective of reflecting the student’s present level
of knowledge and providing meaningful feedback that encourages self-reflection and
consequential adjustment to the learning path. In addressing this fundamental concern it
also tackles the additional identified concerns of reinforcing incorrect knowledge in the
student’s mind and failing to recognise partial knowledge. As a means of achieving a
solution to these areas of concern, this research investigates and evaluates the option of
incorporating confidence measurement into the assessment strategy to improve the
grading of the students’ knowledge and feedback. This research then proposes the
Multiple-choice Questions with Confidence Measurement (MCQCM) interactive Web-
based assessment tool as an alternative to the more traditional MCQ method of
assessment. It then further evaluates the utilisation of the MCQCM to ascertain if it is
beneficial to both the students and the instructor as an assessment strategy for
incorporation into classroom activities. Initially it focuses on the MCQCM
application for formative assessment and then, as a result of promising feedback,
extends its application to summative assessment, addressing these two areas of
application separately.
This research contributes to the educational community with its evaluation and
endorsement of assessment with confidence measurement, in turn assisting educators
who wish to pursue it for inclusion in their assessment strategies. The discussion of the
alternative scoring techniques developed and refined by others who have implemented
similar approaches to assessment offers options to those instructors wishing to pursue
innovative assessment strategies. These documented scoring methods differ
significantly depending on the various cohorts of students, recognising the students’
limitations or taking full advantage of their advanced academic abilities, depending on
the cohort’s different socioeconomic, cultural and intellectual backgrounds. It is the
responsibility of the educators to provide meaningful assessment that identifies the
student’s level of achievement as precisely as possible. Some educators consider
assessment that fails to indicate a student’s true level of knowledge as bordering
on negligence on the instructor’s part. Others vehemently condemn the use of
negative assessment in any shape or form, asserting that the practice can produce
adverse effects on the individual, detrimental to their progress along the learning path.
This research demonstrates a broad acceptance of assessment with confidence
measurement by both students and instructors.
This research paves the way for further studies in the area of innovative assessment
with confidence measurement. It is envisaged that future incarnations of the MCQCM
will include provision for variable scoring regimes, the same as those practised
by others, offering flexibility in its applications. These scoring alternatives could either
permit the setting of the scoring choice for the duration of the subject’s delivery or be
adjusted by the instructor in an attempt to increase the intensity, applying more severe
penalties towards the end of the delivery schedule to students who demonstrate high
levels of confidence for incorrect answers. The gaming metaphor adopted for the
interactive application produces intrinsic motivational activity, encouraging practice
and rehearsal through casual use and supporting the transference of knowledge from
short-term to long-term memory, which is fundamental to student learning. In keeping
with the games phenomenon, future development will continue to preserve the features of
game play, promoting fairness, posing challenges with appropriate rewards,
encouraging risk-taking with explorative activity and controlling the level of difficulty
and the corresponding stress. Whether it is used for formative assessment alone or
implemented for summative purposes that require formative assessment activity to
support it, this research identifies the advantages of assessment with confidence
measurement, which exposes the student to a self-critical process that increases the
likelihood of correct indication of their level of knowledge, to the benefit of both the
instructor and the students.
This research culminates in a series of recommendations to be considered if embarking
on the use of a computer aided assessment strategy incorporating confidence
measurement and, importantly, presents alternative methods of scoring that can be
considered.
9.2 Recapitulating on Previous Chapters
In Chapter 1 of this thesis, the research problem statement was formulated, based on the
discussion surrounding the identified concern that existing traditional assessment
methods, such as Multiple-choice Questions, often do not comply with the criteria of
good assessment practice because they fail to indicate the true level of knowledge of the
participant. Furthermore, by the nature of their construction, many traditional
assessment strategies encourage the individual to guess, rewarding them for their
efforts. Chapter 1 also identified another major concern: traditional MCQ
assessment options often do not encourage the participant to demonstrate their various
levels of knowledge, as they require the student to identify the answer as being either
correct or incorrect (black or white), not permitting the student to demonstrate partial
knowledge, the “shades of grey” or “fuzzier” areas (Diamond & Forrester, 1983).
The choice of an appropriate Research Methodology plays a vital role in any research
direction and structure, as outlined in Chapter 2, where the research framework was
defined. It achieved this by firstly recognising the contributions of the traditional
research paradigms often used by researchers, being Positivism, Interpretivism and
Critical Theory, focusing this research to achieve its objectives. In Chapter 2 the most
appropriate approach to address the research questions was then formulated, being a
blend of the aforementioned traditional research methodologies in order to use both
quantitative and qualitative analysis of the generated data. At this point the problem
statement of Chapter 1 was used as a basis to construct the research questions. Chapter
2 then considered each of these research questions in light of the research paradigms
and identified the research framework within which the research questions would be
addressed. Chapter 2 then closed with a discussion about the approach to problem
solving in the real world, adopting the Human Computer Interaction (HCI) iterative
approach to problem solving (Sharp et al., 2007).
The literature review in Chapter 3 discussed variations of non-conventional MCQ
assessment strategies that have been used in the past and at present. Initially the
chapter discussed the various Learning Theories and Learning Styles, identifying the
importance of feedback in the learning process. It then recognised that
embracing technology provided the ideal environment for educators to pursue their
educational interests, developing and refining innovative assessment practices to
greatly enhance learning, encouraged by the perceived commercial opportunities
spurring on the rapid development in the field. Furthermore, it acknowledged that
instructors today have many assessment options available to them, that it is the
responsibility of instructors to choose a combination of assessment tools and that it is
imperative that formative assessment activities are made available for the duration of
the learning experience. The chapter then discussed the utilisation of the MCQ
assessment method and its suitability to the new technology, which often pushes its
application beyond the scope of its original design. The work of others in their attempts
to eliminate guessing and reward students for partial knowledge was discussed; the
MCQCM relies on the underlying core arguments provided in this cited previous
work. Chapter 3 continued by discussing the benefits of providing the opportunity for
learners to reflect on their present state of knowledge and the important role that
computer aided assessment learning tools play in the learning process, as outlined by
Hede’s (2002) model of Integrated Effectiveness of Multimedia on Learning.
Chapter 4 considered the mathematics underpinning various scoring systems adopted for
innovative assessment strategies. It initially addressed the issue of guessing in MCQ
assessment by introducing the work of Pollard (1985, 1986, 1993), a pioneer in the
field, who created a scoring method that penalised incorrect choices. It then presented
another scoring technique (Paul, 1994) that used confidence as a contributor to the
student’s grade, with scoring based on a logarithmic relationship derived from
probability and game theory. The more recent work of Gardner-Medwin and Gahan (2003)
and Gardner-Medwin (2006) was also presented; it uses a harsh penalty system for
students who demonstrate high confidence in an incorrect answer but offers a safe zone
for students willing to concede that they have very little knowledge in the area.
Chapter 4 closed with a discussion of the MCQCM scoring technique.
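The character of these scoring approaches can be illustrated with a short sketch. The mark table below (1, 2 or 3 marks for a correct answer and 0, -2 or -6 for an incorrect one, by confidence level) is the scheme commonly associated with Gardner-Medwin's certainty-based marking; it is assumed here rather than taken from this chapter. The logarithmic function is a generic rule of that family and is not claimed to be Paul's (1994) formula. Neither is the MCQCM scoring method, and all names are hypothetical.

```python
from math import log

# Assumed mark table: confidence level -> (mark if correct, mark if incorrect).
CBM_MARKS = {1: (1, 0), 2: (2, -2), 3: (3, -6)}

def cbm_mark(confidence, correct):
    """Mark awarded for one answer at the declared confidence level."""
    right, wrong = CBM_MARKS[confidence]
    return right if correct else wrong

def best_confidence(p):
    """Confidence level with the highest expected mark when the student's
    true probability of being correct is p."""
    def expected(level):
        right, wrong = CBM_MARKS[level]
        return p * right + (1 - p) * wrong
    return max(CBM_MARKS, key=expected)

def log_score(p_correct_option, n_options):
    """Generic logarithmic rule: 0 for an even guess across n options,
    1 for full certainty in the correct option, negative below an even guess."""
    return 1 + log(p_correct_option) / log(n_options)

safe = best_confidence(0.5)    # unsure student: the lowest level pays best
sure = best_confidence(0.9)    # well-founded confidence: the highest level
```

With this table the expected mark favours level 1 below a probability of about two thirds and level 3 above 0.8, so declaring one's honest confidence is the best strategy, which is the property such schemes are designed to have.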
Chapter 5 summarised the development of the MCQCM prototype and the initial pilot
trials. It presented the outline of a small pilot program initiated in the early
stages of development, followed by the analysis, summary and conclusions. The details
of a second, more extensive pilot program were then summarised and its findings
analysed and evaluated. Chapter 5 continued with a summary of the results of
these two pilot programs, which received encouraging feedback, providing inspiration
for further investigation and research. Chapter 5 identified elements of design and
functionality that needed addressing, in particular the major concern with the operation
of the MCQCM, where the mechanism of identifying an option to be correct and declaring
the level of confidence is completed in one action. Chapter 5 addressed the first research
question: Does Assessment with Confidence Measurement produce more meaningful
feedback when used for formative assessment? This was achieved by answering the
relevant sub questions for formative assessment formulated in Chapter 2, being: What
are the student’s attitudes and perceptions of assessment with confidence when used for
formative assessment?; How do the students’ results compare to the results of a
standard Multiple-choice Question (MCQ) test when using assessment with confidence
measurement for formative assessment?; and Does the use of assessment with
confidence measurement provide additional valuable feedback to the instructor when
used for formative assessment?
Chapter 6 contained an extensive discussion on the redesign and refinement of the
MCQCM, addressing many of the issues acknowledged in Chapter 5, while also
adhering to customised heuristics developed by Sim, Read and Holifield (2008).
Chapter 6 opened with an investigation into the games paradigm, then described the
fundamental elements that constitute a game play experience, identifying the
components and the experiences that are required to produce a balanced game.
Following that, it addressed these elements in light of the MCQCM application, which
uses the metaphor of gaming, identifying the areas of goals, risk and reward, fairness,
challenges, learnability, stress and level of difficulty. In addressing the problem of
using the one mechanism to identify an option to be correct while also declaring the
level of confidence, Bandura’s (1983) self-efficacy work was called upon to formulate a
solution. The chapter then continued by comparing the usability of the MCQCM
against Sim, Read and Holifield’s (2008) heuristics for Web-based assessment
applications, producing a set of extended heuristics for MCQ with confidence
measurement. Chapter 6 closed with a discussion about the MCQCM’s unique solution
for handling graphics by considering the work of Leung (1995) in his application of
bifocal display for large display requirements on small screens. Chapter 6 answered
the third research question:
What are the design requirements for developing an interactive assessment with
confidence measurement to ensure that instructors and students are able to achieve
maximum benefit from its application?
Chapter 7 reported on the findings of an investigative study designed to evaluate the
MCQCM against a traditional computer based assessment package for a formative
assessment exercise. The chapter initially outlined a small simulation exercise designed
to indicate whether the MCQCM produces the results that it purports to, in preparation
for extended application in the field. Chapter 7 detailed the application of the MCQCM
as a formative assessment tool used for the duration of a semester, which returned
encouraging results after the analysis and summary of the data. Chapter 7 then
discussed the instructors’ contribution to the feedback: they stated that they felt
comfortable about making the MCQCM system available for student self-assessment
and furthermore appreciated the additional analytical feedback given, which permitted
them to evaluate the effectiveness of their teaching strategies. Chapter 7 answered the
first research question:
Does Assessment with Confidence Measurement produce more meaningful feedback
when used for formative assessment? This was achieved by answering the relevant sub
questions for formative assessment formulated in Chapter 2, being: What are the
student’s attitudes and perceptions of assessment with confidence when used for
formative assessment?; How do the student’s results compare to the results of a
standard Multiple-choice Question (MCQ) test when using assessment with confidence
measurement for formative assessment?; and Does the use of assessment with
confidence measurement provide additional valuable feedback to the instructor when
used for formative assessment?
In Chapter 8 the application of the MCQCM for summative assessment was trialled
following the positive responses from the work in Chapter 7. The chapter discussed the
first application, where the MCQCM was used for both formative and summative
assessment along with the traditional MCQ assessment strategy. The resulting grades
for each individual were compared and statistically analysed to reveal a convergence of
reliability.
The second study contained within Chapter 8 compared the individuals’ MCQ, Short
Answers, Long Answers of Problem Solving and MCQCM grades from a cohort of
students. The statistical analysis concluded that the MCQCM had a correlation equal to
that of all of the others and can be considered equally reliable. The final field study in
this chapter compared the individuals’ MCQCM and MCQ results for a summative
assessment event but used a more reliable method for evaluation. It produced results
demonstrating a convergence of validity between the two testing methods and an
interesting set of results for the upper and lower quartiles of students.
Chapter 8 addressed the second research question: Does Assessment with Confidence
Measurement offer at least equivalent Validity and Reliability compared to traditional
assessment strategies when used for Summative assessment? This was achieved by
answering the relevant sub questions for summative assessment formulated in Chapter 2,
being: What are the student’s and instructor’s attitudes and perceptions of assessment
with confidence measurement when used for summative assessment?; How do the
results compare in validity and reliability to the results of the standard MCQ test when
using assessment with confidence measurement for summative assessment?; How do
the results when using assessment with confidence measurement for summative
assessment compare in Validity and Reliability to other traditional methods of
summative assessment?; and Does the use of assessment with confidence measurement
provide additional valuable feedback to the instructor when used for summative
assessment?
9.3 Discussion
This research positions itself in the centre of human activity as it attempts to address an
issue embedded in the educational arena. Consequently it primarily deals with human
players, being the students and instructors. As a result of being engulfed in a real world
phenomenon the identification of the relevant problem space with its parallel research
problem statement marries together the major contributors to this research, being
Human Computer Interaction (HCI), for the iterative approach to problem solving of
the problem space and components of Positivism and Interpretivism contributing to the
research framework. These distinctly different discipline approaches work in harmony
for this particular research activity, finding the balance between the theoretical and
practical approaches to solving a real world problem in context. The HCI iterative
approach encourages the development and refinement of the MCQCM tool conforming
to recognised HCI guidelines, best practices and recent contributions in the field from
Sim, Read and Holifield (2008). The blended research methodology approach permits
the research questions to be formulated at the beginning of each stage where the
activities are tailored to produce the correct data in an attempt to enlighten the
researchers. This works harmoniously with the HCI iterative activities.
This section will discuss and answer the research questions formulated as part of the
research framework identified in Chapter 2. It will further demonstrate the value of the
MCQCM as a solution to the overall encompassing problem that present assessment
strategies employed by educators often fail to accurately demonstrate a student’s
present level of knowledge in a given area, being detrimental to both the student and
the instructor.
9.3.1 MCQCM as a Valuable Formative Assessment Tool
Question 1, as formulated in Chapter 2 of this research, asks, Does Assessment with
Confidence Measurement produce more meaningful feedback when used for formative
assessment?
This research set out to answer this first question by breaking it down into smaller
sub questions, permitting a concentrated focus on the underlying concerns. Any
assessment strategy is reliant on the acceptance of both the instructor and the student.
As discussed in Chapter 6, the perception of fairness has a fundamental place in our
society, as unfair practice in all aspects of our orderly life is immediately rejected.
Consequently, the students’ and instructors’ perception of assessment with confidence
measurement is an integral part of this research.
The corresponding first sub question formulated was Q1A: What are the student’s and
instructor’s attitudes and perceptions of assessment with confidence when used for
formative assessment? The focus of the early stages of this research was designed to
address this issue and gauge the acceptance of assessment with confidence
measurement by the potential participants. As presented in Chapter 5, this was achieved
by a series of small pilot programs specifically designed to ascertain the overall
acceptance. The prototype version of the MCQCM was introduced to an initially
restricted cohort of students, then to an extended group. Although this version of the
MCQCM had limited functionality it was at an acceptable operational level for the
purpose. These initial pilot programs returned encouraging results supporting the use
of the MCQCM as an assessment strategy, with the students’ responses demonstrating
that assessment with confidence measurement makes a positive contribution to self-
assessment.
The second sub question in support of the first research question outlined above that
this research has addressed is Q1B: How do the students’ results compare to the results
of a standard Multiple-choice Question (MCQ) test when using assessment with
confidence measurement for formative assessment? This question is answered in
Chapter 7 with the application of the MCQCM for formative assessment, where Section
7.3 demonstrates the MCQCM to be equally valid with traditional assessment and
to offer a reliable assessment, with a convergence to the results from using other
traditional testing options. Additionally, the results of the simulation exercise, where a
select number of students used the MCQCM after receiving their final grade for the
semester from various other summative assessments, demonstrated that the MCQCM
consistently represented a level of knowledge equivalent to that recorded
from the individual’s other summative activities.
The third sub question in support of the overriding first research question regarding the
perceived value of formative assessment with confidence measurement is Q1C: Does
the use of assessment with confidence measurement provide additional valuable
feedback to the instructor when used for formative assessment? Assessment strategies
have two main stakeholders, students and instructors. It is not sufficient to consider
only the opinions of the students, as the true value of an approach to assessment
relies on its worth to all interested parties, in this case including the instructors. In Chapter 5 the
pilot programs produced some interesting findings relevant to the instructors when
considering the application of assessment with confidence measurement for formative
assessment. The resulting analysis revealed that instructors appreciated the increased
distribution of marks, offering a richer indication of the areas of attained knowledge.
Important to the instructors was the ability for them to identify areas in which a large
number of students demonstrate high levels of confidence for incorrect answers,
possibly indicating poor understanding in particular areas of the content. When
instructors were confronted with this type of data they recognised the need to
re-evaluate their teaching program to address areas that appear to be misunderstood or,
more importantly, areas where students demonstrate a strong belief in incorrect facts. During the first
pilot program the instructors confirmed that the opportunity to review the data as
presented by the system would permit them to review the effectiveness of their teaching
strategies in the highlighted areas of concern. They also voiced their concerns on the
possible negative effect of the use of assessment that applied negative grades for
incorrect answers, as the approach was foreign to them and they were unsure of the
reception they would receive from the students when proposing such assessment. In
addition they registered reservations about any assessment that is dependent on the
individual registering their level of confidence to gain or lose marks, as the practice
could favour the more extraverted student, who is somewhat brash in their approach to
their level of knowledge and perhaps tends to be over-confident. Correspondingly, it
could handicap the introverted, modest or less confident individual who might
understate their level of knowledge.
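The instructors’ review of confidently held incorrect answers can be illustrated with a short sketch. The Python below is an illustration only, under stated assumptions: the response format (content area, correctness, confidence as a percentage), the function names and both thresholds are hypothetical and are not taken from the MCQCM implementation.

```python
from collections import defaultdict

# Hypothetical response format: (content_area, is_correct, confidence_pct).
# The 75% confidence cut-off and the 30% flagging rate are illustrative
# assumptions, not values reported in this research.

def confident_error_rates(responses, high_confidence=75.0):
    """Return, per content area, the share of responses that were
    incorrect AND registered at or above the high-confidence cut-off."""
    totals = defaultdict(int)
    confident_wrong = defaultdict(int)
    for area, is_correct, confidence_pct in responses:
        totals[area] += 1
        if not is_correct and confidence_pct >= high_confidence:
            confident_wrong[area] += 1
    return {area: confident_wrong[area] / totals[area] for area in totals}

def areas_to_revisit(responses, high_confidence=75.0, min_rate=0.3):
    """List the content areas whose confident-error rate reaches min_rate."""
    rates = confident_error_rates(responses, high_confidence)
    return sorted(area for area, rate in rates.items() if rate >= min_rate)

# Invented sample data for illustration only.
sample = [
    ("loops", False, 90), ("loops", False, 80), ("loops", True, 60),
    ("arrays", True, 90), ("arrays", False, 20),
]
flagged = areas_to_revisit(sample)
```

In this invented sample only the "loops" area is flagged, since two of its three responses are incorrect with high confidence, whereas the single incorrect "arrays" response was registered with low confidence.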
The studies in Chapters 5 and 7 showed conclusively that all stakeholders, students
and instructors alike, consider the MCQCM, or more generally the delivery of MCQ with a
confidence measurement to produce more meaningful feedback when used for
formative assessment.
9.3.2 MCQCM as a Summative Assessment Tool
Question 2, as formulated in Chapter 2 of this research, asks: Does Assessment with
Confidence Measurement offer at least equivalent Validity and Reliability compared to
traditional assessment strategies when used for Summative assessment?
As with the first research question, this research addresses this
question by formulating four sub questions.
As previously stated, the instructors’ and students’ perceptions of an assessment method
are critical to its acceptance. In the case of summative assessment this is of greater
concern. No instructor will use a system unless it conforms to both their own and the
institution’s ethos of fair, unbiased assessment. Likewise, students will not support
assessment strategies they consider to unfairly prejudice or handicap individuals. As
Sim, Read and Holifield (2008) stated, ultimately the students are the ones that
have the most to lose.
The resulting first of these sub questions is Q2A: What are the students’ and
instructors’ attitudes and perceptions of assessment with confidence measurement
when used for summative assessment? This research ascertained both the students’ and
instructors’ perceptions of summative assessment with confidence measurement by
using the MCQCM for various summative assessment activities. Chapter 8 documents
three distinct applications of the MCQCM as either stand-alone assessment or a
contributor as part of a suite of assessment activities. The feedback from the students
and the instructors after the first of the activities was analysed and is presented in
Chapter 8. The ability to control the gain or loss via the level of confidence was
appreciated by the majority of students. It was pleasing to observe that some students
welcomed the opportunity to demonstrate partial knowledge and be rewarded
accordingly, while controlling the level of penalty for incorrect answers. It must be
noted that not all felt this way, as some expressed the opinion that being required to
nominate a level of confidence interfered with the primary objective and was a distraction
and an additional burden, not at all appropriate during formal test conditions. Others
also voiced hesitation in the declaration of absolute confidence (100%), as they prefer
to understate their level, in contrast to those who brashly overstate as part of their
personality. The instructors involved in this first summative assessment exercise
returned similar feedback to that recorded for the formative assessment, in that they
appreciated the results display highlighting areas of content misunderstanding,
especially those with high levels of confidence for incorrect answers, and furthermore
declared that this would influence their instructional direction, resulting in the revisiting
of these content areas. The instructors continued to record concerns about the tendency
for assessment with confidence measurement to be biased towards extraverted
students and possibly handicap the introverted. They also had reservations about
implementing the MCQCM, as constructing questions that offer
multiple correct answers for the one question is challenging, well beyond the demands
of traditional MCQ tests.
Q2B: How do the results compare in validity and reliability to those of the standard
MCQ test when using assessment with confidence measurement for summative
assessment? The MCQCM summative assessment implementations outlined in Chapter
8 produced statistical analysis of the recorded individual scores for MCQ and MCQCM
on three separate occasions using various mechanisms for comparative studies. In all
cases the results were pleasing, showing significant convergence of the MCQ and MCQCM
results, strengthened by the Cronbach’s Alpha Reliability Coefficient, which
demonstrated that the MCQCM scoring method consistently reflected the students’
knowledge across the two assessments. The final case outlined in Chapter 8 directly
compared the MCQCM to the MCQ testing procedure, returning a Chi-Squared result
consistent with the two groups of MCQCM and MCQ scores being from the same
population. This evidence addresses Q2B.
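For readers unfamiliar with the reliability coefficient referred to above, the following sketch computes Cronbach’s Alpha from its standard definition, alpha = k/(k-1) × (1 − sum of the per-measure variances / variance of the per-student totals), treating each assessment as one measure. The function name and the sample scores are hypothetical; they are not the data or the analysis code used in Chapter 8.

```python
from statistics import variance

def cronbach_alpha(scores_by_measure):
    """Cronbach's Alpha for k measures taken by the same students.

    scores_by_measure is a list of k lists; position i in every list
    holds the score of student i (an assumed layout for this sketch).
    """
    k = len(scores_by_measure)
    if k < 2:
        raise ValueError("at least two measures are required")
    n = len(scores_by_measure[0])
    if n < 2 or any(len(m) != n for m in scores_by_measure):
        raise ValueError("each measure needs the same students (two or more)")
    # Sum of the sample variances of the individual measures.
    sum_item_variances = sum(variance(m) for m in scores_by_measure)
    # Sample variance of each student's total across all measures.
    totals = [sum(student) for student in zip(*scores_by_measure)]
    total_variance = variance(totals)
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - sum_item_variances / total_variance)

# Invented scores for four students on two assessments.
mcq_scores = [1, 2, 3, 4]
mcqcm_scores = [2, 1, 4, 3]
alpha = cronbach_alpha([mcq_scores, mcqcm_scores])
```

Values closer to 1 indicate that the two sets of scores rank the students consistently; identical score lists give exactly 1.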
Q2C: How do the results when using assessment with confidence measurement for
summative assessment compare in Validity and Reliability to other traditional methods
of summative assessment? To answer this question Chapter 8 reports comparative
studies in which students were required to complete a number of assessment
tasks, one of them incorporating assessment with confidence measurement. The
analysis demonstrated that the MCQCM was as reliable as the other traditional
assessment strategies employed, indicating that it has equivalent validity as a predictor of
students’ knowledge. This is demonstrated by the correlations calculated in Chapter 8,
where the correlations between the MCQCM and each of the other assessment methods
are close to those of the other methods with each other, and in some cases stronger: the
correlation between the MCQCM and the short answers is the greatest. This study
concluded that the MCQCM compared equally to other traditional methods for
summative assessment.
Q2D: Does the use of assessment with confidence measurement provide additional
valuable feedback to the instructor when used for summative assessment? As in the
discussion above, the findings for this area of the research were consistently pleasing, as
the instructors confirmed their generally positive appraisal of assessment with confidence
measurement. A recurring theme from the instructors was that identifying
areas where high levels of confidence were being registered for incorrect answers
would “set off alarm bells”, resulting in adjustments to the presentation schedule to
revisit that content area. This was of particular importance for the first cohort of
students, as the assessment took place during the semester, permitting the instructor to address
the identified shortcomings during those students’ study program, which did occur.
In later instances of the MCQCM application in this research the benefits could only be
for the next cohort of students, as it was used as part of the final assessment. The results
indicate that the MCQCM provided additional valuable feedback for the assessor for
summative assessment; however, often the benefit would be for successive students
only.
From the results of the four sub questions it can be concluded, in response to
Question 2, that the MCQCM is of at least equal value to students and
instructors, and of consequential value to future students through improved feedback
identifying gaps in knowledge.
9.4 Ethical Issues
This research is reliant on the data gathered from the real world, as well as
contributions from simulations and pilot programs more aligned with laboratory
experiments during the early design stages. Consequently, consideration must be given
to the associated individuals. It is usual practice for assessment strategies to be analysed
post results for the identification of questions that were consistently answered well or
inadequately. Instructors often statistically process their students’ scores in an attempt
to ascertain if there are areas of content requiring attention or, most importantly, if the
construction of particular questions is substandard, needing rewriting for future use. It
is also expected that post assessment activity permits the students and instructors to
give feedback after the event, where the instructor will often go over the test results
highlighting the sections done poorly and those where the students achieved high
scores. Included in this exercise is the opportunity for the student to give feedback on
the construction of the questions, including possibly ambiguous or poorly worded questions
which may be misinterpreted by the candidates. This research has built on these standard
teaching experiences, processing the data responsibly and privately as part of the
standard operational procedures of the daily educational experience.
9.5 Limitations of Study
9.5.1 Scope
Educational research can often suffer from the enormity of its application and its area
of relevance. Assessment is a major contributor to the educational arena, having far
reaching implications when considering innovative approaches. Combining this with
the significance of the e-learning paradigm, where many of the substantial changes in
the approach to education are being generated, there is a tendency to over-commit to the
area of application, risking fragmentation of the primary research objectives.
Understandably, researchers in this area need to restrict their approach in order to keep
the path clear. For this reason the scope of the research was limited to the immediate
environment in which it was developed and nurtured. This allowed the control of many
extraneous variables, minimising the interference and distractions that can occur when
extending the research beyond the immediate environment.
9.5.2 Internal Validity
Research that interacts with participants in a real world application is often influenced
by inherent factors from that environment. In the case of investigation into innovative
assessment strategies in the educational arena, the participants are often unduly
influenced by the surrounding activities, as extra attention to assessment can artificially
increase the students’ level of appreciation, and as their familiarity grows and the
activities become more prevalent their reactions could be greatly overstated. This is
often referred to as the Hawthorne Effect, as described by Landsberger (1958),
where the subjects are observed to change their behaviour as a direct result of being
studied. In addition, the introduction of new technology in support
of innovative educational activities often receives disproportionate emphasis to ensure
its general acceptance. The extended testing and consequential long-term exposure to
innovative assessment tools can result in the lessening of the participants’ alertness and
motivation towards the system.
9.5.3 External Validity, Transferability
External Validity in quantitative research gauges the capacity to generalise from
the findings of the primary research to the extended environment. Transferability in
qualitative research refers to the ability to apply and transfer the methods and practices
formulated in the primary research to similar situations, environments and
circumstances (Lincoln & Guba, 1985). These two approaches can be considered for
this research as it has used both qualitative and quantitative research methodologies, as
discussed in Chapter 2.
The pilot programs, implementations and evaluations contained within this research
were dependent on the participation of students and instructors from various
socioeconomic groups, cultural communities and age groups. These groups bring with
them different attitudes, prejudices and preferences. All of these differences affect the
acquired data, as they make up part of the personalities and beliefs of the contributing
individuals. These extraneous variables have had some influence on this research but
their effect has been minimal due to the relatively homogeneous group being studied.
This can be attributed to the structure of the limited environment where the study
occurred, confined to the one faculty for its duration. It would be expected that similar
research in other discipline areas could vary greatly, as the attitudes and diversity of
student populations and instructors add to the recorded feedback and experiences with
the system. Such differences are illustrated by Davies (2005), who recommended
rejecting severe penalties in favour of over-rewarding students for
high confidence in the correct answer for his cohort of students, and by Paul (1994), who
found it difficult to apply severe negative penalties due to rejection from his fellow
instructors. Consequently, the ability to generalise from these studies becomes limited,
as acceptance and perceived value to both the instructors and the students can vary
greatly depending on their environments.
9.5.4 Construct Validity
Construct validity considers the degree to which the results can be generalised back to
the theoretical construct originally determined. The perception of fairness and
contributing value held by both the student and the instructor means different things to
different people. This research deliberately did not predefine these terms, and others
like them, so as not to influence the participants’ perceptions. The exercises designed to
evaluate assessment with confidence permitted the participants to freely express their
opinions without influence. Due to the ambiguity of these terms, any questions were
phrased deliberately so as not to prescribe a definition. This format of non-defined
terminology encouraged personal interpretation and expression of the value of
assessment with confidence measurement. Consequently the results should carry little bias when
generalised to their area of origin, in that the findings should comfortably
represent the attitudes and perceptions of other cohorts of students from the same
discipline area.
9.5.5 Ecological Validity
For research to be considered ecologically valid it requires the research methods,
materials and setting to reflect the real world. In the case of this research the
requirements were met as the investigative activities all occurred in the classroom as
part of the daily activities. The results for the formative assessment exercises
contributed to the student revision and directed them along their learning path. The
application of summative assessment with confidence measurement contributed to the
student’s final grade as part of the assessment suite being offered during the semester.
The initial pilot programs compromised the ecological validity slightly as they used an
inferior prototype, but in essence provided feedback close to that of the final version. Similarly,
the small simulation exercise was removed from the routine delivery for a select few,
but still emulated the assessment environment in which the implementation would
occur. Importantly, for the vast majority of the time the participants were mindful of the
losses and gains to be obtained and the associated risks, placing them firmly into a real
world scenario.
9.6 Research Contribution
9.6.1 Outcome 1: The MCQCM Tool
The development of the MCQCM Web-based tool is a major outcome of this research.
In its construction it has incorporated design aspects that are functional and
beneficial to both the students and instructors. The adherence to Sim, Read and
Holifield’s (2008) heuristics for CAA interactive systems ensures that the student’s
experience and interaction with it is beneficial to their learning by producing
meaningful feedback. The incorporation of the balanced-scoring method that is linearly
proportional to the registered confidence places the control into the hands of the
student, where the scoring is a direct consequence of their manipulation of the
system. By incorporating aspects of bifocal display with split screen, the MCQCM can handle
large amounts of information in a restricted viewing space. Viewing
graphics or scripts without losing sight of the question the student is to consider
minimises the cognitive load of the student. The 24 hours/7 days a week availability of
the MCQCM via the Internet for self-assessment greatly increases its appeal to the
students as a means to assess their level of knowledge at their convenience.
9.6.1.1 Why was this research necessary?
Good assessment is required to meet a set of criteria to be considered beneficial to the
student and instructor. Present assessment strategies can fall short of these requirements
leaving both the student and instructor uninformed. The MCQCM has been designed to
address some of these shortcomings by providing a Web-based solution for formative
and summative assessment that supplies critical assessment of the student’s knowledge
while still being considered fair. Its adherence to the games taxonomy and criteria
for good game play ensures that it provides a challenging experience in an environment
of balanced difficulty, encouraging risk-taking and rewarding absolute and partial
knowledge while highlighting areas of incorrect understanding. Tools like the MCQCM
need to be developed and nurtured to ensure that education reaps the benefits of this
progressive technological environment.
9.6.1.2 Who benefits from this research and how do they benefit?
The beneficiaries of the application of the MCQCM are the students, instructors,
educational institutions and researchers of CAA that adopt assessment with confidence
into the curriculum. The student benefits from having a method for
self-assessment readily available to them in order to ascertain their understanding and
modify their learning path appropriately. Additionally they can achieve higher results
by being rewarded for incomplete or partial knowledge in areas where they would
normally receive no credit under traditional assessment. The MCQCM permits the instructor to gauge
the level of understanding of particular components of the curriculum being assessed
via the graphical displays and graphs produced by the system, encouraging adjustment
of the delivery schedule to compensate for miscomprehension, misconception or
delusion. The educational institutions can benefit by implementing the MCQCM to
address the shortcomings of traditional assessment, offering a more rewarding
experience to their students.
Researchers can benefit from the MCQCM by leveraging off the development
processes and subsequent studies to develop and refine other technologies to improve
assessment within their individual contexts.
9.6.2 Outcome 2: The Value of Assessment with Confidence Measurement for
Formative Assessment
This research contains evaluation and analysis of the use of assessment with confidence
measurement for formative assessment from both the instructor’s and the student’s
perspective. It is apparent that, in its particular field of application, assessment with
confidence measurement incorporating a penalisation and reward
scheme was accepted and offered enriched feedback. This feedback influences the learning path of the
individual as a means of addressing the areas of concern, often leading to further
revision and additional reflection. The instructors also modify their teaching schedule
in order to readdress the content areas identified as in need of revisiting, especially
areas where strong misunderstanding occurs. The opportunity for the student to gauge
their present level of understanding at various times during the semester often
contributes to increased learning, greatly improving their final results. This research has
also cited similar successful applications of assessment with confidence measurement
in other fields demonstrating the value of the approach in determining the true level of
knowledge while discouraging guessing.
9.6.2.1 Why was this research necessary?
In this research the shortcomings of assessment have been discussed at length, citing
the concerns of various fellow researchers about the encouragement of guessing, the possible
confirmation of misunderstood content and the inability to recognise, cultivate and nurture
partial knowledge. Additionally, the opportunity to promote the healthy declaration of
“having no knowledge” in a designated area should be of benefit to both the instructor
and the student. These practices should be encouraged, as Gardner-Medwin in a recent
discussion considers it to be irresponsible behaviour to merely let a student through
without penalty when demonstrating a high level of confidence for incorrect knowledge.
The meta-question from Diamond and Forester (1983), which asks how sure the student is of
their answer to a question about what they know, encourages deeper understanding
from the student, probing their inner thoughts to ascertain their real level of knowledge.
Educators acknowledge the limitations of assessment and many actively seek
alternatives. This is demonstrated by their willingness to offer a number of assessment
activities designed to permit the student to demonstrate their knowledge in various
ways, such as written reports, case studies, small tests, exams and other equally
acceptable assessment methods.
9.6.2.2 Who benefits from this research and how do they benefit?
This research benefits both students and instructors. In
light of the discussion above, this research demonstrates the advantages of providing
self-assessment with confidence measurement activities during the subject delivery
program. Early detection of miscomprehensions creates the opportunity to readdress
issues before it is too late. As demonstrated by Hede (2002) the active process of
rehearsal facilitates the passing of knowledge from the short-term to long-term
memory, which can be assisted by the use of formative assessment. The redirection of
learning for a student is vital to their success in their studies and deserves particular
attention. Assessment with confidence measurement offers precise and timely
identification of miscomprehensions. As previously stated, the instructor benefits from being
alerted to areas of poor understanding, adjusting their instructional program
accordingly. Finally, they have the opportunity to narrow in on sub levels of the content
area in which the layers of understanding can be reinforced. This is of particular value
to vocational training where levels of competency are required and need to be
continually evaluated and addressed.
9.6.3 Outcome 3: The Value of Assessment with Confidence Measurement for
Summative Assessment.
Similar to formative assessment, this research identifies assessment with confidence
measurement for summative assessment as a solution to the problem identified. The
recorded appreciation of both the students and the instructors, together with the supporting
statistical evidence, positions it as a viable assessment strategy. In some cases the
advent of new technology has pushed the assessment methods beyond their original
intended use, as in the case of MCQs, where the ease of their construction
and implementation places them high in the order of choice.
9.6.3.1 Why was this research necessary?
As summative assessment plays a major role in the educational process, it is important
that the instructors use assessment techniques that offer the best solutions. The
acknowledgment of the failings of traditional assessment permits the inclusion of
alternative assessment enabling a greater precision in producing a summative grade.
The practice of students guessing creates artificial results, a noise in the data (De Carlo
2005), which distorts the distribution of the grades and undermines the integrity of the results. The
opportunity to minimise the effect of this noise is appealing to both
educators and students, especially the better students who consider guessing by the
lower achieving students to erode their high levels of achievement. Traditional MCQ
assessment can only classify a question as correct or incorrect, black or white, with no
areas of shade to recognise various levels of student knowledge. The ability for
assessment with confidence measurement to demonstrate degrees of knowledge is a
contributor to summative assessment, producing a new tier in the scores identifying
partial knowledge.
9.6.3.2 Who benefits from this research and how do they benefit?
The benefits of this research into using assessment with confidence measurement for
summative assessment are significant for both students and instructors, as the outcomes
offer enriched feedback and a greater distribution of grades predicting the student’s
level of knowledge.
Summative assessment that takes place before the end of a delivery program, early to
mid-semester, has itself formative assessment attributes, as the feedback can be acted
upon by the student guiding them along their learning path, and the instructors in
designing their delivery schedule. In this situation the discussion in 9.6.2 pertaining to
the benefits of this research to formative assessment applies.
When summative assessment occurs at the end of the delivery program, as the exam or
the final test, the benefits to students and instructors have a different emphasis, as the
formative feedback has limited influence on the learning path for the student and
delayed action for delivery schedule amendments by the instructor. When assessment
with confidence measurement is included in the final summative assessment the student
still receives the enriched feedback, highlighting the areas of greatest concern, the areas
where a comfortable level of knowledge was demonstrated and the areas where the acquired
knowledge is high. Those students who either need to repeat the subject or continue
with further studies in the area can address their concerns and build on their strengths in
preparation for the following semester.
The instructor usually analyses the results of the final assessment to ascertain elements
of the delivery program that require a rethink, and adjusts the syllabus
accordingly. Contained within this research is the suggestion that the increased range
of generated scores from assessment with confidence measurement permits the
instructors to offer a greater distribution of final results. This could influence the
allocation of the final grades, offering the ability to increase the distinction between the
students, contributing to a more discerning set of results.
One of the greatest benefits identified by this research to be gained from using
confidence measurement in summative assessment is the byproduct of scheduling the
activity during the semester. In most cases the operation of this novel assessment
tool is foreign to the students, requiring them to familiarise themselves with the method
of assessment in preparation for the summative assessment task. This forced exposure
to the tool for formative assessment increases their self-assessment regime, providing
feedback to them whether they want it or not. It is generally accepted that the better
students use any formative assessment opportunities made available, assisting in
securing their high grades. The lower achievers tend to stay away from them, partly due
to their general attitude towards learning and sometimes due to their fear of what their
score will be. The balanced scoring calculated from the confidence registered can
support these students by permitting them to concede where they have no knowledge
without imposing too high a penalty, and by maximising their score where they do have
knowledge. While this research did not gauge the outcomes of this, it did observe, in
the pilot programs of Chapter 5 and the implementations outlined in Chapter 8, that
the students appreciated the opportunity to safely admit having little or no knowledge
in some areas without incurring harsh penalties.
9.6.4 Outcome 4: Heuristics for CAA with Confidence Measurement
In order to ensure a reliable CAA interactive system it is necessary to undertake
evaluation, in this case as part of the HCI iterative process. Usual practice for
interactive Web-based systems is to evaluate against a set of developed heuristics
customised to the area of application. Sim, Read and Holifield (2006) developed their
recommended heuristics for CAA over a period of time by creating a corpus of
usability problems recorded during CAA activities and testing. The heuristics developed
by Sim et al. have been used to synthesise the set of CAA specific heuristics used for
this research. During this process this research recognised the need to further extend
these heuristics to apply to CAA applications that incorporate confidence measurement,
as the testing and observations highlighted areas of concern that are unique to MCQ
with confidence measurement assessment with the use of technology. These extended
heuristics have been itemised in Chapter 6 and are available for future reference and
application as required.
9.6.4.1 Why was this research necessary?
Interactive assessment tools with confidence measurement are required to meet an
additional set of criteria to be considered of good quality and beneficial to all users. It is
the adherence to these extended criteria that presents the greatest challenges to their
developers. In particular, the adoption of a scoring method that is linked proportionally to
the direct manipulation of the interface is critical to the perception of being fair and in
the control of the learner. This research recommends the balanced scoring technique,
where the possible gain is equal to the loss when wagering on answers to be correct or
incorrect, and where the method by which the student registers their confidence is
proportionally scaled on the interface, so that the movement of the sliding bar for confidence is
in direct proportion to the resulting score. The ability to navigate freely from question
to question in a non-linear way is a requirement of all CAA applications but holds
special significance when confidence is applied, as the building and lowering of
confidence as the test progresses often leads to the need to readdress questions in a
different light before submitting. It is these unique experiences that call for
a set of customised CAA with confidence measurement heuristics.
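The balanced scoring technique described above can be sketched in a few lines. The Python below is a minimal illustration, assuming that confidence is registered on a 0 to 100 per cent slider and that each question carries a maximum mark of 1; the function names, the slider range and the sample responses are assumptions made for this sketch and do not reproduce the MCQCM source code.

```python
def balanced_score(is_correct, confidence_pct, max_mark=1.0):
    """Mark for one question under balanced scoring.

    The amount wagered is linearly proportional to the registered
    confidence; it is gained when the answer is correct and lost
    when it is incorrect, so the possible gain equals the possible loss.
    """
    if not 0.0 <= confidence_pct <= 100.0:
        raise ValueError("confidence must lie between 0 and 100")
    stake = max_mark * confidence_pct / 100.0
    return stake if is_correct else -stake

def total_score(responses):
    """Sum the balanced scores for (is_correct, confidence_pct) pairs."""
    return sum(balanced_score(ok, pct) for ok, pct in responses)

# Invented responses: confident and right, unsure and wrong, no knowledge declared.
example = [(True, 100.0), (False, 50.0), (False, 0.0)]
result = total_score(example)
```

Under these assumptions a student who declares no confidence neither gains nor loses marks, which is consistent with the opportunity to admit having no knowledge without penalty, while a confident incorrect answer costs as much as a confident correct answer would have earned.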
9.6.4.2 Who benefits from this research and how do they benefit?
Primarily it is the developers of CAA with confidence measurement tools that benefit
from these extended customised heuristics. Heuristic evaluation is a powerful tool with
the ability to identify most of the inadequacies of an interactive system. The success of
the activity is dependent on the quality of the heuristics and the level of expertise of the
evaluators. The extended CAA heuristics contained in this research offer the
opportunity for the developers to effectively evaluate their interactive system. In
addition to the developers, students and instructors also benefit from these extended
heuristics as the outcome of evaluations with customised heuristics often leads to
higher levels of usability and functional refinement, improving the user experience with
the system.
9.6.5 Outcome 5: The Contribution of this Research to Educators Investigating
Alternative Assessment Strategies.
In Section 9.5.3 reference is made to the external validity of this research, where the
ability to generalise and transfer the findings of this research into similar situations is
discussed. This section shall now elaborate on the benefits this research affords to
educators who are presently pursuing, or intend to pursue, alternative assessment options. It
firstly addresses the transferability of the findings to innovative assessment in general,
where conclusions of this research pertaining to the design and functionality of an
interactive assessment system could contribute and assist with development of a newly
proposed assessment tool. It then informs researchers considering the implementation
of existing assessment with confidence measurement, such as Paul’s (1994) CBAA or
MCQCM, of the benefits to be gained by its adoption and the challenges to overcome.
9.6.5.1 Why was this research necessary?
Innovative assessment strategies require nurturing and supportive environments for
them to have a chance of being successful. Instructors who instigate them as part of the
assessment regime often find themselves in uncharted waters, as their educational
training often does not introduce them to the more novel approaches of assessment,
preferring to expose them to traditional practices. When faced with the challenge of
incorporating groundbreaking assessment into the curriculum, instructors often rely on
the experiences of others, as documented in this research, to assist them in establishing
an effective strategy to approach the challenge.
9.6.5.2 Who benefits from this research and how do they benefit?
In the general case of the introduction of non-specific innovative assessment strategies
(not necessarily using confidence measurement but not excluding it), there are many
components of this research that can be extrapolated to different areas of application. In
particular, the recognised need for preparatory exposure to the students in non-
threatening environments in order for them to familiarise themselves with the systems,
as recommended by Sim, Read and Holifield (2008) as part of their computer aided
assessment heuristics and Desurvire, Caplan and Toth’s (2004) game play heuristics,
would greatly enhance the chance of success when introducing new Web-based
assessment tools. Furthermore, application of the HCI guidelines for sound usability of
interactive systems, as outlined in Chapter 6, is highly recommended for similar
situations, addressing the areas of good navigation, error prevention, visualisation,
system visibility and others. The appropriate choice of a metaphor offering high
affordance is critical to the success of interactive Web-based assessment; in the case of
this research, the metaphor is the gaming paradigm of betting/gambling, where
the gain/loss is proportional to the risk taken. Accordingly, consideration of game play
theory is required, in particular the need for perceived game fairness, all of which is
highly externally valid and transferable to other areas of application.
The discussion and comparison of the various scoring options available is transferable
to applications in the field, as the gathering of the findings and recommendations from
the pioneers can greatly assist newcomers who would like to investigate like practices
in their preferred areas of application. The general acceptance of assessment with
confidence measurement will permit the instructor to adopt various assessment scoring
regimes, so that the most appropriate scoring method can be customised
for the particular cohort of students.
Educators debating scoring techniques in future will be able to use the comparisons
of the works of Gardner-Medwin and Gahan (2003), Gardner-Medwin (2006), Paul
(1994), Davies (2005) and the MCQCM scoring as a basis for discussion.
9.7 Future Work
This research has demonstrated the benefits of employing assessment with confidence
measurement, presenting the work of others in their attempt to eliminate guessing and
promote the declaration of partial or no knowledge through the development of scoring
strategies offering rewards for correct answers and penalty scores as deterrents. All of
the scoring options discussed are acceptable scoring techniques that could be used. This
research has recognised the advantages of each of the previously developed scoring
techniques including the balanced scoring adopted by the MCQCM, acknowledging
that the choice of an appropriate scoring technique for implementation is
dependent on the background of the students with whom it will be implemented.
In light of the previous discussions it is envisaged that the design of future applications
of this research’s assessment with confidence tool (MCQCM) should incorporate a
scoring option selection mechanism, providing a number of scoring regimes available
to the user. This would permit the instructor to determine the most appropriate scoring
method for the cohort of students. In this approach the choice of scoring can be set for
the duration of the delivery or changed mid-semester, depending on the situation. It could
be feasible that the scoring penalty increases as the students progress through the
curriculum in an attempt to raise them to a higher level of knowledge. This approach
permits the application of a more “forgiving” regime, such as Paul’s (1994) CBAA if
required. As an example there might be a need to increase the students’ confidence in
certain difficult content areas by offering more lenient scoring for incorrect answers.
Alternatively there might be the need to apply severe penalties to students who require
a more honest appraisal of their progress, an option that Paul (1994) also provides for
that purpose but rarely implements.
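The scoring option selection mechanism envisaged here might be sketched as follows. This is an illustrative sketch only, not a description of MCQCM or of any published scheme: the regime names, the penalty factors of 0.5, 1.0 and 2.0, and the 0 to 1 confidence scale are hypothetical, and the "lenient" rule is not a reproduction of Paul's (1994) CBAA scoring.

```python
from typing import Callable, Dict, Iterable, Tuple

# A scoring rule maps (answer was correct, declared confidence) to a mark.
ScoringRule = Callable[[bool, float], float]


def make_rule(penalty_factor: float) -> ScoringRule:
    """Build a rule whose loss is penalty_factor times the possible gain."""
    def rule(correct: bool, confidence: float) -> float:
        if not 0.0 <= confidence <= 1.0:
            raise ValueError("confidence must lie between 0 and 1")
        return confidence if correct else -penalty_factor * confidence
    return rule


# Hypothetical regimes an instructor could choose between for a given cohort.
REGIMES: Dict[str, ScoringRule] = {
    "lenient": make_rule(0.5),   # forgiving: a wrong answer costs half the stake
    "balanced": make_rule(1.0),  # gain equals loss
    "severe": make_rule(2.0),    # a wrong answer costs double the stake
}


def score_test(regime: str, responses: Iterable[Tuple[bool, float]]) -> float:
    """Total the marks for (correct, confidence) pairs under the chosen regime."""
    rule = REGIMES[regime]
    return sum(rule(correct, confidence) for correct, confidence in responses)


if __name__ == "__main__":
    answers = [(True, 1.0), (False, 0.5)]
    for name in REGIMES:
        print(name, score_test(name, answers))
```

Because the regime is looked up by name at scoring time, an instructor could in principle fix one regime for the whole delivery or switch to a stricter one mid-semester without changing the test itself.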
Future studies may include recognition and determination of appropriate scoring
systems to match specific circumstances.
The scope of this research confined it to the Information Technology area. It would be
beneficial for further studies to investigate its application in different disciplines, as the
perception of assessment strategies by both the students and the instructors can vary
greatly depending on their social and cultural values that often differ depending on the
area of study.
In addition future research in this area would benefit from application with cohorts of
students without gender bias or more specifically that are not dominated by males as
was the case in this study. The IT educational environment attracts more males than
females, which makes gender comparative analysis difficult to perform. This research
could draw no conclusions in the area of gender bias but envisages that future activities
would be designed in areas of application where the student cohorts were comprised of
groups with gender balance.
Finally this research would benefit from exposure to cohorts of students from different
cultures, preferably in their country of origin. This would generate rich data for analysis
to ascertain if the acceptance and successful operation of assessment with confidence
measurement is dependent on the cultural environment. It is proposed that the MCQCM
be transferred to an international venue for further trials for this purpose.
9.8 Concluding Remarks
The outcomes of this research have contributed to addressing the concerns of traditional
assessment strategies that fail to honestly appraise a student’s level of knowledge and
encourage guessing. In order for an assessment to be of value it is required to conform
to a set of good assessment criteria, which is often not the case. This can be partially
attributed to the advent of new technology pushing the use of MCQs beyond their
intended use as an evaluation of a broad area of knowledge usually at Bloom’s lower
levels of knowledge. The ease by which they can be constructed and implemented
increases their appeal. Accordingly educators have embraced their application with
great enthusiasm, often extending their implementation to assessment activities at
Bloom’s higher levels. This requires extensive knowledge in the construction of the
questions, which is a challenge to the most experienced educators.
The imposing of penalties for incorrect answers is another controversial subject within
CAA that will always bring forth heated debate, as educators hold varying opinions on
its ethical standing. This research acknowledges the challenges faced in determining the
most appropriate scoring technique to best suit the needs of the students. Consideration
must be given to the general intellectual position of the students, the purpose of the
assessment and the importance of the assessment task when deciding which scoring
technique to use.
This research encourages instructors interested in adopting assessment with confidence
to consider two closely related issues, with the second completely dependent on the
acceptance of the first. Initially, the instructor must decide if the benefit of assessment
with confidence measurement is worth pursuing within the individual context. This is
the pivotal question, as answering it in the affirmative commits them to the task of
successfully implementing it as either formative, in preparation for the oncoming
summative assessments, or both formative and summative, as using it for summative
assessment will necessitate formative assessment activities to ensure its success.
Included in this first question is whether the instructor is prepared to impose penalties
in their scoring, as this is the mechanism by which the confidence can be incorporated
into the assessment. Instructors may find this the most challenging, as they will be
required to defend their choice to students and, in some cases, to other educators.
Once committed to assessment with confidence measurement the instructor will need to
consider the second question, that is, which scoring technique would best suit his/her
situation? This decision requires considering the intellectual ability of their students,
the motivation behind their studies, the discipline area in which the population lies and
any other influential factors. Application of assessment with confidence to a more
specific cohort of students with recognised common characteristics, traits and
behavioural qualities may suggest the adoption of scoring routines suited to their needs,
such as in the medical field (Gardner-Medwin, 2006), the Computer Science discipline
(Davies, 2005) and the Engineering domain (Paul, 1994). These deviations from the
balanced scoring of MCQCM are encouraged, as the situation requires. It is the
finding of this research that the choice of scoring is secondary to the adoption of
assessment with confidence.
Assessment based on the gambling metaphor leverages greatly off the games
environment, which is already a significant contributor to the entertainment of many of
our students. If developed and used correctly it could offer the chance to educate our
students while entertaining them and support the transition of knowledge from
short-term to long-term memory. The synergies between the MCQ with confidence
measurement and games offer opportunities to leverage off games theory to improve
student involvement and learning and should be considered when developing within
this area.
Like many assessment choices MCQs offer the ability to assess broad areas of
knowledge effectively with minimal burden to the instructor, a benefit that should not
be overlooked. It is the scoring of the MCQ that comes under scrutiny in this research,
as the reluctance of educators to adopt scoring methods to discourage guessing
jeopardises the value of MCQ testing to the educational process.
The value of assessment with confidence measurement when used for formative
application to the educational process has been widely discussed in this research; it
offers enriched feedback and an honest appraisal of the state of knowledge. To promote
direction along a fruitful learning path students and instructors should aim to take
advantage of technologies that adjust the learning path to address the highlighted gaps
in their knowledge, modifying the study program by the student and the delivery
program by the instructor.
In closing, this research offers encouragement and support to those who intend to
pursue assessment with confidence measurement by demonstrating the benefits to be
gained by both the students and instructors.
To improve education, educators are encouraged to look beyond traditional
assessment practices, in particular when using technology to deal with an ever-changing
educational culture and student cohort. There is an increasing need to develop
best practices in assessment that guard against the minimization of feedback and the
loss of student control over their ability to show their knowledge, both of which can
result from adopting technologies that are ill suited to the pedagogy of formative assessment.
Research into assessment with technology is still in its infancy and requires substantial
scrutiny to ensure that it is not dismissed as being unable to provide the necessary
assessment or, even worse, used without consideration of the implications of
substandard procedures. This research offers one example of enabling technology to
improve student assessment while taking advantage of the benefits technology can
offer.
REFERENCES
Abdulwahed, M., Nagy, Z., & Blanchard, R. (2008). Beyond The Engineering
Pedagogy, Modelling Kolb’s Learning Cycle. Australian Association for Environmental
Education Conference Proceedings, Yeppoon.
Acker, D., & Duck, N. W. (2008). Cross-cultural Overconfidence and Biased Self-
attribution. Journal of Socio-Economics, 37(5), 1815-1824.
Ackerly, B. (2004). Critical Theory and Method in Democratic and Human Rights
Theories. Annual Meeting of the International Studies Association (pp. 133). Montreal,
Canada.
Adams, E., & Rollings, A. (2007). Games Design and Development, Fundamentals of
Game Design. Australia: Pearson Prentice Hall.
Alnabhan, M. (2002). An Empirical Investigation of The Effects of Three Methods of
Handling Guessing and Risk Taking on The Psychometric Indices of a Test, Social
Behaviour and Personality. Scientific Journal 30(7), 645-652.
American Psychological Association Work Group of the Board of Educational Affairs.
(1997). Learner-Centered Psychological Principles, A Framework for School Reform
and Redesign. Washington, DC: American Psychological Association.
Amory, A. (2007). Game Object Model Version II: A Theoretical Framework for
Educational Game Development. Educational Technology Research and Development,
55(1), 51-77.
Amory, A., & Seagman, S. (2003). Education Game Models: Conceptualization and
Evaluation. South African Journal of Higher Education, 17(2), 206-217.
Ashburn, R. (1938). An Experiment in Essay-type Question. Journal of Experimental
Education, 7(1), 1-3.
Astin, A. (1991). Assessment for Excellence. Connecticut, USA: Greenwood Publishing.
Ayala, C., Shavelson, R., Ruiz-Primo, M., Brandon, P., Yin, Y., Furtak, E., et al.
(2008). From Formal Embedded Assessments to Reflective Lessons: The Development
of Formative Assessment Studies. Applied Measurement in Education, 21(4), 315-334.
Bacon, D. R. (2003). Assessing Learning Outcomes: A Comparison of Multiple-Choice
and Short-Answer Questions in a Marketing Context. Journal of Marketing Education,
25(1), 31-36.
Baird, D., & Fisher, M. (2006). Neomillennial User Experience Design Strategies:
Utilizing Social Networking Media to Support "Always On" Learning Styles. Journal of
Educational Technology Systems, 34(1), 5-32.
Bandura, A. (1983). A Self-Evaluation and Self-Efficacy Mechanisms Governing the
Motivational Effects of Goal Systems. Journal of Personality and Social Psychology,
45(5), 1017-1028.
Bannon, L., Cypher, A., Greenspan, S., & Monty, M. L. (1983). Evaluation and
Analysis of Users' Activity Organization. Proceedings of the SIGCHI Conference on
Human Factors in Computing Systems. Boston, Massachusetts, United States: ACM.
Banta, T., Jones, E., & Black, K. (2009). Planning Effective Assessment. In Designing
Effective Assessment: Principles and Profiles of Good Practice
(pp. 3-10). San Francisco, CA: Wiley.
Ben-Simon, A., Budescu, D. V., & Nevo, B. (1997). A Comparative Study of Measures
of Partial Knowledge in Multiple-Choice Tests. Applied Psychological Measurement,
21(1), 65-88.
Betz, N., & Hackett, G. (1981). Manual for the Occupational Self-Efficacy Scale.
Journal of Counseling Psychology, American Psychological Association, 28(5), 399-
410.
Bevan, N. (2009). International Standards for Usability Should Be More Widely Used.
Journal of Usability Studies, 4(3), 106-113.
Black, P., & Wiliam, D. (1998). Inside the Black Box: Raising Standards Through
Classroom Assessment. Phi Delta Kappan, 80(2), 139-148.
Black, P., & Wiliam, D. (2006). Developing a Theory of Formative Assessment. In
Assessment and Learning (pp. 81–100). London, UK: Sage.
Black, P., & Wiliam, D. (2009). Developing the Theory of Formative Assessment.
Educational Assessment, Evaluation and Accountability, 21(1), 5-31.
Bloom, B., & Krathwohl, D. (1956). Taxonomy of Educational Objectives: The
Classification of Educational Goals, by a committee of college and university
examiners. New York: Longman, Green.
Bradbard, D., Parker, D., & Stone, G. (2004). An Alternative Multiple-Choice Scoring
Procedure in a Microeconomics Course. Decision Sciences Journal of Innovative
Education, 2(1), 11-26.
Bradshaw, H. (2007). Computer Game Playability; Learning Through Game Play
Design. Paper presented at the Learning with Games (LG) Conference Proceedings,
Sophia Antipolis, France.
Brown, T., & Shufford, E. (1973). Quantifying Uncertainty Into Numerical
Probabilities for the Reporting of Intelligence. Santa Monica: RAND.
Bush, M. (2001). A Multiple Choice Test that Rewards Partial Knowledge. Journal of
Further and Higher Education, 25(2), 157-163.
Carless, D. (2007). Learning-oriented Assessment: Conceptual Bases and Practical
Implications. Innovations in Education and Teaching International, 44, 57-66.
Cassell, C., & Nadin, S. (2008). Theory and Research Methods: Interpretivists
Approaches to Entrepreneurship. In R. Barrett, S. Mayson & E. Elga (Eds.),
International Handbook of Entrepreneurship and HRM (pp. 71-88).
Chatti, M., Jarke, M., & Frosch-Wilke, D. (2007). The Future of E-learning: A Shift
To Knowledge Networking and Social Software. International Journal of Knowledge
and Learning, 3(4-5), 404-420.
Choi, I., Lee, S., & Jung, J. (2008). Designing Multimedia Case-Based Instruction
Accommodating Students’ Diverse Learning Styles. Journal of Educational
Multimedia and Hypermedia, 17(1), 5-25.
Clark, J., & Friesen, L. (2009). Overconfidence in Forecasts of Own Performance: An
Experimental Study. Economic Journal, 119(534), 229-251.
Clark, R., & Feldon, D. (2005). Five Common but Questionable Principles of
Multimedia Learning. In R. Mayer (Ed.), The Cambridge Handbook of Multimedia
Learning (pp. 97-115). New York, USA: Cambridge University Press.
Coffield, F., Moseley, D., Hall, E., & Ecclestone, K. (2004). Learning Styles and
Pedagogy In Post-16 Learning: A Systematic and Critical Review, from
http://www.ncl.ac.uk/ecls/research/project/1927
Cohen, L., Manion, L., & Morrison, K. (2007). Chapter 1: The Nature of Enquiry -
Setting the Field. In Research Methods in Education (pp. 5-47). London, UK:
Routledge.
Comte, A. (1868). Positive Philosophy. New York: William Gowens.
Corveleyn, J., & Luyten, P. (2006). Minding the Gap Between Positivism and
Hermeneutics in Psychoanalytic Research. American Psychoanalytic Association,
54(2), 571-610.
Crocker, L. (2005). Teaching for the Test: How and Why Test Preparation is
Appropriate. In R. Phelps (Ed.), Defending Standardized Testing (pp. 159-174). New
Jersey, USA: Lawrence Erlbaum Associates.
Daniel, B., O’Brien, D., & Sarkar, A. (2009). User Centered Design Principles for
Online Learning Communities: A Sociotechnical Approach for the Design of a
Distributed Community of Practise. In M. Lytras & P. Ordonez de Pablos (Eds.), Social
Web Evolution: Integrating Semantic Applications and Web 2.0 Technologies (pp. 54-
71). Hershey, USA: Information Science Publishing.
Davidoff, F. (1995). Confidence Testing - How to Answer a Meta-Question. American
College of Physicians Observer.
Davies, P. (2005). Continual Assessment of Confidence or Knowledge with Hidden
MCQ. Paper presented at the Computer Assisted Assessment Conference, Loughborough, England.
De Carlo, L. (2005). A Model of Rater Behaviour in Essay Grading Based on Signal Detection
Theory. Journal of Education Measurement, 42(1), 53-76.
Desurvire, H., Caplan, M., & Toth, J. (2004). Using Heuristics to Evaluate the
Playability of Games. Paper presented at the CHI Association for Computing
Machinery, New Jersey, USA.
Diamond, G., & Forrester, J. (1983). An Epistemologic Model of Clinical Judgment.
American Journal of Medicine, 75, 129-137.
Dix, A., Finlay, J., Abowd, G., & Beale, R. (2004). Human-Computer Interaction (4th
ed.). Australia: Pearson.
Doebbert, J. (1999). Benchmarking the Learning Environment (Technology). In
National Centre of Research in Vocational Education. California, USA: Copa &
Ammentorp Press.
Farrell, G., Farrell, V., & Leung, Y. (2001). Online Software Test for Efficient and
Effective Assessment Using Multiple Choice Questions- An Evaluation. Paper presented
at the American Educational Research Association Conference, Seattle, USA.
Farrell, G., & Leung, Y. (2002a). Designing an Online Self-Assessment Tool Utilizing
Confidence Measurement. Paper presented at the Seeking Success in E-Business, IFIP
8.4 Working Group, Copenhagen, Denmark.
Farrell, G., & Leung, Y. (2002b). Improving the Design of an Online Self-Assessment
Tool Utilizing Confidence Measurement. Paper presented at the Web-Based Learning:
Men and Machines, Hong Kong.
Farrell, G., & Leung, Y. (2004a). Comparison of Two Student Cohorts Utilizing Black
Board CAA with Different Assessment Content: A Lesson to be Learnt. Paper presented
at the Computer Assisted Assessment Conference, Loughborough, England.
Farrell, G., & Leung, Y. (2004b). Innovative Online Assessment. Education and
Information Technology Journal of the IFIP Technical Committee on Education, 9(1),
5-20.
Farrell, G., & Leung, Y. (2005). A Comparison of Blackboard CAA and an Innovative
Self Assessment Tool for Formative Assessment. Paper presented at the Computer
Assisted Assessment Conference, Loughborough, England.
Farrell, G., & Leung, Y. (2006). A Comparison of an Innovative Assessment Tool
Utilizing Confidence Measurement to the Traditional Multiple Choice, Short Answer
and Problem Solving Questions. Paper presented at the Computer Assisted Assessment
Conference, Loughborough, England.
Farrell, G., & Leung, Y. (2008). Convergence of Validity for the Results of a
Summative Assessment with Confidence Measurement and Traditional Assessment.
Paper presented at the Computer Assisted Assessment Conference, Loughborough,
England.
Feltz, D. (2007). Self Confidence and Sports Performance. In D. Smith & M. Bar-Eli
(Eds.), Essential Readings in Sport and Exercise Psychology (pp. 423-458). Illinois,
USA: Human Kinetics.
Frandsen, G., & Schwartzbach, M. (2006). A Singular Choice for Multiple Choice.
Special Interest Group on Computer Science Education (SIGCSE) Bulletin, Association
for Computing Machinery, 39(4), 34-38.
Frary, R. (1985). More Multiple-choice Item Writing Do's and Don'ts, Practical
Assessment, Assessment and Evaluation. ERIC Clearinghouse on Assessment and
Evaluation, 4(11).
Fuchs, C., & Sandoval, M. (2008). Positivism, Postmodernism, or Critical Theory? A
Case Study of Communications Students’ Understandings of Criticism. Journal For
Critical Education Policy Studies, The Institute for Educational Study Policies (IEPS),
6(2), 112-141.
Fullarton, S. (1993). Confidence in Mathematics: The Effects of Gender, Research
Monograph. Melbourne, Australia: Deakin University: National Centre for Research and
Development in Mathematics Education.
Gardner-Medwin, A. (2006). Confidence-Based Marking: Towards Deeper Learning
and Better Exams. In C. Bryan & K. Clegg (Eds.), Innovative Assessment in Higher
Education (pp. 141-149). London, England: Taylor & Francis.
Gardner-Medwin, A., & Gahan, M. (2003). Formative and Summative Confidence-
Based Assessment. Paper presented at the Computer Assisted Assessment Conference,
Loughborough, England.
Gerjets, P., Scheiter, K., Opfermann, M., Hesse, F., & Eysink, T. (2009).
Learning With Hypermedia: The Influence of Representational Formats and Different
Levels of Learner Control on Performance and Learning Behavior. Computers in
Human Behavior, Elsevier, 25(2), 360-370.
Gerring, J. (2003a). Interpretations of Interpretivism, Qualitative Methods. Newsletter
of the American Political Science Association Organized Section on Qualitative
Methods: Non Refereed, 1(2), 2-6.
Gerring, J. (2003b). Qualitative Methods. Newsletter of the American Political Science
Association Organized Section on Qualitative Methods: Non Refereed.
Gerring, J. (2007). Conundrum of Case study. In J. Gerring (Ed.), Case Study Research
(pp. 1-51). New York, USA: Cambridge University Press.
Giddings, L., & Grant, B. (2007). A Trojan Horse for Positivism? A Critique of Mixed
Methods Research. Advances in Nursing Science, 30(1), 52-60.
Govaerts, M., van der Vleuten, C., Schuwirth, L., & Muijtjens, M. (2007). Broadening
Perspectives on Clinical Performance Assessment: Rethinking the Nature of In-training
Assessment. Advances in Health Sciences Education, 12(2), 239-260.
Greenfield, P. (2009). Technology and Informal Education: What Is Taught, What Is
Learned. Science, American Association for the Advancement of Science (AAAS),
323(5910), 69-71.
Harris, L., Sadowski, M., & Birchman, J. (2006). A Comparison of Learning Style
Models and Assessment Instruments for University Graphics Educators. The
Engineering Design Graphics Journal, 70(1), 6-15.
Harrison, C., & Petrie, H. (2007). Deconstructing Web Experience: More Than Just
Usability and Good Design. In Lecture Notes in Computer Science: Human-Computer
Interaction HCI Applications and Services (Vol. 4553, pp. 889-898). Berlin,
Heidelberg, Germany: Springer.
Hartley, K., Strudler, N., & Schraw, G. (2008). Nevada Schools Educational
Technology Needs Assessment. Nevada, USA: College of Education,
University of Nevada, Las Vegas.
Hartmann, B., Abdulla, L., Mittal, M., & Klemmer, S. (2007). Authoring Sensor-based
Interactions by Demonstration with Direct Manipulation and Pattern Recognition.
Paper presented at the Human Factors in Computing Systems (SIGCHI).
Hassman, P., & Hunt, D. (1994). Human Self-Assessment in Multiple Choice Testing.
Journal of Educational Measurement, 31(2), 149-160.
Hattie, J., & Timperley, H. (2007). The Power of Feedback. Review of Educational
Research. American Educational Research Association, 77(1), 81-112.
Hede, A. (2002). An Integrated Model of Multimedia Effects on Learning. Journal of
Education, Multimedia and Hypermedia, 11(12), 177-191.
Hobson, A., & Ghoshal, D. (1996). Flexible Scoring for Multiple-Choice Exams. The
Physics Teacher, 34(5), 284-305.
Hogg, M., & Maclaran, P. (2008). Rhetorical Issues in Writing Interpretivist Consumer
Research. Qualitative Market Research: An International Journal, 11(2), 130-146.
Hounsell, D., McCune, V., Hounsell, J., & Litjens, J. (2008). The Quality of Guidance
and Feedback to Students. Higher Education Research and Development, 27(1), 56-67.
Howlett, D., Vincent, T., Watson, G., Owens, E., Webb, R., Gainsborough, N., et al.
(2009). Blending Online Techniques with Traditional Face to Face Teaching Methods
to Deliver Final Year Undergraduate Radiology Learning Content. European Journal
of Radiology, In Press.
Hussain, Z., Lechner, M., Milchrahm, H., Shahzad, S., Slany, W., Umgeher, M., et al.
(2008). Agile User-Centered Design Applied to a Mobile Multimedia Streaming
Application. In Lecture notes in Computer Science, HCI and Usability for Education
(Vol. 5298, pp. 313-330). Berlin, Heidelberg, Germany: Springer.
Hyerle, D. (2009). Visual Tools for Transforming Information Into Knowledge.
London, England: Sage.
Isacker, K., Slegers, K., Gemou, M., & Bekiaris, E. (2009). A UCD Approach Towards
the Design, Development and Assessment of Accessible Applications in a Large Scale
European Integrated Project. Paper presented at the Universal Access to Human-
Computer Interaction, San Diego, USA.
Jennings, S., & Bush, M. (2006). A Comparison of Conventional and Liberal (Free
Choice) Tests. Practical Assessment, Research and Evaluation Online Journal,
11(8), 1-5.
Johnson, P., Buehring, A., Cassell, C., & Symon, G. (2006). Evaluating Qualitative
Management Research: Towards a Contingent Criteriology. International Journal of
Management Reviews, 8(3), 131-156.
Kalyuga, S., Chandler, P., & Sweller, J. (1998). Levels of Expertise and Instructional
Design. Human Factors, 40(1), 1-17.
Karpicke, J., Butler, A., & Roediger III, H. (2009). Metacognitive Strategies in Student
Learning: Do Students Practise Retrieval When They Study on Their Own? Memory
Journal, 17(4), 471-479.
Kaufman, D. (2003). Applying Educational Theory In Practice. Quality & Safety in
Health Care, British Medical Journal (BMJ), 326(7382), 213-216.
Kehoe, J. (1995). Writing Multiple Choice Test Questions in Practical Assessment and
Evaluation. ERIC Clearinghouse on Assessment and Evaluation, 4(9).
Keller, J. (2008). First Principles of Motivation to Learn and E3-learning. Distance
Education, 29(2), 175-185.
Kennedy, K., Chan, J., Fok, P., & Yu, W. (2008). Forms of Assessment and Their
Potential For Enhancing Learning: Conceptual and Cultural Issues. Educational
Research For Policy and Practice, 7(3), 197-207.
Klinger, A. (1997). Experimental Validation of Learning Accomplishment. Paper
presented at the Frontiers in Education, Los Angeles, USA.
Kolb, D. (1984). Experiential Learning: Experience As The Source of Learning and
Development. New Jersey, USA: Prentice Hall.
Kolb, D. (1999). The Kolb Learning Style Inventory, Version 3. Boston, USA: Hay
Group.
Komarraju, M., Karau, S., & Schmeck, R. (2009). Role of the Big Five Personality
Traits in Predicting College Students' Academic Motivation and Achievement.
Learning and Individual Differences, 19(1), 47-52.
Krätzig, G., & Arbuthnott, K. (2009). Metacognitive Learning: The Effect of Item-
Specific Experience and Age on Metamemory Calibration and Planning. Metacognition
and Learning, 4(2), 125-144.
Krumboltz, J., & Christine, J. (1999). Point of View, Competitive Grading Sabotages
Good Teaching: Professional Education in Education, Phi Delta Kappan.
Landsberger, H. (1958). Hawthorne Revisited. New York, USA: Cornell University
Press.
Lederman, N., & Niess, M. (2000). Technology's Sake or for the Improvement of
Teaching and Learning? School Science and Mathematics Journal, 100(7), 345-348.
Leung, Y. (1995). Applying Bifocal Displays to Data Visualisation. Swinburne
University of Technology, Melbourne, Australia.
Libarkin, J. (2008). Concept Inventories in Higher Education Science. Paper presented
at the Board Of Science Education Conference, Washington, DC.
Lincoln, Y. S., & Guba, E. G. (1985). Naturalistic Inquiry: Sage Publications, Inc.
Lindström, H., & Malmsten, M. (2008). User-centred Design and Agile Development:
Rebuilding the Swedish National Union Catalogue. The Code4Lib Journal, 5.
Longino, H. E. (2002). The Fate of Knowledge: Princeton Univ Pr.
Mansell, R. (2009). A Critique of the Mainstream Vision and an Alternative Research
Framework. The Information Society and ICT Policy, Journal of Information,
Communication & Ethics in Society, 8(1), 22-44.
Marcus, A. (1984). Graphic Design for Computer Graphics. Computers in Industry,
5(1), 51-63.
Marshall, J. B., & Carson, C. M. (2008). A Preliminary Bloom’s Taxonomy
Assessment Of End-Of-Chapter Problems In Business School Textbooks. American
Journal of Business Education--Fourth Quarter, 1(2).
Marshall University. (1999). Comparison of Online Course Delivery Software
Products, from http://multimedia.marshall.edu/cit/webct/compare/comparison.html
Martin, D. J. (2008). Elementary Science Methods: A Constructivist Approach (5th
ed.): Wadsworth Pub Co.
Marx, K. (1884). Economic and Philosophical Manuscripts of 1844. In Early Writings
(pp. 279-400). Berlin: Dietz.
Mayhew, D. J. (1999). The Usability Engineering Lifecycle: A Practitioner's Handbook
for User Interface Design: Morgan Kaufmann.
McCormick, B. H. (1988). Visualization in Scientific Computing. ACM SIGBIO
Newsletter, 10(1), 21.
McCoubrie, P. (2004). Improving the Fairness of Multiple-choice Questions: A
Literature Review. Medical Teacher, 26(8), 709-712.
McIlveen, P. (2007). The Genuine Scientist-practitioner in Vocational Psychology: An
Autoethnography. Qualitative Research in Psychology, 4(4), 295-311.
McNeil, B. J., & Nelson, K. R. (1990). Meta-Analysis of Interactive Video Instruction:
A 10 Year Review of Achievement Effects.
Moos, D. C., & Azevedo, R. (2009). Learning with Computer-based Learning
Environments: A Literature Review of Computer Self-efficacy. Review of Educational
Research, 79(2), 576.
Morris, M., Porter, A., & Griffiths, D. (2004). Assessment is Bloomin' Luverly:
Developing Assessment that Enhances Learning. Journal of University Teaching and
Learning Practice, 1(2), 90-106.
Najjar, L. J. (1996). Multimedia Information and Learning. Journal of Educational
Multimedia and Hypermedia, 5(2), 129-150.
National Council of Teachers of Mathematics. (2000). Principles and Standards for
School Mathematics. Retrieved 11/5/2010, from http://standards.nctm.org
Ng, A. W. Y., & Chan, A. H. S. (2009). Different Methods of Multiple-Choice Test:
Implications and Design for Further Research. Proceedings of the International
MultiConference of Engineers and Computer Scientists, 2.
Nicol, D. J., & Macfarlane-Dick, D. (2006). Formative Assessment and Self-regulated
Learning: A Model and Seven Principles of Good Feedback Practice. Studies in Higher
Education, 31(2), 199-218.
Nielsen, J. (1994a). Enhancing the Explanatory Power of Usability Heuristics. Paper
presented at the Proceedings of the SIGCHI Conference on Human Factors in
Computing Systems: Celebrating Interdependence.
Nielsen, J. (1994b). Usability Inspection Methods. Paper presented at the Conference
Companion on Human Factors in Computing Systems.
Nielsen, J., & Molich, R. (1990). Heuristic Evaluation of User Interfaces. Paper
presented at the Proceedings of the SIGCHI Conference on Human Factors in
Computing Systems: Empowering People.
Nieweg, M. R. (2000). Learning to Reflect a Practical Theory of Teaching, Amsterdam
University of Professional Education, Netherlands.
Novak, J. D., & Cañas, A. J. (2008). The Theory Underlying Concept Maps and How to
Construct and Use Them. Florida Institute for Human and Machine Cognition,
Pensacola, FL.
Paddison, C., & Englefield, P. (2004). Applying Heuristics to Accessibility Inspections.
Interacting with Computers, 16(3), 507-521.
Palmer, E. J., & Devitt, P. G. (2007). Assessment of Higher Order Cognitive Skills in
Undergraduate Education: Modified Essay or Multiple Choice Questions? Research
Paper. BMC Medical Education, 7(1), 49.
Paul, J. (1994). Improving Education Through Computer-based Alternative Assessment
Methods. People and Computers, 81-81.
Perrenoud, P. (1998). From Formative Evaluation to a Controlled Regulation of
Learning Processes. Towards a Wider Conceptual Field. Assessment in Education:
Principles, Policy & Practice, 5(1), 85-102.
Piaget, J., & Duckworth, E. (1970). Genetic Epistemology. American Behavioral
Scientist, 13(3), 459.
Pollard, G. (1985). Scoring in Multiple-choice Examinations. Math. Scientist, 10, 93-
97.
Pollard, G. (1986). Scoring to Remove Guessing in Multiple Choice Exams. Math.
Education Science Technology, 20(2), 33-36.
Pollard, G. (1993). Further Scoring Systems to Remove Guessing in Multiple Choice
Examinations. Mathematics Competitions, 2(1), 27-43.
Pollard, G., & Clark, D. (1989). An Optimal Scoring System of Multiple Choice
Competitions and an Analysis of Candidates Responses Under Two Different Methods
of Scoring. Mathematics Competitions, 2(2), 33-36.
Popham, W. J. (2008). Transformative Assessment: ASCD.
Prensky, M. (2003). Digital Game-based Learning. Computers in Entertainment (CIE),
1(1), 21.
Quinn, C. N. (2005). Engaging Learning: Designing E-learning Simulation Games:
Pfeiffer & Co.
Reed, I. (2008). Review Essay: Social Theory, Post-Post-Positivism and the Question
of Interpretation. International Sociology, 23(5), 665.
Rice, M., Campbell, C., & Mousley, J. (2007). Using Online Environments to Promote
Assessment as a Learning Enhancement Process. Enhancing Teaching and Learning
Through Assessment: Deriving an Appropriate Model, 418.
Rieber, L. P. (1996). Seriously Considering Play: Designing Interactive Learning
Environments Based on the Blending of Microworlds, Simulations, and Games.
Educational Technology Research and Development, 44(2), 43-58.
Righi, C., & James, J. (2007). User-centered Design Stories: Real-world UCD Case
Studies. Interactive Technologies, 560.
Rodriguez, M. C. (2005). Three Options are Optimal for Multiple-choice Items: A
Meta-analysis of 80 Years of Research. Educational Measurement: Issues and
Practice, 24(2), 3-13.
Salen, K., & Zimmerman, E. (2003). Rules of Play: Game Design Fundamentals: MIT
Press.
Shneiderman, B. (1997). Designing the User Interface (3rd ed.). Boston: Addison-
Wesley.
Schuwirth, L. W. T., & Van Der Vleuten, C. P. M. (2006). Challenges for
Educationalists. British Medical Journal, 333(7567), 544.
Seffah, A., & Metzker, E. (2008). Adoption-centric Usability Engineering: Systematic
Deployment, Assessment and Improvement of Usability Methods in Software
Engineering: Springer-Verlag New York Inc.
Seufert, T., Schütze, M., & Brünken, R. (2009). Memory Characteristics and Modality
in Multimedia Learning: An Aptitude-treatment-interaction Study. Learning and
Instruction, 19(1), 28-42.
Sharp, H., Rogers, Y., & Preece, J. (2007). Interaction Design: Beyond Human-
computer Interaction: Wiley Hoboken, NJ.
Shavelson, R. J., Young, D. B., Ayala, C. C., Brandon, P. R., Furtak, E. M., & Ruiz-
Primo, M. A. (2008). On the Impact of Curriculum-embedded Formative Assessment
on Learning: A Collaboration Between Curriculum and Assessment Developers.
Applied Measurement in Education, 21(4), 295-314.
Shneiderman, B. (1982). The Future of Interactive Systems and the Emergence of
Direct Manipulation. Behaviour & Information Technology, 1(3), 237-256.
Shneiderman, B., & Plaisant, C. (2005). Designing The User Interface (4th ed.).
Boston: Addison-Wesley/Pearson.
Shoben Jr, E. J. (2009). Psychotherapy as a Problem in Learning Theory. Journal of
Psychotherapy Integration, 19(2), 111-139.
Sim, G., Horton, M., & Strong, S. (2004). Interfaces For Online Assessment: Friend or
Foe? Paper presented at the 7th HCI Educators Workshop Conference Effective
Teaching and Training in HCI. British Human-Computer-Interaction Group Preston.
Sim, G., Read, J., & Cockton, G. (2009). Evidence Based Design of Heuristics for
Computer Assisted Assessment. Human-Computer Interaction--INTERACT 2009, 204-
216.
Sim, G., Read, J. C., & Holifield, P. (2006). Using Heuristics to Evaluate a Computer
Assisted Assessment Environment.
Sim, G., Read, J. C., & Holifield, P. (2008). Heuristics for Evaluating the Usability of
CAA Applications.
Spence, R., & Apperley, M. (1982). Database Navigation: an Office Environment for
the Professional. Behaviour & Information Technology, 1(1), 43-54.
Starr, C. W., Manaris, B., & Stalvey, R. A. H. (2008). Bloom's Taxonomy Revisited:
Specifying Assessable Learning Objectives in Computer Science. ACM SIGCSE
Bulletin, 40(1), 261-265.
Stavropoulos, N. (2007). Interpretivist Theories in Law. Law: Metaphysics Meaning
and Objectivity.
Steinmetz, G. (2007). Fordism and the Positivist Revenant: Response to Burris, Riley,
and Fourcade. Social Science History, 31(1), 127.
Sternberg, R. J. (1988). The Nature of Creativity: Contemporary Psychological
Perspectives: Cambridge Univ Pr.
Swartz, S. M. (2006). Acceptance and Accuracy of Multiple Choice, Confidence-level,
and Essay Question Formats for Graduate Students. The Journal of Education for
Business, 81(4), 215-220.
Taras, M. (2009). Summative Assessment: The Missing Link for Formative
Assessment. Journal of Further and Higher Education, 33(1), 57-69.
Tarrant, M., Ware, J., & Mohammed, A. M. (2009). An Assessment of Functioning and
Non-functioning Distractors in Multiple-choice Questions: A Descriptive Analysis.
BMC Medical Education, 9(1), 40.
Taylor, J., Sumner, T., & Law, A. (1997). Talking About Multimedia: A Layered
Design Framework. Learning, Media and Technology, 23(2), 215-241.
Te'eni, D., Carey, J. M., & Zhang, P. (2007). Human Computer Interaction: Developing
Effective Organizational Information Systems: Wiley.
Tomanek, D., Talanquer, V., & Novodvorsky, I. (2008). What do Science Teachers
Consider When Selecting Formative Assessment Tasks? Journal of Research in
Science Teaching, 45(10), 1113-1130.
Torrance, H. (2007a). Assessment as Learning? How the Use of Explicit Learning
Objectives, Assessment Criteria and Feedback in Post-secondary Education and
Training can Come to Dominate Learning. Assessment in Education: Principles, Policy
& Practice, 14(3), 281-294.
Torrance, H. (2007b). Assessment in Post-secondary Education and Training: Editorial
Introduction. Assessment in Education: Principles, Policy and Practice, 14(3), 277-279.
Torrance, H., & Coultas, J. (2009). Do Summative Assessment and Testing Have a
Positive or Negative Effect on Post-16 Learners' Motivation for Learning in the
Learning and Skills Sector? National Centre for Vocational Education Research
(NCVER).
Trotter, E. (2006). Student Perceptions of Continuous Summative Assessment.
Assessment & Evaluation in Higher Education, 31(5), 505-521.
Ventouras, E., Triantis, D., Tsiakas, P., & Stergiopoulos, C. (2010). Comparison of
Examination Methods Based on Multiple-choice Questions and Constructed-Response
Questions Using Personal Computers. Computers & Education, 54(2), 455-461.
Wilson, R. B., & Case, S. M. (1993). Extended Matching Questions: An Alternative to
Multiple-choice or Free-response Questions. Journal of Veterinary Medical Education,
20(3).
Zimmerman, B. (2008). Investigating Self-regulation and Motivation: Historical
Background, Methodological Developments, and Future Prospects. American
Educational Research Journal, 45(1), 166-183.
APPENDIX A: SURVEYS
Appendix A1: Paper Based Surveys
____________________________________________________________
Self Assessment Test Trial:
Developed and Administered by Graham Farrell:
Topic: C++ Fundamentals
It is important that you know that you can cease to participate in this project at
any time without reason or justification. If you choose to withdraw please let the
supervisor know and hand back the Questionnaire.
Subject Number: ____________________
Age: 18-25 26-30 31-40 41+
Sex: M F
Computer Experience: None Casual Proficient
1. Was the system easy to operate?
|_______________________________|___________________________________|
No Help                         Helpful                   Extremely Helpful
2. Did the feedback display produce comprehensible information in order to be
valuable in directing the student along their learning path?
|__________________________________|___________________________________|
No Help                            Helpful                   Extremely Helpful
3. Is a scoring system that penalised for incorrect choices and rewarded for correct
choices in a linear proportionality easy to comprehend?
|______________________________________|_______________________________|
Not Easy                               Easy                   Extremely Easy
4. Would the participant A) actively use the sliding bar to register their level of
confidence freely and B) would they perceive the system as being either too
complicated or too threatening?
A)
|__________________________________|________________________________|
Not use sliding Bar Freely Unsure Use Sliding Bar Freely
B)
|______________________________|________________________________|
Not Complicated or Threatening      Unsure      Complicated or Threatening
5. Would a self-testing program of this design favour a particular learning style?
|______________________________________________________________|
Not Preferred               Preferred               Extremely Preferable
6. Would students consider the proposed system might be more favourable to the
extraverted individual and disadvantage the introverted user?
|________________________________|_____________________________________|
Advantage Neither No Advantage
7. Would students consider the proposed system to be gender biased?
|__________________________________|__________________________________|
Biased Unknown Not Biased
Appendix A2: Online Surveys
APPENDIX B: SIMULATION RESULT DISPLAYS
Simulation of a student’s first attempt to use the MCQCM
Simulation of a student’s second attempt to use the MCQCM
APPENDIX C: MCQCM SCREEN PRESENTATIONS