
2007; 29: e122–e132

WEB PAPER

 An analysis of peer, self, and tutor assessment in problem-based learning tutorials

 TRACEY PAPINCZAK, LOUISE YOUNG, MICHELE GROVES & MICHELE HAYNES

School of Medicine, University of Queensland, Herston Road, Herston, 4006, Queensland, Australia

 Abstract

Objective: The purpose of this study was to explore self-, peer-, and tutor assessment of performance in tutorials among first year 

medical students in a problem-based learning curriculum.

Methods: One hundred and twenty-five students enrolled in the first year of the Bachelor of Medicine and Bachelor of Surgery 

Program at the University of Queensland were recruited to participate in a study of metacognition and peer- and self-assessment.

Both quantitative and qualitative data were collected from the assessment of PBL performance within the tutorial setting, which

included elements such as responsibility and respect, communication, and critical analysis through presentation of a case

summary. Self-, peer-, and tutor assessment took place concurrently.

Results: Scores obtained from tutor assessment correlated poorly with self-assessment ratings (r = 0.31–0.41), with students

consistently under-marking their own performance to a substantial degree. Students with greater self-efficacy scored their PBL

performance more highly. Peer-assessment was a slightly more accurate measure, with peer-averaged scores correlating

moderately with tutor ratings initially (r = 0.40) and improving over time (r = 0.60). Students consistently over-marked their peers,

particularly those with sceptical attitudes to the peer-assessment process. Peer over-marking led to less divergence from the tutor 

scoring than under-marking of one’s own work.

Conclusion: According to the results of this study, first-year medical students in a problem-based learning curriculum were better 

able to accurately judge the performance of their peers compared to their own performance. This study has shown that 

self-assessment of process is not an accurate measure, in line with the majority of research in this domain. Nevertheless, it has

an important role to play in supporting the development of skills in reflection and self-awareness.

Introduction

Education of medical students should prepare them to deal

 with problems in the future, equipping them with skills

necessary to become active, self-directed learners, rather than

passive recipients of information (Dolmans & Schmidt 1996).

 Acknowledgment of this need was responsible, in part, for the

development of problem-based learning (PBL) (Barrows &

Tamblyn 1980).

 Within the discipline of medical education, PBL is a

curriculum innovation that involves students in learning

activities using loosely structured medical problems to drive

learning (Norman & Schmidt 1992). The pedagogical appeal of PBL is its perceived capacity to encourage, through these

learning processes, enhanced clinical reasoning skills, and the

development of both an adaptable knowledge base and skills

in self-directed learning necessary to become lifelong learners

(Kelson & Distlehorst 2000). Four crucial conditions for a deep

approach to learning are encompassed within the PBL

approach: a well-structured knowledge base, active learning,

collaborative learner interaction, and a context designed to

promote internal motivation through the provision of 

pragmatic goals (Margetson 1994). Assessment of student 

progress in such a student-centred curriculum, however, has

remained challenging (Eva 2001).

 Assessment protocols within PBL curricula have sometimes

sought to include self-, peer-, and tutor evaluation to assess a range of skills, such as self-directed learning, group coopera-

tion, and communication (Swanson et al. 1997). Tutors and

peers have a unique opportunity to judge each others’ work in

PBL tutorials, and students should develop the ability to reflect 

on their own strengths and weaknesses as these are central

elements of self-directed learning (Eva et al. 2004).

Several published quantitative studies of peer-assessment 

 within PBL curricula reveal correlations between staff/tutor 

and peer ratings ranging from very low (Sluijman et al. 2001;

Reiter et al. 2002) to moderate (Sullivan et al. 1999; Segers &

Dochy 2001). Of limited research undertaken with medical

students in PBL, moderate correlation between peer and tutor 

Practice points

. Self-assessment results in substantial under-marking

compared to tutor assessment.

. Scores obtained from peer-assessment are significantly 

more generous than those scores arising from tutor 

assessment.

. Self-assessment is a less accurate means of assessing

student performance than peer-assessment.


ratings was demonstrated by Sullivan et al. (1999), while low 

correlation was reported by Reiter et al. (2002). Findings

arising from studies of medical students in non-PBL curricula

show generally moderate correlations (Burnett & Cavaye 1980;

 Van Rosendaal & Jennett 1992; Rudy et al. 2001; Minion et al.

2002). Several factors have the potential to impact negatively 

on the accuracy of peer evaluations, including friendship marking and decibel marking, which favours dominant group

members (Pond & ul-Haq 1997). These may result in peer 

over-marking often observed in quantitative studies of peer-

assessment (for instance, Rudy et al. 2001). It is also possible

that, in high stakes settings such as medical schools, inflated

estimates of peer performance would be the norm (Norcini

2003).

 Another format for evaluating student performance in PBL

tutorials is self-assessment. Self- and peer-assessment are

often combined or considered together. Peer-assessment, for 

instance, builds on evaluation skills that may be transferred to

self-assessment tasks and enables learners to compare their self evaluations with the assessments of others.

Despite meta-analyses of self-assessment in higher educa-

tion deeming students ‘well able to self-assess accurately’

(Sluijmans et al. 1999, p. 300), within medical PBL programs

reported correlations between self and tutor evaluations are

uniformly low (Rezler 1989; Gordon 1991; Das et al. 1998;

Sullivan et al. 1999; Reiter et al. 2002). Nor has a significant 

relationship been found between self-assessment scores and

examination results (Tousignant & Des Marchais 2002; Eva

et al. 2004). This is also true of non-PBL medical curricula,

 where poor association has been shown between

scores obtained from self-assessment and tutor  

assessment (or examination results) (Arnold et al. 1985;

 Woolliscroft et al. 1993; Rudy et al. 2001; Fitzgerald et al. 2003).

 When self-assessment scores are compared with peer-

assessment scores, low correlations have been shown in all but 

one study of PBL curricula or of medical courses (Sullivan et al.

1999; Reiter et al. 2002; Miller 2003). The exception, a study by 

Burnett and Cavaye (1980), reported a very high correlation

between self- and peer-assessment scores among fifth year 

medical students in a traditional curriculum. However, Eva

et al. (2004) reported disappointingly low correlation between

students’ self-assessment and performance on a test of medical

knowledge, with no evidence of improvement after one year 

of medical education. The tendency exists for students to overestimate their competence (Mattheos et al. 2004), espe-

cially lower-performing students (Woolliscroft et al. 1993; Lejk

& Wyvill 2001; Edwards et al. 2003). Young or highly capable

students are more likely to undermark their work (Stefani

1992; Rudy et al. 2001; Edwards et al. 2003; Fitzgerald et al.

2003). The influence of gender on both self-assessment and

peer-assessment accuracy appears to be minimal (Falchikov &

Magin 1997; Mattheos et al. 2004), although the experience of 

self and peer-assessment may be more stressful for females

(Pope 2005).

Several approaches have been suggested to improve the

accuracy of scores generated from peer- or self-assessment. One well-supported idea is the use of co-assessment, which

involves students with staff in the assessment process. Given the existing findings on the

reliability of tutor assessment in PBL tutorials for measures of

student knowledge (Neville 1998; Cunnington 2001; Whitfield

& Xie 2002), the potential exists for tutor assessment to be

combined with or compared to peer- or self-assessment to

improve the accuracy and comprehensiveness of the evalua-

tions generated (Dochy et al. 1999; Eva 2001). Tutors are in a

reasonable position to judge group processes (Dodds et al.2001).

The aim of this study was to explore peer- and self-

assessment within PBL tutorials in a medical course using

qualitative and quantitative approaches. Qualitative data were

collected to gather students’ perceptions of these alternate

forms of assessment (see Papinczak et al. 2007). Quantitative

data were analysed to assess the ‘accuracy’ of students as

assessors, with tutor scores as comparison. The impact of 

specific demographic factors and students’ self-efficacy was

analysed to gain greater understanding of influences on

scoring. It was anticipated that confident (efficacious) students

would award themselves higher marks, although this may be mediated by fears of self-aggrandisement in a public arena

(see Chaves et al. 2006).

Self-efficacy is defined as students’ perceptions of their 

ability to successfully carry out a task (Bandura 1986). When

facing a difficult learning task, a student with high self-efficacy 

beliefs is more likely to participate more actively, work harder,

remain more problem-focussed, and persist for a longer time

than a student with low self-efficacy, who is more likely to

 view the situation as insurmountable, get frustrated and give

up (Pajares 1996; Nichols & Steffi 1999). Students with high

levels of self-efficacy are more willing to take on challenging

tasks (Zimmerman 2000), whereas students with low self-

efficacy may fail to achieve even when goals are within easy 

reach (Bandura 1993). The effect of self-efficacy on scores

obtained through self-assessment has not previously been

evaluated within the PBL tutorial setting.

The PBL environment, with its emphasis on self-directed

and collaborative learning, provides a unique context in which

to explore alternative forms of assessment. As they work

together in PBL tutorials, students may develop interdepen-

dent relationships facilitating learning and motivation

(Willis et al. 2002). This study sought to incorporate

qualitative and quantitative dimensions in order to gain a

fuller understanding of peer- and self-assessment within

collaborative small group environments.

Description of the study 

Quantitative and qualitative data were gathered as part of 

a larger study of metacognitive processes undertaken with

first-year students enrolled in the Bachelor of Medicine and

Bachelor of Surgery (MBBS) Program at The University of 

Queensland, Australia. Only the results of the self- and peer-

assessment segment of the study are reported here, including

quantitative findings and qualitative results which may be

explanatory or insightful. Ethical approval was obtained from the University of

Queensland's Behavioural & Social Sciences Ethical Review Committee.

Setting

The MBBS Program introduced a four-year, graduate-entry 

PBL curriculum in 1997. First-year students, in small groups of 

nine or ten, undertake five hours of PBL tutorial time each

 week for 33 weeks of the year. Working in collaboration with

group members, students analyse a problem of practice,formulate hypotheses, and undertake self-directed learning to

try to understand and explain all aspects of the patient’s

‘problem’.

Subjects

The study was conducted with 125 first-year medical students

and 20 tutors over a period of six months during 2004. Every 

student in thirteen tutorial groups took part in a program of 

educational activities within their PBL tutorials, including

peer- and self-assessment. Subsequent statistical analysis

showed that the self-selected study subjects (40.2% of the

student population) were representative of the entire cohort 

on measures of age, gender, and primary undergraduate

degree.

Instruments

Qualitative and quantitative data were generated using two

instruments: the peer assessment instrument (as shown in

Figure 1) and the test of self-efficacy, which all participants

completed at the commencement of the study.

 The peer assessment instrument

In order to enhance student ownership of assessment criteria

(as recommended by Boud (1995) and Orsmond et al. (2000)),

members of several PBL tutorial groups in the previous cohort 

 were invited to participate in the development of an

instrument for peer- and self-assessment of students' perfor-

mance in PBL tutorials. Students were first presented with a list 

of criteria derived from relevant literature (including Das et al.

(1998) and Willis et al. (2002)) from which a set of items were

selected for inclusion in the first draft of the instrument.

The negotiated instrument with 19 items, labelled the

peer assessment instrument, was trialled with another student 

group and rated as easy to use and understand by all participants. Student dissatisfaction with two items resulted in

their removal from the final version of the instrument. The

resulting scale measures several features of successful adult 

education, such as participation, punctuality, respect for 

others, effective communication, and critical analysis

(as shown in Figure 1). However, the inclusion of items

specifically targeting self-directed learning and self-awareness,

core features of PBL, allows it to be differentiated from others

 which may be appropriate for open-ended, but less student-

centred, approaches—such as case-based instruction (Hay &

Katsikitis 2001).

The phrasing of items on the peer assessment instrument  was varied slightly to make it more relevant to self-evaluation

where applicable, for instance, 'I' instead of 'the student'.

Qualitative data were gathered through an open-ended question (inviting comments) on the final page of the

questionnaire. These were analysed and coded to themes to

provide insight into student perceptions.

In order to gain a measure of face validity, three

experienced PBL facilitators were asked to indicate whether 

each of the 17 items on the instrument was relevant to PBL

performance and able to be adequately assessed using theitem in question. Unanimous face validity was obtained for all

items in the four sub-scores: responsibility and respect,

information processing, communication, and critical analysis.

Some dissent about the validity of the self-awareness sub-score

 was evident. Construct validity describes the degree to which

the items used in the instrument define the constructs (Pressley 

& McCormick 1995). The five constructs or domains of 

performance were reported extensively in the medical and

nursing education literature. Each of the three PBL tutors and

ten PBL students were asked to categorise the 17 items into the

five specified domains. In all cases, the items were distributed

in accordance with the domains as defined on the instrument. Values for Cronbach’s alpha ranged from 0.76 to 0.84,

indicating good internal consistency among the five sub-

scores. Acceptable reliability was found, with Pearson correla-

tion coefficients for peer-averaged and tutor assessment 

ranging from 0.40 to 0.60. Notably, self-awareness items

 were problematic with a significant number of students

consistently entering ‘not applicable’ for those two items.

Unfortunately, time constraints prevented further renego-

tiation of the peer assessment instrument with the subsequent 

cohort prior to the commencement of the study.

 Test of self-efficacy

The instrument to measure students’ self-efficacy was

composed specifically for this project as existing instruments

 were not designed for use in problem-based learning courses.

The test of self-efficacy comprises eleven closed questions

relating to regulation of, and confidence in, learning, with

scores rated on a Likert scale of one-to-five. The first six items

(Part A) deal with students’ perceived capability to use various

self-regulation strategies, such as organizing their studies, and

concentrating and participating in small-group tutorials. These

 were loosely based on Bandura’s (1989) multidimensional

scales of perceived self-efficacy reported in Zimmerman et al.

(1992). This original scale was designed to measure high school students' perceived capability to use various self-

regulating strategies, such as concentrating on school subjects,

organising schoolwork, and participating in school discus-

sions. Of the eleven items on the original scale, the most 

applicable six were chosen and rewritten to more appro-

priately reflect the learning and studying activities carried out 

by students in this medical course in order to create a brief 

instrument measuring self-efficacy to regulate learning. The

six items deal with the self-regulation strategies: completing

allocated learning objectives for the group; studying when

there are distractions; planning and organising study; course

motivation; and concentration and active participation in tutorials.

A further five items measuring self-efficacy for academic achievement made up the second section

(Part B) of the instrument. These were framed using items

taken from the ten-item measure of self-efficacy first reported

by Schwarzer and Jerusalem (1995) with five questions

selected and modified to better measure the specific respon-

sibilities of examination performance, tutorial participation,

self-awareness, clinical reasoning, and academic achievement 

under consideration in this study.

Statistical testing to determine internal reliability yielded Cronbach's alpha values of 0.68 for the first six items

measuring self-efficacy for self-regulation, and 0.73 for the

remaining five items dealing with self-efficacy for academic

achievement. Reliability was not improved by the omission of 

a single item from either self-efficacy measure.
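
For illustration only (the study's reliability analyses were run in SPSS), Cronbach's alpha for a block of Likert items can be computed with the standard formula, as sketched below; the response matrix and function name are hypothetical:

    # Sketch of a Cronbach's alpha calculation for a block of Likert items
    # (e.g. the six self-regulation items), using the standard formula
    # alpha = k/(k-1) * (1 - sum(item variances) / variance(total score)).
    # The response matrix below is illustrative only.
    import numpy as np

    def cronbach_alpha(items):
        """items: 2-D array, rows = respondents, columns = items (1-5 ratings)."""
        items = np.asarray(items, dtype=float)
        k = items.shape[1]
        item_vars = items.var(axis=0, ddof=1)
        total_var = items.sum(axis=1).var(ddof=1)
        return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

    responses = np.array([[4, 5, 4, 3, 4, 5],
                          [3, 4, 3, 3, 4, 4],
                          [5, 5, 4, 4, 5, 5],
                          [2, 3, 3, 2, 3, 3]])
    print(round(cronbach_alpha(responses), 2))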

Peer-, self- and tutor assessment

The study was undertaken with the 2004 cohort of medical

students, using the assessment instrument previously devel-

oped with student input. Student feedback was collected during, and at the conclusion of, the study. This feedback was

applied to modify aspects of the larger study framework only, not the peer- and self-assessment process

itself. All participants completed the test of self-efficacy before

commencing the program of peer- and self-assessment.

For a period of twenty-four weeks, all members of thirteen

PBL tutorial groups took part in an activity designed to

enhance learning. Two key components were introduced into

PBL tutorials, both of which were readily integrated into the

existing tutorial format with minimal additional time required

from tutors or students. These components were: (1) Reflection on learning: Each week one student from

each tutorial group was asked to compose a summary 

of the week’s medical problem incorporating the

clinical reasoning and collaborative learning occurring

in their PBL tutorial group. The student was encouraged

to present the summary to the group as a concept map

or in mechanistic case-diagramming format (Guerrero

2001) to give a visual representation of both the content 

and the clinical reasoning entailed in solving the

problem (a ‘knowledge object’ (Entwistle & Marton

1994)).

(2) Peer- and self-assessment: The student presentation, in association with his/her fulfilment of PBL roles and

responsibilities for that week, was assessed using the peer assessment instrument (Figure 1).

Peer assessment instrument

Please answer the items below indicating the strength of your agreement or disagreement with the statements about this student's performance in this week of PBL tutorials by circling the number on the scale. {1 = totally disagree; 5 = totally agree}

The student:

A. Responsibility and Respect
1. Completed all assigned tasks to the appropriate level
2. Completed all assigned tasks on time
3. Participated actively in the tutorial
4. Showed behaviour and input which facilitated my learning
5. Was punctual to this PBL tutorial
6. Listened to and showed respect for the opinions of others

B. Information processing
7. Brought in new information to share with the group
8. Provided information that was relevant and helpful
9. Seemed to use a variety of resources to obtain the information

C. Communication
10. Was able to communicate ideas clearly
11. Made comments and responses that did not confuse me

D. Critical analysis
12. Gave input which was focussed and relevant to the case
13. Made conclusions that can be substantiated by the evidence presented in the case
14. Gave a thorough summary of the case
15. Gave a summary of the case which showed evidence of reflection and evaluation

E. Self-awareness
16. Appeared able to assess his/her own strengths and weaknesses within PBL
17. Accepted and responded to criticism gracefully

(Each item rated 1–5; adapted from Das et al. 1998)

Figure 1. The final 17-item draft of the Peer assessment instrument.


Self-, peer-, and tutor assessment took place concurrently. Scores from these

worksheets were compared to explore statistical relationships. Qualitative data were analysed.

Constant monitoring of student perceptions of, and attitudes

to, these educational activities helped to gauge the effects of 

the dual activities to maximize student learning. This is in

keeping with an action research process which uses over-

lapping cycles of planning, acting and observing, and

reflecting (Kemmis & Wilkinson 1998) to maintain a respon-

sive and flexible study design.

In the first week of the main study (week four of the

academic year), students in the thirteen tutorial groups were

distributed two documents: a copy of the peer assessment 

instrument to enable students to become familiar with the evaluation criteria, and an exemplar outlining ‘good’ and ‘poor’

outcomes for the criteria. Tutors assigned to each group then

led their group in a practice session, with a tutor-led

presentation of a summary of the previous week’s medical

case, in order to establish familiarity with the instrument and

process. The summary itself was written by the researcher and

presented, with explanation, to each tutor in the week prior to

the trial. Tutors received written information, a short informa-

tion session, and frequent communication and feedback to

help maintain fidelity of treatment.

In the ensuing 23 weeks, tutorial groups implemented

the summarization and assessment activities at the start of each week as part of the ‘wrap-up’ of the previous week’s

PBL case. Each student was expected to be the focus of two marking episodes

(as shown in Figure 2). Tutors were encouraged to give

concise feedback (based on written peer comments on the reverse of the assessment sheet) to students as soon as

possible after the completion of the peer and self-assessment 

procedure. Student feedback about the exercise was regularly 

invited as part of the action research process.

During the course of implementation, two tutorial groups

 withdrew from the study (16% of participants). Their justifica-

tion for withdrawal was based on perceptions of their 

experiences, including scepticism about the value of peer-

assessment and concerns about friendship marking. Statistical

analysis showed students withdrawing did not differ signifi-

cantly from those remaining in the study in terms of age,

gender, primary degree, or self-efficacy.

Data analysis

 Analysis of the data was implemented using statistical software

SPSS Version 13.0. Scores on all five sub-scores of the peer 

assessment instrument were summed to give an overall score,

 with a maximum score of 85. For each marking episode, data

for each student consisted of a self-assessment score, a tutor 

assessment score, and up to nine peer-assessment scores.

Scores obtained from the test of self-efficacy were summed to

create two sub-scores: self-efficacy for self-regulation (with a maximum score of 30) and self-efficacy for academic achieve-

ment (with a maximum score of 25). These two sub-scores were used in subsequent analyses.

Figure 2. Diagram showing the sequence of marking episodes and assessment events for each student in each of 13 tutorial groups. In each marking episode, the student presents the case summary for the week; the tutor scores the presentation on the peer assessment instrument, 8–9 peers score it on the peer assessment instrument (with skewed scores removed and an average score calculated), and the student completes the self-assessment version of the peer assessment instrument. Marking episode 2 follows 8–10 weeks after marking episode 1.


Descriptive statistics for tutor, self- and peer-assessment are

presented in Table 1 for scores for each of the two marking

episodes. Data were missing for individuals failing to submit 

completed assessment instruments. Frequency histograms

revealed non-normal distributions of peer scores resulting

from peer-assessment, with some groups awarding full marks

for a large proportion of assessments. As qualitative data made

it apparent that some students deliberately scored 100% for 

peer performances, irrespective of quality, it was resolved to

apply an algorithm to reduce the prevalence of deliberately 

skewed scores or scores resulting from friendship marking and

students’ cavalier attitudes. In instances where the tutor score

for a given group was 72 out of 85 (representing a result of 85%)

or less, all peer scores of 100% were omitted from the statistical

analysis for that tutorial group. In this way, the most highly 

skewed results were excluded from the data set (representing

4.6% data loss) yet the data remained a reflection of the peer-assessment process which operates in a climate of student 

generosity towards others (see, for instance, Rudy et al. 2001).
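
As an illustration only (the analysis itself was carried out in SPSS), the score-editing rule described above can be sketched as follows; the function, threshold constant, and example scores are hypothetical:

    # Sketch of the peer-score editing rule described above: for a given
    # presentation, drop peer scores of 85/85 (100%) whenever the tutor's score
    # was 72/85 (about 85%) or less, then average the remaining peer scores.
    MAX_SCORE = 85
    TUTOR_CUTOFF = 72  # tutor score at or below which full-mark peer scores are treated as skewed

    def peer_average(peer_scores, tutor_score):
        """Return the mean peer score after removing deliberately skewed full marks."""
        scores = list(peer_scores)
        if tutor_score <= TUTOR_CUTOFF:
            scores = [s for s in scores if s < MAX_SCORE]
        return sum(scores) / len(scores) if scores else None

    # Example: one presentation scored by the tutor and eight peers
    print(peer_average([85, 85, 80, 78, 82, 79, 85, 81], tutor_score=70))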

 Averaged peer-assessment scores were calculated by 

computing the mean for each student's completed peer-

assessment instruments (once skewed results were removed).

The reliability among peer-averaged scores for all intervention

tutorial groups was well within acceptable limits with

Cronbach’s alpha scores ranging from 0.66 to 0.77. The results

of Kolmogorov–Smirnov testing confirmed that scores for self-,

peer-averaged, and tutor-assessment generated from two

marking episodes followed a normal distribution. The para-

metric tests chosen are fairly robust and should remainrelatively unaffected by the observed clustering of assessment 

marks to the upper end of the range. Despite reservations

about its use in this capacity (see Ward et al. 2002) tutor 

scoring was utilised as the most appropriate benchmark for 

comparative purposes in assessing the reliability of peer- and

self-assessment.

In order to provide evidence for claims of ‘accuracy’, Bland

 Altman plots (see Bland and Altman 1986) were used to

graphically represent levels of agreement between two sets of 

scores. Average of scores was plotted against difference

between paired scores for (1) self- versus tutor scores at time

2 and (2) peer-averaged versus tutor scores at time 2. Three lines representing the mean difference and upper and lower limits of agreement were drawn. The limits of agreement suggested by Bland and Altman (1986), which are the mean difference ± 2 standard deviations, were considered too wide to give meaningful results in this study and were not used.
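
A minimal sketch of how such a plot can be produced, using the narrower limits applied in this study (mean difference ± 5% of the maximum score of 85) rather than the conventional ± 2 standard deviations; the score values shown are illustrative only:

    # Sketch of a Bland-Altman-style agreement plot as described above, using the
    # study's ad hoc limits (mean difference +/- 5% of the maximum score of 85);
    # the paired score arrays are hypothetical.
    import numpy as np
    import matplotlib.pyplot as plt

    self_scores = np.array([60, 72, 55, 68, 75, 63])   # hypothetical self-assessment totals
    tutor_scores = np.array([70, 78, 66, 74, 80, 72])  # hypothetical tutor totals

    mean_pair = (self_scores + tutor_scores) / 2
    diff = self_scores - tutor_scores
    mean_diff = diff.mean()
    half_width = 0.05 * 85  # 5% of the maximum instrument score

    plt.scatter(mean_pair, diff)
    plt.axhline(mean_diff, linestyle="-")
    plt.axhline(mean_diff + half_width, linestyle="--")
    plt.axhline(mean_diff - half_width, linestyle="--")
    plt.xlabel("Average of self and tutor scores")
    plt.ylabel("Difference (self minus tutor)")
    plt.title("Bland-Altman plot: self versus tutor assessment")
    plt.show()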

Table 1. Descriptive statistics for self-, peer-, and tutor assessment for each of the two marking episodes.

Marking episode   Score                      Number of responses   Mean    Standard deviation   Minimum–maximum score
1                 Self-assessment            108                   68.80   8.32                 44–85
1                 Averaged peer-assessment   115                   79.08   4.21                 66–85
1                 Tutor assessment           89                    76.15   7.58                 43–85
2                 Self-assessment            82                    67.70   10.70                38–85
2                 Averaged peer-assessment   87                    79.04   3.75                 68–84
2                 Tutor assessment           70                    74.99   8.96                 46–85

Figure 3. Bland-Altman plot of self-assessment versus tutor scores at time 2 (self scores lower), plotting the difference between paired scores against their average: mean difference = −7.59, upper limit = −3.34, lower limit = −11.84.

Figure 4. Bland-Altman plot of peer-averaged versus tutor scores at time 2 (peer-averaged scores higher), plotting the difference between paired scores against their average: mean difference = 3.75, upper limit = 8.00, lower limit = −0.50.


Results

Self-assessment

Demographic variables. Multilevel regression analysis was

used to explore the relationship between demographic

variables and self-assessment scores. Of four factors incorporated into the model (age, gender, primary 

undergraduate degree, and repeat student status), only 

primary degree was statistically significant in explaining the

 variance in self-assessment scores. Students with an arts,

commerce, music, education or law degree on admission

to the MBBS Program were significantly more likely to

have higher initial self-assessment scores than others,

 while those with pure sciences or therapies degrees marked

themselves significantly lower (t = 2.89; p = 0.05). This

distinction was less noticeable in the second marking episode.

Self-efficacy. Initial self-efficacy for self-regulation was mod-

erate to high with a mean of 23.85 (out of 30) and a standard

deviation of 3.18, while initial self-efficacy for academic

achievement also showed relatively elevated levels

(mean = 19.51 out of a possible 25, standard deviation = 2.51).

In order to explore the relationship between self-assessment 

scores and self-efficacy, a multiple linear regression

analysis was undertaken. Only initial self-efficacy for 

self-regulation was statistically significant in

explaining the variance in self-assessment scores (t = −3.85, p = 0.001).
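
A sketch of this kind of multiple linear regression (the study's analyses were run in SPSS); the column names and data values below are hypothetical:

    # Sketch of a multiple linear regression of self-assessment scores on the two
    # self-efficacy sub-scores, in the spirit of the analysis described above.
    # Column names and data are hypothetical.
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.DataFrame({
        "self_score":       [62, 70, 58, 66, 73, 61, 69, 64, 71, 59],
        "efficacy_selfreg": [22, 26, 20, 24, 27, 21, 25, 23, 26, 20],
        "efficacy_achieve": [18, 21, 17, 19, 22, 18, 20, 19, 21, 17],
    })

    model = smf.ols("self_score ~ efficacy_selfreg + efficacy_achieve", data=df).fit()
    print(model.params)    # estimated coefficients
    print(model.tvalues)   # t statistics for each predictor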

Comparison of means. Direct comparison of the self-assess-

ment mean with the tutor score revealed consistent under-

marking of students’ own work, as shown in Table 1. Paired

t -tests were undertaken to determine whether statistically 

significant directional differences existed for each marking

episode. In each marking episode, the students scored

themselves significantly lower than their tutor (t = −5.27 to −8.10; p < 0.001).
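
For illustration, the two comparisons used in this section (a paired t-test for a directional difference and a Pearson correlation for agreement) can be sketched as follows; the paired score lists are hypothetical:

    # Sketch of the two comparisons reported above for one marking episode:
    # a paired t-test for a systematic difference between self and tutor totals,
    # and a Pearson correlation between the same pairs of scores.
    from scipy import stats

    self_scores = [62, 70, 58, 66, 73, 61, 69, 64]
    tutor_scores = [71, 77, 65, 72, 80, 70, 75, 73]

    t_stat, p_paired = stats.ttest_rel(self_scores, tutor_scores)
    r, p_corr = stats.pearsonr(self_scores, tutor_scores)

    print(f"paired t = {t_stat:.2f}, p = {p_paired:.3f}")
    print(f"Pearson r = {r:.2f}, p = {p_corr:.3f}")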

 Analysis of qualitative data indicated students were

concerned about lack of objectivity. One student commented

on their struggle to remain impartial: ‘I find it difficult to

undertake self assessment—mainly because I feel that my 

perception of my performance may be inaccurate due to bias

or distorted perceptions’.

Inter-rater agreement. To explore levels of agreement 

between scores for self-assessment and other tutorial-

based scores, two analyses were undertaken. These were:

(a) correlation to test for the strength and direction of linear 

relationships between scores; and (b) Bland Altman plots to

graphically represent scores obtained from self- and tutor 

assessment. (a) Pearson correlation coefficients were computed for 

self and peer-averaged, and self and tutor assessment 

scores derived from both marking episodes

(see Table 2). Despite reaching statistical significance,

the correlation between self and peer-averaged scores

was low-to-moderate (r = 0.30–0.32). A slightly

stronger correlation was observed for self and tutor

scores (r = 0.31–0.41). Tremendous variability existed

between tutorial groups. When groups were analysed

for score correlation separately, across both marking

episodes, six tutorial groups showed very high levels of 

marking agreement with self and tutor score correlation coefficients ranging from 0.74 to 0.92. Other groups

showed low correlations.

(b) Bland Altman plots charted the difference between tutor 

and self-assessment scores against the average of these

scores. Plots showed poor accuracy of self- versus tutor 

assessment at both times 1 and 2, with a considerable

proportion of plotted scores well outside the levels of 

agreement (mean ± 5% of the maximum score). Figure 3

shows a Bland–Altman plot for self- versus tutor 

assessment at time 2. The mean differences of –7.59

(time 2) highlights both the considerable under-marking

of self compared to tutor scores and lack of  

accuracy. The standard deviation was quite large

(11.70 at time 2), indicating a wide spread of scores

about the mean.

Peer-assessment

Demographic variables. Analysis of variance demonstrated

no significant differences between the marks awarded to

peers based on the presenting student’s gender, age, or 

primary degree. There was evidence, however, of a trend

towards higher scores being awarded to older male students in

the groups. This failed to reach statistical significance (F = 3.12; df = 12; p = 0.095).

Table 2. Correlation between pairs of scores obtained from self-, peer-, and tutor assessment. Peer-averaged scores have been used. Cronbach's alpha for all peer-averaged scores across 13 tutorial groups was 0.77 in the first marking episode and 0.66 in the second marking episode.

Marking episode   Paired scores                    Number of paired responses   Pearson correlation coefficient   p value (2-tailed)
1                 Self and tutor scores            85                           0.41                              <0.001
1                 Self and peer-averaged scores    108                          0.32                              <0.001
1                 Tutor and peer-averaged scores   89                           0.40                              <0.001
2                 Self and tutor scores            66                           0.31                              0.012
2                 Self and peer-averaged scores    82                           0.30                              0.007
2                 Tutor and peer-averaged scores   70                           0.60                              <0.001


Comparison of means. Direct comparison of the peer-

averaged mean with the tutor score revealed consistent over-

marking by peers (see Table 1). Paired t -tests were undertaken

to determine whether statistically significant directional differ-

ences existed for each marking episode. In each marking

episode, the mean of the peer scores for each student 

presentation was significantly higher than the score awarded by their tutor (t = 3.71 to 4.14; p < 0.001).

Inter-rater agreement. To explore levels of agreement 

between scores for peer-averaged assessment and other 

tutorial-based scores, two analyses were undertaken. These

 were: (a) correlation to test for the strength and direction of 

linear relationships between scores; and (b) Bland Altman

plots to graphically represent scores obtained from peer-

averaged and tutor assessment.

(a) Table 2 presents Pearson correlation coefficients for 

tutor and peer-averaged scores generated from both

marking episodes. At best moderate correlations were demonstrated initially for tutor and peer-averaged

scores (r = 0.40), with some improvement over time

(r = 0.60). These data support the acceptable reliability of 

the assessment instrument subject to the limitations of 

the use of tutor assessment as the benchmark.

 When tutorial groups were analysed for score correlation

separately, seven of the thirteen groups were capable of 

 very high levels of marking agreement with correlation

coefficients ranging from 0.76 to 0.96. Qualitative data

indicated that the majority of these groups were very 

supportive of, and committed to, the peer-assessment process. Comments such as: ‘...good to learn how to do this

appropriately, as I think we will need to be able to assess

our peers’ performance, as well as our own, throughout our 

careers’ were given by some enthusiastic respondents.

Scores obtained from other tutorial groups were in

substantially less agreement. Most of these group members

expressed negative views about peer-assessment related

specifically to potential for bias. The effect of omission of 

highly skewed results (as discussed earlier) on correlation was

briefly explored. Data editing was found to improve the peer-

tutor correlation from 0.32 to 0.40 in the first marking episode.

(b) Bland Altman plots charted the difference between tutor and peer-averaged scores against the average of these

scores. A moderate level of agreement between peer-

averaged and tutor assessment at both time 1 and 2

was shown, with a considerable proportion of plotted

scores within the levels of agreement (mean ± 5% of the

maximum score). The mean difference of 3.75 (time 2)

highlights both the over-marking of peers compared to

tutor scores and improved accuracy of peer-averaged

scores compared to scores derived from self-assessment 

(see Figure 4).

Variability between tutorial groups. Differences between the averaged peer-assessment scores of all 13 groups were

explored using multivariate analysis of variance. Statistically significant variation between groups

(F = 2.09; df = 12; p = 0.028) was evident. The effect size, as

measured by partial eta squared, was 0.26. Figure 5 illustrates

this variability, through box plots, for the 13 tutorial groups on

the 0–85 scale of the peer assessment instrument. Analysis of 

 variance demonstrated five tutorial groups had reliably 

recorded significantly lower peer-average scores, while three

groups had consistently scored group members more favour-

ably. Comparison between the three sources of assessment 

scores (self-, peer-, and tutor) revealed a small number of 

groups consistently reporting high student-generated scores in

the absence of high tutor scores. Based on tutor assessment as

the benchmark, members of these tutorial groups must be

awarding peers overly generous marks. Qualitative data

suggest this may relate to friendship marking.
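
As a simplified illustration of this kind of comparison, a one-way analysis of variance across tutorial groups with eta squared as the effect size could be computed as sketched below (the study reports a multivariate analysis of variance with partial eta squared); the group score lists are hypothetical:

    # Simplified sketch of comparing averaged peer-assessment scores across
    # tutorial groups: a one-way ANOVA plus eta squared as the effect size.
    # Group score lists are illustrative only.
    import numpy as np
    from scipy import stats

    groups = [
        [78, 80, 79, 82, 77],   # hypothetical peer-averaged scores, group A
        [81, 83, 84, 82, 80],   # group B
        [74, 76, 75, 73, 77],   # group C
    ]

    f_stat, p_value = stats.f_oneway(*groups)

    all_scores = np.concatenate(groups)
    grand_mean = all_scores.mean()
    ss_between = sum(len(g) * (np.mean(g) - grand_mean) ** 2 for g in groups)
    ss_total = ((all_scores - grand_mean) ** 2).sum()
    eta_squared = ss_between / ss_total   # equals partial eta squared in a one-way design

    print(f"F = {f_stat:.2f}, p = {p_value:.3f}, eta^2 = {eta_squared:.2f}")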

 A strong reaction to peer-assessment was the widespread

perception that this process could be corrupted by bias due to

friendship marking, fear of ‘tit-for-tat’ scoring, or lack of 

honesty. The following comments sum up the attitude among

many study participants: ‘(It is) hard to criticise friends’ and

‘Relationships between students can colour opinions’. Some

students expressed casual and/or sceptical attitudes, scoring

100% for each student in their group regardless of the quality 

of the work to be judged. Frequent comments such as:

‘Not taken too seriously’, and ‘Not too much thought goes into the marking’ reflect a cynical attitude to the peer-assessment 

process.

Discussion and conclusion

Self-assessment appears to be a less accurate means of 

evaluating student performance in PBL tutorials than peer-

assessment. The heterogeneity of the group was confirmed by 

the presence of many outliers in the data, confirming the low 

accuracy of self-assessment among this sample of students.

Subject to the variability observed between the scores

obtained from 13 tutorial groups, correlations between self-assessment scores and those generated from tutor and peer-

assessment were disappointingly low.

Figure 5. Box plots of averaged peer-assessment scores for the thirteen tutorial groups, on the 0–85 scale of the peer assessment instrument, with results obtained from the first marking episode.

Some groups showed markedly closer agreement with tutor scores than others, and some students

 were also more accurate than others in their self-assessment.

These findings are in accord with the general consensus in

medical education that self-assessment of tutorial processes

in PBL is an inexact measure (Rezler 1989; Gordon 1991;

Das et al. 1998; Sullivan et al. 1999; Reiter et al. 2002).

Students clearly under-marked themselves, particularly 

those with pure science or therapies degrees. Other authorshave shown that young or highly capable students are more

likely to undermark their work (Stefani 1992; Rudy et al. 2001;

Edwards et al. 2003; Fitzgerald et al. 2003). Analysis of 

qualitative data suggests that students struggled to find a

balance between confidence in their performance, self-

awareness, and humility. Objectivity was also a major cause

of concern.

Self-efficacy was correlated with self-assessment. Students

awarding themselves higher marks were more likely to have

stronger self-efficacy for self-regulation. Self-assessment scores

 would be influenced by many factors, but confidence in one’s

ability to do well would be expected to provide incentive to award oneself higher marks on PBL performance. Positive

collaborative learning behaviour has been shown to be related

to learning self-efficacy (McLoughlin & Luca 2004).

Peer-assessment offers a greater likelihood of providing

accurate alternate forms of assessment within the PBL tutorial

environment. Correlation between tutor and peer-averaged

scores were barely moderate at first, then improved with

continued practice in peer-assessment. Some groups achieved

 very high correlation (up to 0.96) between tutor scores and

peer-averaged scores. The use of peer-averaged rather than

individual scores may help to account for apparently improved

accuracy compared to self-assessment. Nevertheless, some

students, and some groups of students, were able to

judge the performance of their peers in PBL tutorials with

precision.

The removal of highly skewed results prior to the

generation of peer-averaged scores improved the correlation

between peer-averaged and tutor scores. Qualitative data

indicated that some students were treating the peer-assessment 

process with casual and/or sceptical attitudes. By removing

scores known to be deliberately distorted, it was anticipated

that the peer-averaged scores would more appropriately 

reflect the genuine abilities of responsible students to assess

their peers. Data analysis indicated that this was the case,

with a small improvement in the correlation of tutor and peer-averaged scores in the first marking episode once

highly skewed scores were omitted. The implementation

of peer-assessment in any setting is likely to lead to

initial scepticism and doubt about its value and validity.

However, through repeated exposure to, and practice in,

peer-assessment, such perceptions should be moderated

(Sluijmans et al. 1999; Ballantyne et al. 2002), and highly 

skewed results would be expected to decrease in frequency.

The improved correlation between peer-averaged and tutor 

assessment at time 2 (compared to time 1) lends support to

this supposition.

Results from other studies of peer-assessment of processesin PBL (or small group) tutorials show variable correlations

between staff and peer scores. Correlations range from very low to moderate.

In keeping with the findings of Rudy et al. (2001), students

 were over generous in their marking of peers. The consistent 

under-marking of self combined with over-marking by peers

helps to account for the low correlation found for self and

peer-assessment. Qualitative results showed the potential for 

inflated estimates of peer performance resulting from friend-

ship marking. Friendship marking has been reported by other researchers as biasing peer-assessment responses

(Pond & ul-Haq 1997).

This study has the capacity to make a contribution

to knowledge in the area of peer- and self-assessment 

in PBL tutorials. The study incorporated strengths in

four main areas. These were: (1) focus on the learning

process in PBL tutorials; (2) duration of the program;

(3) congruence with PBL philosophy; and (4) triangulation in

data collection.

(1) The focus on learning process is an important strength

of the study design as it enabled insights into learning

processes undertaken by students. Bereiter and

Scardamalia (2000) call for greater research into PBL

processes using reflective action research.

(2) The moderate duration of the program (exceeding six 

months) leads to greater confidence in study findings.

Loss of participants over time (16% of participants) was

not excessive given the time pressures experienced by 

students.

(3) Congruence with PBL philosophy allowed the study to

complement the existing structure of PBL tutorials. By 

supporting the practice of self-assessment within PBL,

the study upheld a self-directed learning emphasis. Collaboration was

enhanced through the use of collaborative assessment formats. The reflective component of the intervention

built upon the review phase of the PBL learning cycle.

(4) Methodological triangulation was achieved by combin-

ing quantitative and qualitative approaches to study 

design. While qualitative inquiry helped to confirm

theory emerging from student perspectives, quantita-

tive inquiry enabled a set of statistical relationships to

be uncovered.

Results should be interpreted within the context of potential

limitations, including non-probability sampling, a relatively 

small sample size, subjective scoring of test items, and the different ways students and tutors interpret and apply 

assessment criteria. With regard to assessment of PBL

processes such as communication and respect, no real

benchmark exists (Ward et al. 2002). This casts some doubt 

on the validity of expert assessment in this domain.

Peer- and self-assessment within the tutorial setting has an

important role to play through its reinforcement of 

the educational goals and instructional principles of problem-

based learning (Nendaz & Tekian 1999; Segers & Dochy 2001).

There is evidence that tutorial-based assessment may also

reduce the overwhelming reliance on formal grading of 

students, which encourages competition rather than collaboration (Eva 2001).

First-year medical students in this study demonstrated poor accuracy in assessing their own performance on tutorial-based tasks, including

the creation and presentation of a case summary. Normally a

private process, self-assessments conducted publicly require

students to balance unrealistic goals and perceptions, assess-

ment anxiety, and ‘social norms about self-aggrandizement’

(Chaves et al. 2006, p. 30). This makes it unlikely that 

self-assessment accuracy in medical education is achievable

(Eva & Regehr 2005). Nevertheless, practice in self-assessment should be

integrated into existing programs of medical education. As

Eva and Regehr (2005) emphasise, self-assessment is a means

of identifying one’s strengths and weaknesses to guide goal

setting and enhance self-efficacy. This study has shown that 

self-assessment is not an accurate measure, in line with the

majority of research in this domain. Nevertheless, it has an

important role to play in supporting the development of skills

in reflection and self-awareness. Self-assessment needs to be

 viewed from a ‘self-improvement perspective’ (Eva & Regehr,

2005, p. S52). Further qualitative research needs to be

conducted to better understand students’ apparent inability to self-assess accurately within collaborative small group

learning environments.

Peer-assessment provides a valuable opportunity for tutor-

ial-based assessment. The act of evaluating the performance of 

professional peers has long been central to the referral process

in medicine (Norcini 2003). Skills gained through peer-

assessment activities may transfer to self-assessment tasks and

enable learners to compare their self-assessment with the

assessments of others (Searby & Ewers 1997; Dochy et al. 1999).

Feedback from peers has the potential to assist learners to

develop more accurate impressions of themselves and their 

abilities (Eva & Regehr 2005).

 A fundamental part of the PBL process is the capacity of 

students to embrace their responsibilities as active members of 

a group of learners. These may include collaborative and self-

assessment practices which have the potential to enhance

reflection and self-awareness.

Notes on contributors

TRACEY PAPINCZAK is completing her PhD in medical education within

the School of Medicine, The University of Queensland.

DR LOUISE YOUNG is a senior lecturer in the School of Medicine at The

University of Queensland and is currently Deputy Director of the

University’s Centre for Medical Education.

 ASSOCIATE PROFESSOR MICHELE GROVES is Deputy Head of School and

Director of Medical Studies in the School of Medicine, Griffith University,

Queensland.

DR MICHELE HAYNES works at The University of Queensland’s Social

Research Centre as Statistical Advisor and lectures in the School of Social

Science.

References

 Arnold L, Willoughby TL, Calkins EV. 1985. Self-evaluation in under-

graduate medical education: a longitudinal perspective. J Med Edu

60:21–28.

Ballantyne R, Hughes K, Mylonas A. 2002. Developing procedures for 

implementing peer assessment in large classes using an action research

process. Assess Eval Higher Edu 27:427–441.

Bandura A. 1986. Social Foundations of Thought and Action (Englewood Cliffs, NJ, Prentice-Hall).

Bandura A. 1993. Perceived self-efficacy in cognitive development and

functioning. Edu Psychologist 28:117–148.

Barrows HS, Tamblyn RM. 1980. Problem-Based Learning: an Approach to 

Medical Education  (New York, Springer).

Bereiter C, Scardamalia M. 2000. Process and product in problem-based

learning research. In: DH Evenson & CE Hmelo (Eds), Problem-Based 

Learning: a Research Perspective on Learning Interactions  (Mahwah,

NJ, Lawrence Erlbaum Associates).Bland MJ, Altman DG. 1986. Statistical methods for assessing agreement 

between two methods of clinical measurement. Lancet i:307–311.

Boud D. 1995. Enhancing Learning Through Self Assessment  (London,

Kogan Page).

Burnett W, Cavaye G. 1980. Peer assessment by fifth year students of 

surgery. Assess Higher Edu 5:273–278.

Chaves JF, Baker CM, Chaves JA, Fisher ML. 2006. Self, peer and tutor 

assessments of MSN competencies using the PBL-Evaluator. J Nurs Edu

45:25–31.

Cunnington J. 2001. Evolution of student evaluation in the McMaster MD

programme. Pedagogue 10:1–9.

Das M, Mpofu D, Dunn E, Lanphear JH. 1998. Self and tutor evaluations in

problem-based learning tutorials: is there a relationship? Med Edu

32:411–418.

Dochy F, Segers M, Sluijmans D. 1999. The use of self-, peer-, and co-assessment in higher education. Studies in Higher Edu 24:331–350.

Dodds AE, Orsmond RH, Elliott SL. 2001. Assessment in problem-based

learning: The role of the tutor. Ann Acad Med Singapore 30:366–370.

Dolmans DH, Schmidt HG. 1996. The advantages of problem-based

curricula. Postgraduate Med J 72:535–538.

Edwards RK, Kellner KR, Sistrom CL, Magyari EJ. 2003. Medical student self-

assessment of performance on an obstetrics and gynaecology clerkship.

 Am J Obstetrics and Gynaecol 188:1078–1082.

Entwistle NJ, Marton F. 1994. Knowledge objects: Understandings

constituted through intensive academic study. Br J Edu Psychol

64:161–178.

Eva KW. 2001. Assessing tutorial-based assessment. Adv Health Sci Edu

6:243–257.

Eva KW, Regehr G. 2005. Self-assessment in the health professions: A 

reformulation and research agenda. Acad Med 80:S46–S54.

Eva KW, Cunnington JPW, Reiter HI, Keane DR, Norman GR. 2004. How

can I know what I don’t know? Poor self assessment in a well-defined

domain. Adv Health Sci Edu 9:211–224.

Falchikov N, Magin D. 1997. Detecting gender bias in peer marking of 

students’ group process work. Assess Eval Higher Edu 22:385–396.

Fitzgerald JT, White CB, Gruppen LD. 2003. A longitudinal study of self-

assessment accuracy. Med Edu 37:645–649.

Gordon MJ. 1991. A review of the validity and accuracy of self-assessments

in health professions training. Acad Med 66:762–769.

Guerrero APS. 2001. Mechanistic case diagramming: A tool for problem-

based learning. Acad Med 76:385–389.

Hay PJ, Katsikitis M. 2001. The ‘expert’ in problem-based and case-based

learning: Necessary or not? Med Edu 35:22–28.

Kelson ACM, Distlehorst LH. 2000. Groups in problem-based learning

(PBL): Essential elements in theory and practice. In: DH Evenson & CE Hmelo (Eds), Problem-Based Learning: a Research Perspective on

Learning Interactions  (Mahwah, NJ, Lawrence Erlbaum Associates).

Kemmis S, Wilkinson M. 1998. Participatory action research and the study 

of practice. In: B. Atweh, S. Kemmis & P. Weeks (Eds), Action Research 

in Practice: Partnerships for Social Justice in Education, pp. 21–36

(London, Routledge).

Lejk M, Wyvill M. 2001. The effect of the inclusion of self-assessment with

peer-assessment of contributions to a group project: a quantitative

study of secret and agreed assessments. Assess Eval Higher Edu

26:551–561.

Magin DJ. 2001. A novel technique for comparing the reliability of multiple

peer assessments with that of a single teacher assessment of group

process work. Assess Eval Higher Edu 26:139–152.

Margetson D. 1994. Current educational reform and the significance of 

problem-based learning. Stud Higher Edu 19:5–19.

Mattheos N, Nattestad A, Falk-Nilsson E, Attstrom R. 2004. The interactive

examination: assessing students’ self-assessment ability. Med Edu

McLoughlin C, Luca J. 2004. An investigation of the motivational aspects of 

peer and self assessment tasks to enhance teamwork outcomes.

Paper presented at the Proceedings of the 21st ASCILITE Conference ,

Perth, 5–8 December.

Miller PJ. 2003. The effect of scoring criteria specificity on peer and self 

assessment. Assess Eval Higher Edu 28:383–394.

Minion DJ, Donnelly MB, Quick RC, Pulito A, Schwartz R. 2002. Are

multiple objective measures of student performance necessary? Am J Surg 183:663–665.

Nendaz MR, Tekian A. 1999. Assessment in problem-based learning

medical schools: a literature review. Teach Learn Med 11:232–243.

Neville AJ. 1998. The tutor in small-group problem-based learning: teacher?

Facilitator? Evaluator? Pedagogue 8:1–9.

Nichols JD, Steffi BE. 1999. An evaluation of success in an alternative

learning programme: motivational impact versus completion rate. Edu

Rev 51:207–219.

Norcini JJ. 2003. The metric of medical education. Peer assessment of 

competence. Med Edu 37:539–543.

Norman G, Schmidt HG. 1992. The psychological basis of problem-based

learning: a review of the evidence. Acad Med 67:557–565.

Orsmond P, Merry S, Reiling K. 2000. The use of student derived

marking criteria in peer- and self-assessment. Assess Eval Higher Edu

25:23–38.

Pajares F. 1996. Role of self-efficacy beliefs in the mathematical

problem-solving of gifted students. Contemporary Edu Psychol

21:325–344.

Papinczak T, Young L, Groves M. 2007. Peer-assessment in problem-based

learning: A qualitative study. Adv Health Sci Edu 12:169–186.

Pond K, ul-Haq R. 1997. Learning to assess students using peer review. Stud

Edu Eval 23:331–348.

Pope NK. 2005. The impact of stress in self- and peer-assessment. Assess

Eval Higher Edu 30:51–63.

Pressley M, McCormick CB. 1995. Advanced Educational Psychology for 

Educators, Researchers and Policymakers  (New York, Harper Collins

College Publishers).

Reiter HI, Eva KW, Hatala RM, Norman GR. 2002. Self and peer assessment 

in tutorials: Application of a relative-ranking model. Acad Med

77:1134–1139.

Rezler AG. 1989. Self-assessment in problem-based groups. Med Teach

11:151–156.

Rudy DW, Fejfar MC, Griffith CH, Wilson JF. 2001. Self and peer assessment 

in a first-year communication and interviewing course. Eval Health

Profess 24:436–445.

Schwarzer R, Jerusalem M. 1995. Generalized Self-efficacy Scale  (Windsor,

UK, Nfer-Nelson).

Searby M, Ewers T. 1997. An evaluation of the use of peer assessment in

higher education: A case study in the school of music. Assess Eval

Higher Edu 22:371–383.

Segers M, Dochy F. 2001. New assessment forms in problem-based

learning: The value-added of the students’ perspective. Stud Higher Edu

26:327–343.

Sluijmans D, Dochy F, Moerkerke G. 1999. Creating a learning

environment by using self-, peer-, and co-assessment. Learn Environ

Res 1:293–319.

Sluijmans DMA, Moerkerke G, van Merrienboer JJG, Dochy FJRC. 2001.

Peer assessment in problem-based learning. Stud Edu Eval 27:153–173.

Stefani LAJ. 1992. Comparison of collaborative self, peer and tutor 

assessment in a biochemistry practical. Biochem Edu 20:148–151.

Sullivan ME, Hitchcock MA, Dunnington GL. 1999. Peer and self assessment 

during problem-based tutorials. Am J Surg 177:266–269.

Swanson DB, Case SM, van der Vleuten CPM. 1997. Strategies for student 

assessment. In: D Boud & G Feletti (Eds), The Challenge of Problem- 

Based Learning, pp. 269–282 (London, Kogan Page).

Tousignant M, DesMarchais JE. 2002. Accuracy of student self-assessment 

ability compared to their own performance in a problem-based learning

medical program: a correlation study. Adv Health Sci Edu 7:19–27.

Van Rosendaal GMA, Jennett PA. 1992. Resistance to peer evaluation in an

internal medicine residency. Acad Med 67:63.

 Ward M, Gruppen L, Regehr G. 2002. Measuring self-assessment: current 

state of the art. Adv Health Sci Edu 7:63–80.

 Whitfield CF, Xie SX. 2002. Correlation of problem-based learning

facilitators’ scores with student performance on written exams. Adv

Health Sci Edu Theory and Pract 7:41–51.

 Willis SC, Jones A, Bundy C, Burdett K, Whitehouse CR, O’Neill PA. 2002.

Small-group work and assessment in a PBL curriculum: a qualitative

and quantitative evaluation of student perceptions of the process of 

 working in small groups and its assessment. Med Teacher 24:495–501.

 Woolliscroft JO, Tenhaken J, Smith J, Calhoun JG. 1993. Medical students’

clinical self-assessments: comparisons with external measures of 

performance and the students’ self-assessments of overall performance

and effort. Acad Med 68:285–294.

Zimmerman BJ. 2000. Self-efficacy: an essential motive to learn. Contemp

Edu Psychol 25:82–91.

Zimmerman BJ, Bandura A, Martinez-Pons M. 1992. Self-motivation for 

academic attainment: the role of self-efficacy beliefs and personal goal

setting. Am Edu Res J 29:663–676.
