
The use of qualitative research criteria for portfolio assessment as an alternative to reliability evaluation: a case study

E Driessen,1 C van der Vleuten,1 L Schuwirth,1 J van Tartwijk2 & J Vermunt3

1 Department of Educational Development and Research, Maastricht University, Maastricht, The Netherlands
2 ICLON, Leiden University, Leiden, The Netherlands
3 IVLOS Institute of Education, Utrecht University, Utrecht, The Netherlands

Correspondence: Erik Driessen, Department of Educational Development and Research, PO Box 616, 6200 MD Maastricht, The Netherlands. Tel: 00 31 43 388 5774; Fax: 00 31 43 388 5779; E-mail: [email protected]

AIM Because it deals with qualitative information, portfolio assessment inevitably involves some degree of subjectivity. The use of stricter assessment criteria or more structured and prescribed content would improve interrater reliability, but would obliterate the essence of portfolio assessment in terms of flexibility, personal orientation and authenticity. We resolved this dilemma by using qualitative research criteria as opposed to reliability in the evaluation of portfolio assessment.

METHODOLOGY/RESEARCH DESIGN Five qualitative research strategies were used to achieve credibility and dependability of assessment: triangulation, prolonged engagement, member checking, audit trail and dependability audit. Mentors read portfolios at least twice during the year, providing feedback and guidance (prolonged engagement). Their recommendation for the end-of-year grade was discussed with the student (member checking) and submitted to a member of the portfolio committee. Information from different sources was combined (triangulation). Portfolios causing persistent disagreement were submitted to the full portfolio assessment committee. Quality assurance procedures with external auditors were used (dependability audit) and the assessment process was thoroughly documented (audit trail).

RESULTS A total of 233 portfolios were assessed. Students and mentors disagreed on 7 (3%) portfolios and 9 portfolios were submitted to the full committee. The final decision on 29 (12%) portfolios differed from the mentor's recommendation.

CONCLUSION We think we have devised an assessment procedure that safeguards the characteristics of portfolio assessment, with credibility and dependability of assessment built into the judgement procedure. Further support for credibility and dependability might be sought by means of a study involving different assessment committees.

KEYWORDS education, medical, undergraduate/*methods; educational measurement/*methods; curriculum/standards; reproducibility of results; clinical competence/*standards; students, medical/*psychology; mentors.

Medical Education 2005; 39: 214–220
doi:10.1111/j.1365-2929.2004.02059.x

INTRODUCTION

The use of portfolios as an assessment tool has gained rapid popularity. As has happened with many assessment instruments, the term 'portfolio' has become a container concept covering a diversity of methods.1,2 At the heart of every portfolio is information collected in evidence of the owner's learning process and/or competence levels. The evidence is often organised by competencies and may be supplemented with reflections on educational achievement and personal and professional development.3 Portfolios were primarily introduced to assess performance in authentic contexts and encourage learners to reflect on their performance.4 When portfolios are used for summative rather than formative assessment, the psychometric qualities must meet stringent requirements, particularly in terms of reliability. What evidence the scant studies on the reliability of portfolios have revealed is cause for concern, as most studies have reported moderate to low (interrater) reliability.3,5,6

The inevitable conclusion is that extreme caution is warranted when portfolios are used for summative purposes.7 A case in point is the study carried out in the state of Vermont on the reliability of their large scale portfolio assessment programme in primary education.8 In response to the low reliability scores (interrater [Spearman] correlations between 0.45 and 0.65), the Vermont Department of Education restricted the reporting of portfolio scores. The Vermont case attracted substantial attention and led to increased standardisation of portfolio assessment.

There are several strategies for improving interrater reliability in portfolio assessment:

1 standardisation: for example, by structuring content and restricting the number of admissible sources of evidence;

2 rater training and the structuring of judgement through checklists with strict performance criteria, and

3 using large numbers of raters to average out any rater effects.

The disadvantage of the first 2 strategies is that they jeopardise validity. Portfolios are valuable largely because of the richness of the information they supply. They enable students to present documentation of their personal, authentic, educational experiences and experiences in real practice. Standardising those experiences would inevitably detract from the portfolio's educational value. Training of raters and shared rater experiences would improve interrater reliability, but studies on the objective structured clinical examination have warned against unrealistically high expectations of rater training, even with well defined instruments.9 The lesson that detailed checklists can easily trivialise assessment has also been learned in other assessment domains.10

Increasing the number of raters would be an effective strategy, were it not for practical constraints, such as the time-consuming nature of portfolio judgement. In summary, portfolio assessment appears to be caught between the 2 classic evils of poor reliability and poor validity. This begs the question of how to achieve sufficient reliability of qualitative and subjective judgements for summative purposes without falling into the trap of 'corruption of portfolios for testing purposes'.11

Overview

What is already known on this subject

Qualitative and holistic approaches to assessment need not be disqualified because of their inherent subjectivity and consequent lack of reliability.

What this study adds

Credibility and dependability of an assessment procedure can be built into the judgement process. These in-built procedures include intermittent feedback cycles, involvement of relevant resource persons, including the student, sequential information gathering until saturation of information is reached, and quality assessment procedures.

Suggestions for further research

Further support for this assessment procedure might be sought by means of a study involving different assessment committees.

Common misconceptions pervading this discussion are that subjectivity equals unreliability and that objectivity equals reliability. This is not universally true: objective examinations may be unreliable (cf. a single-item, multiple-choice examination) and – more importantly – subjective judgements can be reliable provided an adequate number of different judgements are collected and collated.12 In any formal assessment procedure a fair decision must optimally reflect the demonstrated competence. This implies that assessments must be comparable across candidates, with minimisation of bias and error. Psychometrically, this means large sample sizes and structured (i.e. objective) assessments. In this paper we will present a qualitative approach to portfolio assessment that can enhance reliability without taking recourse to large samples and rigid structuring. Before looking at reliability from the perspective of qualitative research criteria, we will address some analogies between concepts. Some of the concepts underlying internal validity and reliability have pendants in credibility (cf. internal validity) and dependability (cf. reliability) as used in qualitative research.13 To assess the trustworthiness of qualitative data, Lincoln and Guba have systematically replaced traditional criteria by a set of parallel methodological criteria. A central criterion is credibility, which relates to the truth value within the findings so that they are both believable and supported by the evidence provided.13 A number of methodological strategies have been suggested to ensure credibility and dependability.14 The following 3 strategies are important for reaching credibility: triangulation (combining different information sources); prolonged engagement (sufficient time investment by the researcher), and member checking (testing the data with the members of the group from which they were collected). The strategies for realising dependability – the pendant of reliability – involve establishing an audit trail (i.e. documentation of the assessment process to enable external checks) and carrying out a dependability audit (i.e. quality assessment procedures with an external auditor).

These strategies can also be used to achieve credibility and dependability in educational assessment.15

For example, Norcini and Shea use the concept of credibility as a criterion against which methods of standard setting can be evaluated.16 In this approach, standard setters should have the proper qualifications (prolonged engagement), many standard setters should be involved and the judgements of other credible groups should be included (triangulation).16 Credibility strategies are especially useful for constructing an assessment procedure, while dependability strategies can be used to monitor the assessment procedure. We conducted a case study among 237 Year 1 medical students to explore how the concepts of credibility and dependability can be applied to portfolio assessment. We addressed the question of how such qualitative research strategies can be used in a portfolio assessment procedure to ensure reliable and valid judgement.

CONTEXT OF THE STUDY

This case study explored the portfolio assessment procedure used in Year 1 of the undergraduate medical curriculum at Maastricht University, the Netherlands. The structure of the portfolio was provided by 4 different roles of a doctor: medical expert, scientist, health care worker and person. Global criteria were devised for each role and students had to collect evidence demonstrating that by the end of the year they had met those criteria. The students were mentored by medical school staff.

At the beginning of the academic year the portfolio system was introduced and students carried out some portfolio exercises. In the portfolio students had to present an analysis of their personal strengths and weaknesses in relation to the 4 roles of a doctor. These reflections had to be backed up by evidence, such as feedback from evaluations and tests and completed assignments. Students were also required to draw up a learning plan for the next period. Halfway through the academic year the students submitted their portfolios to their mentors, who gave feedback. In a progress meeting student and mentor discussed the portfolio and the student's competence development regarding the 4 roles. It was assumed that the student would adjust the portfolio in accordance with the feedback received. At the end of the academic year this cycle of submission, feedback and adjustment was repeated. This portfolio format has been described in greater detail elsewhere.17

Formative and summative assessment

The purpose of the Year 1 portfolio was primarily formative. It was intended to promote feedback as part of the assessment programme and help students monitor their competence development and develop reflective, planning and remediation skills.

The portfolio also served a summative purpose. This was considered desirable for 2 reasons:

1 because experience has taught that purely formative assessment tends to lose momentum and after some time a new impetus is needed to steer student learning in the desired direction,18 and

2 because portfolio assessment offers a unique opportunity to identify students who are lagging behind in professional progress and who show insufficient ability to reflect, plan and/or take remedial action. As summative assessment implies the possibility of failing and students who fail the portfolio may ultimately face expulsion from medical school, it will be clear that the portfolio is a high stakes assessment and fair decisions and maximum prevention of decision errors are of the essence.

In summary, the portfolio assessment procedure in the case study was designed to strike a delicate balance between formative and summative evaluation, seeking the best possible mix of benefits from both approaches.19 We will describe how we combined formative and summative assessment in a single procedure.


THE PORTFOLIO ASSESSMENT PROCEDURE

All portfolios (n = 237) were judged at the end of the academic year and given a grade of fail, pass or distinction. The following, rather global, criteria were used to assess the quality of the portfolios:

• the quality of the analyses of strengths and weaknesses;
• the quality of the evidence;
• the extent to which the evidence reflected the analyses of strengths and weaknesses;
• the clarity and feasibility of the learning objectives, and
• the extent to which the learning objectives were achieved.

These criteria express the following steps in the portfolio cycle: reflect on competence development; sample evidence; link evidence to reflection; formulate learning objectives, and develop competence. Obviously, such broad criteria necessitate extensive input from the judges in the assessment process.
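Purely as an illustration, the pairing of the assessment criteria with the portfolio cycle steps they express can be written down as a small lookup table. The Python below is a sketch only; the strings paraphrase the criteria listed above and are not the official wording of the rubric.

```python
# Each paraphrased criterion paired with the portfolio cycle step it expresses.
CRITERION_TO_STEP = [
    ("quality of the analyses of strengths and weaknesses", "reflect on competence development"),
    ("quality of the evidence", "sample evidence"),
    ("extent to which the evidence reflects the analyses", "link evidence to reflection"),
    ("clarity and feasibility of the learning objectives", "formulate learning objectives"),
    ("extent to which the learning objectives were achieved", "develop competence"),
]

for criterion, step in CRITERION_TO_STEP:
    print(f"{step}: assessed through the {criterion}")
```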

Assessment occurred in all phases of the portfolio process:

1 during the compilation of the portfolio in regular meetings of mentor and student;

2 in the end-of-year meeting when mentor and student recommended the final grade, and

3 after submission of the portfolio to the portfolio assessment committee (PAC) for final grading.

In all 3 phases, procedures and precautions had to be in place to ensure a credible assessment process.

Compiling the portfolio

Over the course of the year the students discussed their progress as documented in the portfolio in at least 2 sessions with their mentor, who provided constructive oral and written narrative feedback.

The mentor's combined role of supervisor and assessor can be a difficult, albeit not impossible, task. A classic example of a situation involving a similar role combination is the relationship between supervisor and PhD student. The supervisor has to coach and encourage the student to put his or her best efforts into the dissertation, while at the same time both supervisor and student have to prevent submission of an unsatisfactory dissertation to an external assessment panel. In our case study, this analogy has been particularly useful in the training sessions for the mentors held at the beginning of the academic year. These sessions were supplemented by intervision sessions, in which the mentors shared experiences and information. The purpose of this approach was to support the mentors in their difficult double role and build a sound foundation for feedback to the students. Another advantage of regular feedback is that it prevents students being disappointed by an unexpected, negative recommendation made by a mentor to the portfolio committee.

Recommendation by mentor and student

In their final meeting of the academic year the student and the mentor discussed the mentor's well-motivated recommendation to the assessment committee concerning the grading of the portfolio. When student and mentor agreed on the grade, the student signed the recommendation. The student did not sign if there was disagreement, which the student indicated on the assessment form. Subsequently, the portfolio was submitted to the committee.

Portfolio assessment committee

The final step of the assessment procedure comprised a sequential judgement procedure by the assessment committee. As it is the mentors who have first-hand experience with the portfolios, it was decided that the assessment committee should be composed of the 13 Year 1 mentors. The committee members did not grade the portfolios of the students they had mentored. Because judging a portfolio is time-consuming, the assessment procedure was designed for maximum efficiency. Judgements of the full committee were only required if the available information was not unanimous. Figure 1 presents the assessment procedure in a flowchart.

The flowchart shows the number of portfolios remaining to be judged after each step in the decision process. A total of 233 portfolios were submitted to the assessment committee at the end of the academic year 2001–02. Firstly, the portfolios on which student and mentor agreed were rated by a single committee member, who did not study the portfolio in any great detail, but typically scanned the work of the student and mentor and checked whether all procedures had been followed correctly. Only if the rater had any doubts was the portfolio examined further. When rater, mentor and student agreed on the grading, the recommendation became the final decision. The fact that this parsimonious route proved feasible for the majority of portfolios (n = 202; 85%) may be taken as an indication that the assessment procedure, with its mentor–student meetings, was satisfactory. If the rater did not agree with student and mentor, a second independent rater judged the portfolio. If the 2 raters agreed, their judgement became the final decision. If they disagreed, the portfolio was submitted to the full committee.

Students and mentors disagreed about the grading of only 7 (3%) portfolios. These portfolios were judged by 2 raters independently of one another. When the 2 raters agreed, their rating was regarded as the final decision. If they disagreed, the portfolio was submitted to the full committee.

Only 9 portfolios (4%) were discussed in the full committee meeting. These portfolios were reviewed individually, with mentors and raters presenting their arguments. The final decision was based on consensus among the committee members, excluding the student's mentor. As the mentor is first and foremost a supervisor, rather than an assessor, the mentor had no vote in the final decision. In all, 29 (12%) final decisions differed from the original recommendation. Nine students failed, 147 received a pass and 81 were given a distinction.

Figure 1 Flow chart of the judgement procedure of the portfolio assessment committee.

A total of 226 portfolios (96%) were graded without being reviewed by the full committee. The entire procedure was completed in the relatively short time of 42 hours (i.e. 11 minutes per portfolio), with the committee meeting lasting 1 hour. The participants did not perceive the process as particularly stressful.
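To make the sequential route easier to follow, a minimal sketch of the decision flow shown in Figure 1 is given below in Python. The function and parameter names, and the use of None to mean "refer to the full committee", are purely illustrative and not part of the study's materials; the counts in the comments simply repeat the figures reported above.

```python
def final_grade(student_grade, mentor_grade, next_rater):
    """Illustrative sketch of the sequential judgement route (Figure 1).

    `next_rater` is a callable standing in for a committee member (never
    the student's own mentor) reading the portfolio and returning a
    recommended grade: 'fail', 'pass' or 'distinction'. Returning None
    means the portfolio goes to the full committee, which decides by
    consensus without the mentor's vote.
    """
    if student_grade == mentor_grade:
        # Agreement route: one committee member scans the portfolio and
        # checks that the procedure was followed (202 of 233 ended here).
        first = next_rater()
        if first == mentor_grade:
            return first
        # The rater has doubts: an independent second rater judges the portfolio.
        second = next_rater()
        return second if second == first else None
    # Disagreement route (7 of 233 portfolios): 2 independent raters.
    first, second = next_rater(), next_rater()
    return first if first == second else None


# Example: student and mentor agree on a pass and the checking rater concurs.
raters = iter(['pass'])
print(final_grade('pass', 'pass', lambda: next(raters)))  # prints: pass
```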

DISCUSSION

This case study demonstrates the feasibility of a qualitative approach to achieve reliable summative judgement using an inherently complex and non-standardised assessment instrument, which relies on holistic professional judgement. We incorporated some procedural safeguards into the assessment process to achieve maximum credibility of the decisions.13–15 Essential elements in the assessment process were:

• feedback cycles, incorporated into the mentoring process during the compilation of the portfolio to ensure that the mentor's final recommendation did not come as an unpleasant surprise to the student; this element relates to the credibility strategies of prolonged engagement and member checking;

• maintaining a careful balance between the mentor's roles of coach and assessor, ensuring that the person who knew the student best provided the most relevant information while at the same time minimising any damaging effect to the mentor–student relationship; this relates to the credibility strategy of prolonged engagement;

• student involvement in the decision process to ensure commitment on the part of the student and allow the student to communicate a different point of view to that of the mentor; this relates to the credibility strategy of member checking, and

• a sequential judgement procedure in which conflicting information necessitated more information gathering, ensuring efficient use of resources by reserving efforts to achieve more reliable judgement in cases where this was absolutely necessary. As a result, more resources (i.e. mentor time) were available for coaching students, which is in line with the main purpose of the portfolio. This element relates to the credibility strategy of triangulation.

Dependability can be reached by establishing an audit trail and by the use of external auditors. Both strategies were used to monitor our assessment procedure. The audit trail consisted of comprehensive documentation of the different steps of the assessment process: a formal assessment plan approved by the examination board; portfolio and assessment guidelines; overviews of the results per phase, and written assessment forms per student. Quality assurance procedures were set up. The internal quality procedure enables a student to appeal to the university Board of Appeal for Examinations against the outcome of the assessment. The external quality procedure entails regular audit by the Dutch organisation for educational auditing and accreditation. This relates to the dependability strategy of audit.
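As a compact restatement of the mapping developed in this section, the qualitative research criteria and the procedural safeguards used in this study can be laid out as a simple lookup structure. The Python dictionary below is only an illustrative summary; the short descriptions paraphrase the prose above and are not quoted from the study materials.

```python
# Paraphrased summary of how each qualitative research strategy was
# operationalised in the portfolio assessment procedure described above.
SAFEGUARDS = {
    "credibility": {
        "prolonged engagement": "mentor reads and discusses the portfolio at least twice a year",
        "member checking": "student discusses, and signs or contests, the mentor's recommendation",
        "triangulation": "sequential judgement adds raters whenever the available information conflicts",
    },
    "dependability": {
        "audit trail": "assessment plan, guidelines, per-phase overviews and written assessment forms",
        "dependability audit": "internal appeal procedure plus external educational audit",
    },
}

for criterion, strategies in SAFEGUARDS.items():
    print(criterion.upper())
    for strategy, element in strategies.items():
        print(f"  {strategy}: {element}")
```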

The case study showed that all these elements contributed to the credibility and dependability of portfolio assessment. We are convinced that this assessment process has considerably more credibility and dependability than procedures aimed at high interrater reliability, particularly if such procedures necessitate standardisation and rigid structuring with concomitant impairment of validity. In the hypothetical case of a legal procedure, we would be able to build our defence on the evidence that after a careful assessment procedure a committee of experts had reached consensus on the final decision on a student's portfolio. A better defence is hard to imagine. Further support for this assessment procedure might be sought by means of a study involving different assessment committees.

Although the mentoring process was resource-intensive, most of the mentors' time was spent on mentoring and formative feedback and only a minor portion on formal assessment. During an informal debriefing the mentors indicated that the judgement procedure had not burdened them disproportionately.

To some extent, the portfolio assessment procedure described in this paper resembles the procedure suggested by Friedman Ben David et al., who proposed 2 independent ratings followed by a final consensus procedure, culminating in an overall judgement and agreement between the 2 raters.1 Our procedure is broader and includes information collected from the very start of the portfolio compilation process, the mentor's and the student's input, as well as a final full committee consensus decision.


At the heart of the approach used in our case study is the recommendation by McMullan et al., in their review of portfolio studies, that for portfolio assessment criteria from qualitative research might be more appropriate than criteria from quantitative research, such as reliability.3 Instead of looking at consistency across repeated assessments (a quantitative psychometric approach), we added information to the judgement process until saturation of information was reached (a qualitative approach).20 This does not mean that psychometric aspects were ignored. In fact, the concept of psychometrics – particularly in relation to sequentially increasing the number of examiners – was used, but it was not applied in a classic test theoretical sense.

Naturally, some arbitrary decisions were unavoidable. It is not unlikely that adaptations of the procedure will be proposed, depending on our cumulative experiences and evaluations. However, the essence of our argument is that we have tried to demonstrate that reliability, as viewed from the purely psychometric perspective, is too limited a criterion to be applied to qualitative assessment and that reliability can be built into an assessment procedure by implementing various safeguards and by collecting more information only when necessary. We believe that this represents an important step forward in our endeavours to incorporate more qualitative and subjective features into competence assessment.

Contributors: All authors contributed ideas to the paper and commented on all drafts of the paper. ED designed and is co-ordinator of the portfolio programme in Years 1–4 of the Maastricht undergraduate curriculum.
Acknowledgement: we would like to thank Mereke Gorsira for her help in improving the English language of the paper.
Funding: none.
Conflicts of interest: none.
Ethical approval: not applicable.

REFERENCES

1 Friedman Ben David M, Davis MH, Harden RM, Howie PW, Ker J, Pippard MJ. AMEE Medical Education Guide 24. Portfolios as a method of student assessment. Med Teacher 2001;23 (6):535–51.

2 Webb C, Gray M, Jasper M, McMullan M, Scholes J. Models of portfolios. Med Educ 2002;36:879–98.

3 McMullan M, Endacott R, Gray MA et al. Portfolios and assessment of competence: a review of the literature. J Adv Nurs 2003;41 (3):283–94.

4 Snadden D. Portfolios – attempting to measure the unmeasurable? Med Educ 1999;33:478–9.

5 Baume D, Yorke M. The reliability of assessment by portfolio on a course to develop and accredit teachers in higher education. Studies Higher Educ 2002;27 (1):7–25.

6 Pitts J, Coles C, Thomas P. Educational portfolios in the assessment of general practice trainers: reliability of assessors. Med Educ 1999;33:515–20.

7 Roberts C, Newble DI, O'Rourke A. Portfolio-based assessments in medical education: are they valid and reliable for summative purposes? Med Educ 2002;36:899–900.

8 Koretz D, Stecher B, Klein S, McCaffrey D. The Vermont Portfolio Assessment Program: findings and implications. Educational Measurement: Issues Pract 1994;13 (3):5–16.

9 van der Vleuten CPM, van Luijk SJ, Ballegooijen AMJ, Swanson DB. Training and experience of medical examiners. Med Educ 1989;22:290–6.

10 Norman GR, van der Vleuten CPM, De Graaff E. Pitfalls in the pursuit of objectivity: issues of validity, efficiency and acceptability. Med Educ 1991;25:119–26.

11 Huot B. Beyond the classroom: using portfolios to assess writing. In: Black L, Daiker DA, Sommers J, Stygall G, eds. New Directions in Portfolio Assessment. Reflection, Practice, Critical Theory and Large-scale Scoring. Portsmouth, New Hampshire: Cook Publishers 1994;325–33.

12 van der Vleuten CPM, Norman GR, De Graaff E. Pitfalls in the pursuit of objectivity: issues of reliability. Med Educ 1991;25:110–8.

13 Lincoln YS, Guba EA, eds. Naturalistic Inquiry. Beverly Hills, California: Sage 1985.

14 Denzin NK, Lincoln YS, eds. Handbook of Qualitative Research. 2nd edn. Thousand Oaks, California: Sage 2000.

15 Webb CER, Gray M, Jasper M, McMullan M, Scholes J. Evaluating portfolio assessment systems: what are the appropriate criteria? Nurse Educ Today 2003;23:600–9.

16 Norcini JJ, Shea JA. The credibility and comparability of standards. Appl Measurement Educ 1997;10 (1):39–59.

17 Driessen EW, van Tartwijk J, Vermunt JD, van der Vleuten CPM. Use of portfolios in early undergraduate medical training. Med Teacher 2003;25 (1):18–23.

18 Driessen EW, van der Vleuten CPM. Matching student assessment to problem-based learning: lessons from experience in a law faculty. Studies Cont Educ 2000;22 (2):235–48.

19 Zeichner K, Wray S. The teaching portfolio in US teacher education programmes: what we know and what we need to know. Teaching Teacher Educ 2001;17 (5):613–21.

20 Strauss AL. Qualitative Analysis for Social Scientists. New York: Cambridge University Press 1987.

Received 22 September 2003; editorial comments to authors 18 November 2003, 5 March 2004; accepted for publication 15 April 2004
