

International Journal of Osteopathic Medicine 11 (2008) 137–141. doi:10.1016/j.ijosm.2008.08.028


Commentary

Clinical competence examination – Improvement of validity and reliability

Paula Fletcher

European School of Osteopathy, Boxley House, The Street, Boxley, Maidstone, Kent ME14 3DZ, United Kingdom

Received 1 August 2008; accepted 26 August 2008

Abstract

The traditional approach to final clinical competence assessment has many shortcomings in terms of validity and reliability. Strategies for improving this traditional approach are presented, which include a degree of standardisation, coupled with increased variety. The advocacy of standardised or simulated patients by some researchers is discussed, with the incorporation of patient feedback into the competence assessment mix. The relevance of examiner bias and the negative effects of being observed on candidate performance are considered, together with the significance of examiner training and the manner of their deployment. Consideration is given to alternative assessment modes, with a concluding argument in favour of continuous assessment in place of the final examination.

© 2008 Elsevier Ltd. All rights reserved.

Keywords: Clinical competence examination; Validity; Reliability; Measurement; Osteopathic medicine; Education

1. Introduction

For the last two decades, the UK providers of osteopathic training and their professional, accrediting body have relied predominantly upon the 'long case' to provide the final proof of clinical competence. It has been assumed that this assessment has all of the five required attributes of any assessment process: reliability, validity, acceptability, feasibility, and educational impact (see Table 1). This commentary aims to question this reliance and to encourage the move to something more standardised, with greater weight being given to continuous assessment.
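Table 1 states that a regulatory assessment should generally achieve a reliability of at least 0.8, without specifying a coefficient. As a purely illustrative sketch (not from the commentary), the Python snippet below estimates one common choice, the one-way intraclass correlation ICC(1,1), from an invented grid of two examiners' marks for six candidates.

# Illustrative only: one-way intraclass correlation, ICC(1,1), as one
# possible reading of the "reliability >= 0.8" threshold in Table 1.
# The marks below are hypothetical, not data from the commentary.

def icc_1_1(scores):
    """scores: one row per candidate; each row holds the marks given
    to that candidate by k different examiners."""
    n = len(scores)          # number of candidates
    k = len(scores[0])       # examiners per candidate
    grand = sum(sum(row) for row in scores) / (n * k)
    means = [sum(row) / k for row in scores]
    # Between-candidate and within-candidate mean squares (one-way ANOVA)
    msb = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    msw = sum((x - means[i]) ** 2
              for i, row in enumerate(scores) for x in row) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

marks = [  # rows: candidates; columns: examiner A, examiner B
    [62, 58], [71, 74], [45, 52], [80, 77], [55, 49], [68, 66],
]
print(f"ICC(1,1) = {icc_1_1(marks):.2f}")  # about 0.93 for this invented grid

An ICC near 1 would mean that marks vary mainly because candidates differ, not because examiners disagree; the 0.8 threshold in Table 1 can be read against coefficients of this kind.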

Before proceeding to consider the long case in more depth, it would be useful to define competence. In their paper looking at competence and performance in general practitioners, Rethans et al.2 differentiate between competence and performance. Competence is said to consist of knowledge, skills and attitude. They conclude that assessment of competence, therefore, requires several measurement instruments, each representing different aspects of competence.

Southgate3 defines clinical competence as "in part the ability, in part the will, to select and perform consistently relevant clinical tasks in the context of the social environment in order to resolve health problems of individuals and groups in an efficient, effective, economic and human manner". In a summarised form, this is not too dissimilar to the Standard 2000,4 which purports to provide the components of clinical competence assessment used by osteopathic training providers in the UK.

2. The long case

Table 1. Five required attributes of an assessment process (adapted from McKinley et al.1)

Reliability: An estimate of score variation due to performance differences between subjects; includes agreement between examiners assessing the same performance. The reliability of a regulatory assessment should generally be a minimum of 0.8.

Validity: The extent to which an assessment is a measure of what should be measured. Validity concerns both the instrument and assessment process and the challenge with which the candidate is tested.

Acceptability: The extent to which the assessment process is acceptable to the stakeholders. In competence tests of an osteopathic student, the stakeholders are the student, the examiners/assessors, the patients/simulators, the profession, future patients of the osteopath and society.

Feasibility: The extent to which the assessment can be delivered to all those who require it within real costs of staff and time constraints.

Educational impact: The extent to which the assessment can assist the osteopathic student to improve performance experientially and by means of feedback on specific strengths and weaknesses, plus prioritised and specific improvement strategies.

The long case has been outlined by Godfrey and Heylings5 as a method of assessment in medicine used virtually everywhere from undergraduate to postgraduate training. It has the following instantly recognisable features:

• The candidate interacts with a patient (new or returning);
• The candidate is then interviewed by an examiner for perhaps 15–20 min, during which the patient history is presented, plus examination findings, differential diagnosis and management of the case;
• The examiner may then see some of the subsequent treatment;
• Ideally a moderator, or second examiner, may see the candidate; and
• There follows a moderation meeting at which examiners confer on an appropriate grade for the candidate.

As Godfrey and Heylings state, "the long case is generally regarded at undergraduate level as more indicative of potential success or failure as a clinician than almost any other part of the final examinations". The rationale for this is that the long case apparently offers face validity (i.e. it appears to measure what it purports to). However, the long case has poor content validity (i.e. it is unable to differentiate between groups with known differences).5 This can be improved by increasing the number of cases per candidate.5 However, again according to Godfrey and Heylings, exam performance can be adversely affected by a range of other factors, such as:

• Patient variability (possibly even dishonesty);
• Examiner variability (bias). Three not uncommon sources of bias are (the first two are illustrated in a sketch at the end of this section):
  a) the dove/hawk dimension, where one examiner is more lenient than another;6
  b) the tendency for one examiner to "spread" their marks more widely than another examiner;7
  c) the "halo effect", or the tendency to rate a candidate high (or low) in all areas being evaluated in a session if the candidate scores high (or low) in one area.
• Serendipity (some candidates may have seen similar cases before, whilst others may not).
• The activities of a candidate in a long case may well not be observed by the examiner(s). In consequence, many of the skills said to be examined may not be. These could include such things as:
  a) explanations to patients;
  b) patient examination;
  c) technical skills.

There is also evidence that direct observation can have adverse effects upon the observed.8
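The dove/hawk and mark-spread biases listed above lend themselves to a toy demonstration. The following sketch is purely illustrative and not drawn from the commentary: four hypothetical examiner profiles mark the same 200 invented performances, and pass rates at a fixed cutoff are compared.

import random
import statistics

random.seed(1)
CUTOFF = 50  # hypothetical pass mark
true_marks = [random.gauss(55, 10) for _ in range(200)]  # invented cohort

def examine(true, leniency, spread):
    # Examiner's mark: performance stretched about the cutoff by a
    # spread factor, shifted by a leniency offset, plus marking noise.
    return CUTOFF + spread * (true - CUTOFF) + leniency + random.gauss(0, 4)

for name, leniency, spread in [("neutral", 0, 1.0),
                               ("dove (lenient)", +5, 1.0),
                               ("hawk (severe)", -5, 1.0),
                               ("wide spread", 0, 1.6)]:
    marks = [examine(t, leniency, spread) for t in true_marks]
    n_pass = sum(m >= CUTOFF for m in marks)
    print(f"{name:15s} mean={statistics.mean(marks):5.1f} "
          f"sd={statistics.stdev(marks):4.1f} pass={n_pass}/200")

On this toy model the dove passes noticeably more candidates than the hawk for identical performances, while the wide-spread examiner passes a similar proportion but pushes marks towards the extremes; moderation of grades can therefore matter even when pass rates agree.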

3. The candidate

It is important at this point to consider the candidate. Certainly the prospect of a long case assessment concentrates the minds of students and can be an example of examination directing learning.5 Nevertheless, many students tend to try to cover all possible clinical possibilities rather than concentrating on basic skills such as taking the case history and carrying out the physical examination. Many students claim that they do not really understand what is expected of them.5 Clear guidance is therefore essential.

Neufeld and Norman7 question what is being measured by oral examinations. To what extent is true ability being measured in oral examinations, and to what extent are measurements contaminated by unsystematic judgements about other characteristics of students? They refer to issues that are suitable for oral assessment: breadth as well as depth; clinical judgement; ability to think on their feet; interpersonal skills.

Clearly, all of these features could be perceived as militating against the validity and reliability of the long case as a means of assessing clinical competence. How can this situation be improved?

4. Clinical competence examination and how to improve its validity and reliability

Godfrey and Heylings5 propose various remedies for improving the validity and reliability of the long case:



• Standardisation (or simulation) of the patient, e.g. using trained or primed patients. Allen and Rashid,9 looking at assessment of consultation skills using simulated surgeries, concluded that the simulated patient has a number of practical advantages over other assessment methods, ensuring both face and content validity. They also felt that an essential aspect of judging competency was the inclusion of patient feedback. In their experience, observers are often wrong when attempting to judge from outside whether the patient's anxieties have been addressed. The simulated patient here is a real patient trained to simulate the particular case based on an original. They are also trained in the application of the marking schedules. Such intricacy may well increase costs too much and, therefore, negate feasibility. A modified approach is, therefore, advised.
• Ensure adequate examiner training. This should be carefully planned and coordinated. Wildman et al.10 have shown that practice sessions, linked with feedback and discussions about ratings on direct observation, do improve inter-rater reliability between examiners (see also later).
• Observe the encounter and specify criteria for observation and assessment. An interesting observation from Neufeld and Norman7 is that the more behaviour the examiner has to score, the lower the reliability. Clearly, therefore, it is essential to have assessment criteria that are not overly detailed.
• Standardise the questions asked in the post-encounter interview.
• Increase the number of long cases contributing to a decision on a candidate. Godfrey and Heylings5 recommend using two or more long cases per subject – each with one examiner – rather than one long case with two co-existing examiners. They claim that examiner agreement between cases is much better than examiner agreement for the same case. This does not contradict the earlier point regarding the beneficial effects of examiner training upon inter-rater reliability. However, in terms of oral examination, many researchers have found that raters' scores agree more closely on candidates when the raters examine in pairs.6,11 Admittedly, the long case is not the direct equivalent of an oral examination. Bull11 found that the correlations between marks assigned by pairs of examiners (who were asked to try not to influence each other's marks) ranged from 0.51 to 0.89 (a worked example of such a paired-examiner correlation follows this list). Additionally, Wilson et al.12 found in a similar study correlations between paired examiners of 0.78 for a long case and 0.84 for a short case. Godfrey and Heylings5 do not quote experimental data, and in consequence the present author is inclined to dismiss their assertions on this point.
• Keep a record of the long cases being allocated, linking this to the candidate's course content. Thus, if the allocated long cases relate to hepatic and respiratory problems, covering other issues in other parts of the assessment can increase content validity.
• Do not use the long case to test obscure knowledge. Ensure questioning tests those skills that require the presence of a patient.
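As a purely illustrative sketch of the paired-examiner correlations that Bull11 and Wilson et al.12 report, the following Python computes a Pearson correlation between two examiners' marks for the same candidates; the marks themselves are invented.

# Illustrative only: Pearson correlation between two examiners' marks
# for the same candidates (the figures Bull and Wilson et al. report
# are correlations of this kind). The marks below are invented.
import math

examiner_a = [62, 71, 45, 80, 55, 68, 74, 59]
examiner_b = [58, 74, 52, 77, 49, 66, 70, 63]

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

print(f"r = {pearson(examiner_a, examiner_b):.2f}")

For this invented grid r comes out at around 0.9; real paired-examiner data, as Bull's 0.51–0.89 range shows, is often far less consistent.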

Shumway and Harden13 make a number of pertinent observations. In their view, "to assess a learner's competence accurately, the patient and the examiner should be held constant across all the different learners to be assessed". Clearly, in an osteopathy context this would be impossible. Shumway and Harden go on to discuss simulated and standardised patients. A simulated patient is an individual who simulates the role of a real patient. A standardised patient is a patient, an actor or other individual who has undergone training to provide a realistic and consistent representation of a patient. The terms are often used interchangeably. In the USA and Canada, 80% of medical schools use simulated or standardised patients in their medical education departments,13 and these are used effectively in objective structured clinical examinations (OSCE). In their view, the OSCE is a valid and reliable tool for clinical competence assessment. Generally, the greater the number of stations at which a student is examined, the greater the reliability and content validity of the OSCE.
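The commentary does not give the psychometric rationale for the claim that more stations (or more long cases) raise reliability, but the standard tool for this reasoning is the Spearman–Brown prophecy formula, R_n = nR / (1 + (n − 1)R), where R is the reliability of a single case or station and n is the number of comparable cases. The sketch below simply evaluates it; the starting reliability of 0.4 is an assumed figure for illustration only.

def spearman_brown(r_single, n):
    # Projected reliability of an assessment built from n comparable
    # cases or stations, each with single-case reliability r_single.
    return n * r_single / (1 + (n - 1) * r_single)

R = 0.4  # assumed single-case reliability, for illustration only
for n in (1, 2, 4, 8):
    print(f"{n} case(s): projected reliability = {spearman_brown(R, n):.2f}")

On these assumptions a single long case falls well short of the 0.8 threshold in Table 1, while roughly eight comparable encounters would reach it, which is consistent with the drift of the literature reviewed here towards multiple sampled encounters.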

Probert et al.14 found that for a traditional clinical final examination (including a long case), student performance was not a good indicator of rating as a junior doctor. However, there was a positive relationship between final-year OSCE performance and rating as a junior doctor.

Van der Vleuten and Swanson,15 in a literature review, found that with the use of multiple simulated or standardised patients, very little measurement error was introduced when these patients were trained to play the same patient role. The reliability of simulated or standardised patients has been shown to be acceptable when there is adequate training and standardisation.16

Rethans et al.2 are of the view that assessment of clinical competence requires several measurement instruments, each representing different aspects of competence. Quoting, amongst others, Van der Vleuten and Swanson,15 they state that the use of standardised patients in examinations has been shown to be the most direct method of assessment, with high reliability and validity. Their research suggests that competence assessment is only a predictor of performance if qualitative and quantitative data are collected.

Shumway and Harden13 also argue that assessment should not be limited to one approach in clinical competence assessment. This can be varied depending upon the particular skill being assessed. According to these authors, student participation in an OSCE has a positive impact on learning: "the student's attention is focused on the acquisition of clinical skills". They state that the OSCE provides formative evaluation as the student is participating in it. However, they do add that the OSCE can have the unfortunate effect of compartmentalising skills and diminishing integration. Van der Vleuten17 raises similar points. Other less attractive features of the OSCE include overall cost, both in administration of the process and in training staff, significant time investment, and maintenance of exam security.13

Smee18 agrees with Godfrey and Heylings5 that the basic requirements for reliability and validity have not always been achieved in the traditional long case and short case assessments. He goes on to add that "case specificity means that performance with one patient-related problem does not reliably predict performance with subsequent problems". For Smee, the logical extrapolation of this is the OSCE, where performance is sampled over a range of patient problems.

Lowry19 raises an interesting point by suggesting that any assessment of medical students should help them focus their learning, picking out strengths and weaknesses, giving the opportunity for improvement, and in the process protecting the public from incompetence. To do all of this, the assessment process must contain a significant formative element. The summative element of the assessment must be criterion-referenced.

In terms of undergraduate medical training, Van der Vleuten17 is very much in favour of greater emphasis on continuous assessment and less on the final examination. He does admit that certain features of the multi-faceted final examination argue in favour of its reliability but, generally speaking, feels that the final exam "cannot reliably decide a student's competence in relation to an entire curriculum". He argues that the amount of information obtained from a "continuous and longitudinal assessment programme" cannot be replaced by a final exam that occurs at one point in time.

In his argument he makes use of Miller's pyramid of clinical competence, composed of four tiers, progressing through "knows" to "knows how" to "shows how" to "does". Any competence assessment must be able to cover the higher levels of the pyramid.

A timely word of caution comes from Hodges,20 who states that "what teachers and evaluators choose to emphasise in medical education drives behaviour of students and colleagues to such an extent that it can actually create forms of incompetence". He concludes the following:

• Avoid teaching and testing "pure" knowledge and general skills. Knowledge and skills need to be integrated and contextualised from the start.
• Limit the use of highly standardised scenarios and measures. Forms of thinking characteristic of an expert should be encouraged, with the embracing of varying situations and cases.

5. Conclusion

The literature in this commentary tends to favour a move away from the traditional, rather non-standardised final clinical competence assessment, to an assessment that is more standardised but that nevertheless includes variety and ensures integration of knowledge and skills. This assessment should be based on a series of encounters with the student, using trained examiners and collecting a mix of quantitative and qualitative data. Caution is advised, however, on the degree of standardisation. The methods employed for the assessment of competence can be debated, but what is clear is that validity and reliability are likely to be enhanced by a move away from the traditional approach to clinical competence assessment.

References

1. McKinley RR, Fraser RC, Baker R. Model for directly assessing and improving clinical competence and performance in revalidation of clinicians. Br Med J 2001;322:712–5.
2. Rethans J-J, Sturmans F, Drop R, Van der Vleuten C, Hobus P. Does competence of general practitioners predict their performance? Comparison between examination setting and actual practice. Br Med J 1991;303:1377–80.
3. Southgate L. Freedom and discipline: clinical practice and the assessment of clinical competence. Br J Gen Pract 1994;44:87–92.
4. Standard 2000 (S2K) standard of proficiency. General Osteopathic Council; March 1999.
5. Godfrey J, Heylings D. Guide to assessment of students' progress and achievements. The Medical & Dental Education Network, Queen Mary & Westfield College, University of London; 1997.
6. Bull GM. Examinations. J Med Educ 1959;34:1154–8.
7. Neufeld VR, Norman GR, editors. Assessing clinical competence. Springer series on medical education, vol. 7. New York: Springer Publishing Company; 1986.
8. Donabedian A. Medical care appraisal, part III: issues of method and technique. A guide to medical care administration, vol. II. American Public Health Association; 1975.
9. Allen J, Rashid A. What determines competence within a general practice consultation? Assessment of consultation skills using simulated surgeries. Br J Gen Pract 1998;48:1259–62.
10. Wildman BG, Erickson MT, Kent RN. The effects of two training procedures on observer agreement and variability of behaviour ratings. Child Dev 1975;46:520–4.
11. Bull GM. An examination of the final examination in medicine. Lancet August 1956:368–72.
12. Wilson GM, Harden RMcG, Lever R, Robertson JIS, MacRitchie J. Examination of clinical examiners. Lancet January 4, 1969:37–40.
13. Shumway JM, Harden RM. AMEE Guide No. 25: the assessment of learning outcomes for the competent and reflective physician. Med Teach 2003;25:569–84.
14. Probert CS, Cahill DJ, McCann GL, Ben-Shlomo Y. Traditional finals and OSCEs in predicting consultant and self-reported clinical skills of PRHOs: a pilot study. Med Educ 2003;37:597–602.
15. Van der Vleuten CPM, Swanson D. Assessment of clinical skills with standardised patients: state of the art. Teach Learn Med 1990;2:58–76.
16. Tamblyn RM, Klass DJ, Schnabl GK, Kopelow ML. Sources of unreliability and bias in standardised-patient rating. Teach Learn Med 1991;3:74–85.
17. Van der Vleuten CPM. Validity of final examinations in undergraduate medical training. Br Med J 2000;321:1217–9.
18. Smee S. ABC of learning and teaching in medicine: skill-based assessment. Br Med J 2003;326:703–6.
19. Lowry S. Assessment of students. Br Med J 1993;306:51–4.
20. Hodges B. Medical education and the maintenance of incompetence. Med Teach 2006;28:690–6.