EXAMINATION AND STUDENT ASSESSMENT COMMITTEE
REPORT ON AMBULATORY COMMUNITY EXPERIENCE STUDENT ASSESSMENT FROM
2002-2005
M. Schreiber, M.D.
INTRODUCTION
The Ambulatory Community Experience (ACE) course takes place in year 4. It spans either 4 weeks (pre-
CARMS) or 3 weeks (post-CARMS). It involves students being placed in a variety of ambulatory settings,
either community or hospital-based, one placement per student. Evaluation consists of two modalities:
1. A performance-based evaluation. This consists of a grid, with the five rows consisting of
competencies related to the course objectives, and the columns indicating the level of ability
achieved by the student. This counts for 60% of the course grade.
2. A case write-up. This includes a reflective component. This counts for 40% of the course grade.
Two observations are noteworthy in the appraisal of these evaluation instruments:
a) The grades in this course do not appear on the student transcript that is submitted for the CARMS
process.
b) The performance evaluation form in use beginning in 2005/2006 is significantly different from that
in use from 2002-2005. The new form is similar to the form used by all courses in the clerkship, and
includes a considerably expanded matrix with 14 competencies assessed, five possible levels of
ability for each, and detailed descriptors provided for each level of each competency.
With these changes, however, the straightforward correspondence between the stated course objectives and the
evaluation form has been lost, and so the ACE course director has requested advice from ESAC with
respect to how to match the competencies assessed by the new evaluation form to the course
objectives. The ESAC report will of necessity be based on the data submitted for academic years
2002-2005.
AN APPRAISAL OF THE ASSESSMENT MODALITIES
1. Performance evaluation form
This is completed by the supervisor at the site where the student completes the rotation. As noted above,
five general competencies are assessed:
i. Clinical problem-solving skills.
ii. Patient management skills
iii. Health promotion and disease prevention
iv. Professional behaviours
v. Community impact on patient care
These are rated on a 5-point scale from “unsatisfactory” up to “above expectations”. Descriptors are
provided for the highest, middle and lower levels. This is worth 60% of the final grade.
Feasibility
The form is certainly straightforward and should present no difficulty for the supervising clinician to
complete.
Validity
Content validity is evident inasmuch as the form is clearly and explicitly linked to the course objectives. In
this reviewer's judgment, the form has excellent face validity.
There is no evidence available on predictive validity. To obtain this would require, as stated by the course
director, a considerable research project to study the clinical competence of graduates and then correlate
scores in residency to score on ACE. This is not likely to happen.
Concurrent validity is addressed by noting a very weak, albeit positive, correlation with the case write-up
assignment. Further data on concurrent validity could be fairly easily obtained by studying the degree to
which scores on the ACE form correlate with scores on the performance evaluation in other fourth year
clerkship courses (medicine, surgery, emergency medicine and anesthesia). I would recommend this be
pursued.
Reliability
It is important to know whether raters at various sites are using the form in a similar manner. There is,
however, little evidence provided about this. The only data suggesting that this may be the case are noted on
page 4 of the report, where it is stated that ratings at community sites are similar to ratings at hospital-based sites. It
would be reassuring to know that average grades are roughly similar at the various sites over a sufficiently
long period of time so that several students would have been evaluated at each site. This should be carried
out to identify “outliers”, i.e., supervisors who mark either very leniently or very harshly.
It might be helpful to provide some explicit guidance to raters as to the likely expected performance level of
most fourth year students: e.g., “Most students should be in the meets expectations category”.
Inter-rater reliability cannot be assessed since only a single rater at each site evaluates the student. In some
cases, students interact with more than one clinician, and it would be feasible in those cases to have each of
the clinicians complete the form, and then have the supervisor “average” the ratings on the different forms
in generating the final evaluation. In situations where there is only one supervisor, this would of course not
be feasible.
There are positive correlations noted between the scores of each of the five competencies and the overall
scores. This is of course not surprising since there are only five contributing elements to the final score, and
one would expect scores on this small number of elements to correlate with the final overall score.
Grades achieved
The document does not indicate how the checkmarks on the performance evaluation form are converted into
grades.
Over the three years sampled, the average grades have been very stable, with a mean grade of close to 84%,
and a standard deviation close to 6%. This generates a proportion of honours ranging from 75 to 80%. This
is in the same range as is typically seen for performance evaluation in other clerkship courses, and likely is
attributable to a leniency bias of raters. Assuming that an honours grade reflects a predominance of “above
expectations” ratings, then either the students are exceptionally good or expectations are somewhat low.
Since these grades are not part of the CARMS form, this grade inflation is likely not of major consequence.
Feedback
The procedures in place to provide feedback to the student about her/his performance seem appropriate,
since there is a structure in place for both ongoing regular feedback as well as more formal feedback
midway through the rotation and at its conclusion.
2. Case write-up
A single case write-up is submitted at the end of the rotation. This is worth 40% of the grade. The write-up
is up to 8 pages long, double-spaced, with very clear expectations as to structure, outlined in appendix 8.
Feasibility
The write-up is graded by members of the ACE course committee. It seems to be a reasonable task for these
individuals to be completing.
Validity
The face validity of this exercise in my judgement is very high.
The content validity is supported by the close connection between the course objectives and the components
of the case write-up assessed on the evaluation form. Each of the five competencies is explicitly evaluated.
Examples of issues to be addressed are provided to the students as outlined in appendix 8 and presumably
the same examples and guidelines are available to the graders.
Concurrent validity is indicated by the weak albeit positive correlations with the performance evaluation
scores in the ACE rotation. It would be feasible to search for correlations between the scores on this
assignment and scores on similar exercises, including the DOCH-4 assignment, the reflective write-ups used
in the year 4 medicine rotation, and perhaps case write-up exercises in other clerkships.
Predictive validity is likely not feasible for the same reasons cited above in the appraisal of the performance
evaluation form.
Reliability
Each case write-up is marked by one marker. Accordingly, inter-rater reliability cannot be determined. It is
appropriate that a small number of markers is used, since they can be trained to grade the write-ups in a
consistent manner. It would be useful, however, to verify that on average each of the markers assigns
comparable grades. This should be feasible. Also, it would be reassuring from time to time to check that
markers are working consistently by having a small number of papers marked by each marker. This would
identify hawks and doves. More explicit guidelines on what markers are to look for in each of the domains
would be appropriate. Specifically, the form should indicate what constitutes “insight” into the case, and
what constitutes a “thoughtful” analysis.
Data on internal consistency is not provided. An internal consistency score could be calculated easily
enough if the data have been captured as to exactly what score each student achieved on each competency.
Presumably, one would expect students who score well on one aspect of the write-up to do well on other
aspects.
Actual grades achieved
Interestingly, scores on the case write-ups are modestly but definitely and consistently lower than on the
performance evaluation. The mean scores ranged between 79 and 80%. Equally interestingly, the spread of
scores is much wider, with the standard deviation between 11 and 12%. This means that, assuming a
normal distribution, around 16% of students would score below 70%, which is quite a significant number of
students scoring at a fairly low level.
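The estimate above can be checked with a short calculation (the mean and standard deviation used here are mid-range values taken from the figures reported above; the exact yearly figures vary):

```python
# Sketch: estimate the proportion of case write-up scores below 70%,
# assuming a normal distribution with a mean of ~79.5% and a standard
# deviation of ~11.5% (mid-points of the reported ranges).
from statistics import NormalDist

mean, sd = 79.5, 11.5
below_70 = NormalDist(mean, sd).cdf(70)

print(f"Estimated proportion scoring below 70%: {below_70:.1%}")
```

With these assumed values, 70% sits a little less than one standard deviation below the mean, so the estimate comes out near one student in five; the "around 16%" figure corresponds to treating 70% as exactly one standard deviation below the mean.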
3. Grades as a whole
In the course as a whole, mean grades were close to 82% each year, with standard deviation of 6% and a
proportion of honours ranging from 64 to 70%.
This is not too different from several other clerkship courses. The same comments apply here as were made
in relation to the performance evaluation form: this seems to be quite a high proportion of students to be
designated at the honours level.
RECOMMENDATIONS
1. In order to study the concurrent validity of the performance evaluation form and of the case
write-up, it would be appropriate to correlate scores on these with scores from comparable
assessments in other clerkship rotations.
2. In order to reassure ESAC that raters are using the performance form in a reasonably
consistent manner across sites, it would be appropriate to provide data on how scores have
averaged across these sites over the years.
3. At sites where multiple clinicians interact with the student, each clinician should
complete the evaluation form and the supervisor can then average the ratings.
4. There should be more direction given to raters on the expected proportion of students
achieving at each level on the performance evaluation form.
5. A sample of case write-ups should be marked by all raters to ensure they are each marking at
a reasonably similar level of expectations.
6. Consideration should be given to a second written report to be handed in during the first half
of the rotation, or alternatively an oral presentation. If the course director finds this proposal
helpful, then resources should be made available to support the marking of a second written
report.
7. Provide mid-rotation feedback to students on their performance in the rotation up to that
point, as is done in other clerkships, so that there is time to demonstrate improvement.
8. In response to the course director's question about linking the new evaluation form to the
course objectives, I would suggest the following:
All of the ACE objectives relate to the scholar (self-directed learning) role. The other objectives might map
to the new competencies as follows:
Clinical problem-solving skills
This relates most closely to the first four items in the medical expert/skilled clinician domain (history-
taking, physical examination, diagnostic test interpretation, and the problem formulation). The
communicator role is also relevant.
Patient management skills
This relates most closely to competencies in the medical expert/skilled clinician cluster (problem
formulation and management plan; use of evidence-based medicine), and to the three competencies in the
communicator/doctor-patient relationship cluster (communication with patients/families/community; written
records; patient education).
Health promotion and disease prevention
This relates most closely to the health advocate cluster (recognition of important determinants of health and
principles of disease prevention; patient advocacy).
Professional behaviours
This is relevant to all the competencies, and is also captured on the professionalism form. The collaborator
role is particularly relevant here.
Community impact on patient care
This is most relevant to the manager role (awareness of and appropriate use of healthcare resources) and to a
degree the collaborator role (team participation, provision of patient care in collaboration with all health
care providers).
Respectfully submitted,
Martin Schreiber, M.D.
ESAC Committee Course Review
Anesthesiology (ANS400Y) December 2, 2008
Course Director: Isabella Devito
Lead Reviewer: Richard Pittini
Student Reviewer: Nicolae Petrescu

Course Summary:
Anesthesiology is a two-week clinical rotation in the fourth year of the medical curriculum. The rotation
consists of 8-9 days of clinical placement with one-to-one faculty supervision. There is one day of
simulation-based teaching per rotation, which typically occurs during the first week. Students are assigned
to between 4-6 faculty for their clinical experience and are supervised primarily by a respiratory therapist
and anaesthesiology residents during their simulation day. The course objectives are reviewed by the course
director, and the evaluation methods map closely to the objectives. Student performance is evaluated using
two separate evaluations: a written examination worth 60% and a clinical evaluation worth 40%. Students
receive formal feedback at the midpoint and informal feedback following individual clinical encounters.
Overall, students perform well in the course, with a class average of between 77 and 80% over the last three
academic years. The proportion of students who fail the course is 0 to 0.5%, while the proportion who
receive honours (>80%) is between 29 and 53%. Only a total of 20 students received borderline grades
(60-69%) over the last three years.

Evaluation Components:

Written Examination:
The written examination is a ten-question short-answer examination that consists of 40% new questions per
iteration. The questions are created or selected from a secure pool and reviewed by the course director and
one site coordinator prior to inclusion. The number of subsections per question varies, but the overall
quantity of information required to answer each question is uniform. The questions are selected to cover all
content areas of the curriculum. Questions are also reviewed post hoc and are revised if <60% or >90% of
students answer them correctly. The examination is administered centrally and is computer-based. Students
are allowed to move back and forth between questions.

Two examiners are responsible for grading each written examination. The examinations are divided for
marking such that one examiner marks half of the students for question 1 and the other marks the other half
of the students for the same question. They then alternate questions, so that students have the benefit of
two markers for the entire examination. None of the questions is graded by more than one faculty member,
and no single question is marked entirely by one faculty member. The benefit of this design, according to
the course director, is to facilitate discussion between the two faculty as to what answers are acceptable. If
additional answers are accepted, previously marked papers are re-marked according to the revised marking
scheme.

Students perform relatively well on the written examination, with average scores of 76-81% over the last
three years. The marks are normally distributed, with a reasonable proportion of students scoring in the
borderline category. The marks are consistent across academy sites, suggesting they have construct validity;
however, there is greater variation between rotations, suggesting that not all examinations may be equally
challenging. The variation in scores between blocks is less than 2 standard deviations and is not likely of
significance.

Clinical Evaluation:
During each day of the rotation, students are assigned to a faculty member. The students provide the faculty
member with a clinical encounter card to complete, which is submitted in a drop-box. This card evaluates
11 criteria. The rating scale used is a five-point Likert scale with behaviourally anchored ratings. The scale
is weighted as follows: unsatisfactory = 0%, below expectations = 65%, meets expectations = 75%, exceeds
expectations = 80%, outstanding = 90%. The weighting of each criterion is determined by the course
director and reflects the curricular content. No criterion is worth more than 15%. Individual encounter
cards are reviewed by site coordinators, who then transform these evaluations into a mark for the clinical
evaluation. Site coordinators are given the latitude to decide whether to include marks from "hawks" or
"doves" if they are out of keeping with the remainder of the evaluations.

The class average on the clinical evaluation has been stable for the last three years at 77-78%, with very
little variation between academies or between blocks. The standard deviation was as low as 1.7-1.9 for one
academy over the last three years. No student received an unsatisfactory rating on any criterion at any site
in the last academic year. There is a clear pattern of marks, with one site having the highest mark for 10 of
11 criteria and another site having the lowest for 8 of 11 criteria. The number of students assigned to each
of these two sites is small.

Feedback:
Students are directly observed by faculty throughout their rotation and are engaged in discussion on a
regular basis. They receive feedback informally on a daily basis, and there is formal written feedback at the
midpoint. A new form has been introduced this year to facilitate this feedback; it includes not only areas for
improvement but also an action plan. There is no formal feedback at the end of the rotation, although
students are able to "disagree" with the clinical evaluation and may subsequently discuss it with either the
site coordinator or the course director. Students provide feedback to the course director via course
evaluations, and in the past some students have indicated a preference for fewer faculty observers.

Observations:

Written
1. The proportion of new questions is high (40%) and may lead to larger fluctuations in the written examination scores
2. The distribution of marks is broad, suggesting that the current evaluation methodology adequately captures all levels of competence, including those in the borderline category
Clinical
1. Direct observation by multiple faculty facilitates accurate evaluations.
2. The issue of having "too many" faculty supervisors may be more pertinent to those students seeking a letter of recommendation.
3. The process for integrating clinical evaluations is not consistent across the 4-6 individuals contributing to a student's mark.
4. The standard deviation for clinical evaluations appears to be low; this may be the result of under-utilization of the lower end of the rating scale.
5. The weighting of the unsatisfactory rating is too low (no effect apparent due to the infrequency of its use).
6. 10-15% of the rotation consists of a simulation component, but this component is not evaluated.
Feedback
7. The feedback at the end of the course is limited to the provision of component marks and is of limited value to students
Recommendations:
1. Introduce no more than 15% new questions per examination, and ensure the question databank is sufficiently large to sustain this approach.
2. Divide questions such that one marker grades all students on a given question. The markers can consult each other regarding whether the answer key should be adjusted, but this does not necessitate the current method of dividing the examination questions.
3. Develop a consistent approach for how faculty input is integrated by site coordinators and disseminate this approach via faculty development
a. Consider adjusting marks rather than omitting them; if the patterns are consistent, previous years' data could be used to accomplish this objectively.
b. Ensure that all evaluators use the same approach to compensating for inexperience, i.e., either all are more lenient at the beginning of the rotation, or all rate performance as observed and the site coordinator takes the date on the encounter card into account.
4. Revise the clinical encounter cards to include a not applicable column and a checkbox for “this evaluation was discussed with the student”
5. Adjust the clinical evaluation scale such that unsatisfactory is weighted 55% and encourage faculty to utilize the full range of the scale as appropriate
6. Consider developing a method for evaluating the skills demonstrated by students during the simulation day; a weighting of no more than 15% is recommended for such a component
7. Provide students with a breakdown of the areas in which they did not perform well on the written examination
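The effect of recommendation 5 can be sketched numerically. The rating-to-percentage anchors below are the ones stated in the report; the per-criterion weights and the sample ratings are hypothetical, chosen only to illustrate how raising the unsatisfactory anchor from 0% to the proposed 55% changes a student's clinical mark:

```python
# Sketch: converting encounter-card ratings into a clinical evaluation mark.
# Rating anchors are from the report; the proposed scale raises
# "unsatisfactory" from 0% to 55% as per recommendation 5.
CURRENT_SCALE = {"unsat": 0, "below": 65, "meets": 75, "exceeds": 80, "outstanding": 90}
PROPOSED_SCALE = {**CURRENT_SCALE, "unsat": 55}

# Hypothetical criterion weights (11 criteria, none above 15%, summing to 100).
weights = [15, 15, 10, 10, 10, 10, 8, 8, 8, 3, 3]

# Hypothetical ratings for one student, including a single "unsat".
ratings = ["meets", "meets", "exceeds", "meets", "unsat",
           "meets", "below", "meets", "exceeds", "meets", "meets"]

def clinical_mark(scale):
    """Weighted average of the rating percentages across all criteria."""
    return sum(w * scale[r] for w, r in zip(weights, ratings)) / sum(weights)

print(f"current scale:  {clinical_mark(CURRENT_SCALE):.1f}%")
print(f"proposed scale: {clinical_mark(PROPOSED_SCALE):.1f}%")
```

Under these assumed ratings, the single unsatisfactory rating costs far less on the proposed scale (the mark rises from 67.6% to 73.1%), which is the distortion the recommendation is meant to address.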
Conclusion of Review
Based on the CRICES report presented and the opinion of the ESAC committee members, it is our
conclusion that:
No major issues; ongoing improvements encouraged . . . . . . Full review in three years' time
___________________________________
Richard Pittini, ESAC Chair
___________________________________
Isabella Devito, Course Director
___________________________
Date
Examination and Student Assessment Committee
Arts and Science of Clinical Medicine II
The course director, Dr. Jacqueline James, presented a comprehensive review of the Arts and Science of
Clinical Medicine II (ASCM II) on November 4, 2008.
Lead Reviewer: Dr. Dara Maker
Components of the Student Assessment System:
There are four components that together account for the student's final mark. They include a midyear
observed history and physical exam, oral presentations, written assignments and a final OSCE. Students are
also required to complete a Log Book and are evaluated on their professionalism; however, these are not
included in the final mark calculation.
1. Final OSCE - 50%
The final exam consists of a ten station summative OSCE that covers all major clinical subspecialty
areas covered in ASCM II. The exam is created by the Curriculum Subcommittee of the ASCM II
committee, and the bank of questions (approximately 32) has undergone substantial revision in the past
7 years, including updated scripts and checklists. In the previous academic year, the class average was
78.43% with 40.3% of the students receiving honours.
2. Midyear Observed History and Physical Examination – 20%
The midyear evaluation is both a summative and formative assessment. Students perform a focused
history and 3 physical exam maneuvers and are rated using Likert scales to evaluate process and content.
Two global ratings are also used. In the past academic year the class average was 82.59% with 76.7% of
the students receiving honours.
3. Oral Presentations – 15%
Two oral presentations are given, each worth 7.5%. One is evaluated by the core tutor and the other by
the pediatrics tutor. Students are graded on five components of the presentation based on global ratings
out of 5.
4. Written Assignments – 15%
Two case reports are completed, each worth 7.5%. One full case report is marked by the geriatrics tutor
and is calculated based on five criteria. The psychiatry tutor marks a mental status examination write-up
that is based on three criteria.
The class mean for the composite scores for all in-course assignments (two oral presentations and two
written assignments) in 2007-2008 was 83.11%, with 88.8% of the students receiving honours.
The overall class average in 2007-2008 was 80.66%, with 73.1% of the students receiving an honours in
the course. The class average has been stable over the past three academic years and is consistent with
class averages from other courses. The 2007-8 range of marks was narrow (70.2-86.0%) and the
standard deviation was small (2.42). This has also remained stable over the past three years.
Areas of Strength:
The ASCM II course demonstrates a number of strengths of the student assessment system. A significant
asset of ASCM II is that it uses a number of different evaluation methods to allow students to be examined
across a variety of domains.
The course is noted for its constant development of its evaluation system via ongoing revisions and
responsiveness to feedback. The opinions of both students and consulting faculty are considered when
making changes to the course. The addition of the non-evaluated observed history and physical in
2008-9 to provide students with specific feedback in preparation for their mid-year exam is an excellent
example of the continuous improvements being made to the course.
ASCM II is also noted for its commitment to feedback. Students receive considerable formative feedback
throughout the course via their in-course assignments and exams. Additional methods for ensuring
feedback include the mandatory completion of the skills log-book. Although not calculated for marks, it
ensures regular observation of physical exam maneuvers and continual feedback.
1. Final OSCE:
The ASCM II OSCE includes a large bank of questions (32) that is continuously updated and revised
by numerous methods. At the time of the OSCE, feedback is solicited from examiners and standardized
patients, which is then used to improve upon the station in the future. Past student performance and
comments from experts in the field are also reviewed prior to re-using stations. Lastly, the course
director reviews all checklists to ensure the relevance of the physical exam maneuvers tested.
The exam reflects all major specialty areas taught. Committee members who are experts in the field
generally create new OSCE stations. The means and standard deviations for each station are analyzed
and compared to determine the validity and reliability of the station. Specific criteria for removal of
poorly done items are used.
ESAC was impressed with the reliability coefficients for the total score on the ten station ASCM II
OSCE in 2007-08 which were 0.64 (testing day 1) and 0.65 (testing day 2). The reliability coefficients
were slightly higher in 2007-8 than in previous years; however, all the scores within the past three years
have been above 0.5. In 2007-08 there was no difference in reliability between testing days one and
two. The reliability coefficients for the checklist scores and global rating scores were similarly high, and
the correlation coefficients between those two scores were 0.76 and 0.73 for testing days one and two
respectively, indicating that they are both marking the same domains. There was also no difference in
mean scores across the three academies.
2. Midyear Observed History and Physical Examination
The midyear observed history and physical exam is performed on real patients, therefore test questions
are always “novel”. Detailed descriptors have been developed to help examiners evaluate students more
objectively. The observed history and physical exam provides students with considerable feedback on
their clinical skills at the mid-year point, allowing them ample opportunity to improve upon their skills
prior to the final OSCE. Students receive immediate verbal feedback on their performance as well as a
copy of their evaluation form including written strengths and weaknesses. Students appreciate the one
on one time they receive with their core tutor who is responsible for administering and evaluating the
exam. Students scoring 73% or below (global rating 3/5) are invited to meet with the course director to
review their performance.
3. Oral and Written Presentations
Both oral presentations and written reports are important communication skills in medicine and
consequently are prudent to evaluate. Students are assessed by four different tutors, giving students the
benefit of multiple observer assessments and minimizing assessment bias. The advantage of two of
each type of assignment is that students have the opportunity to incorporate the feedback they received
from their first written or oral assignment to enhance their performance prior to the second.
Areas for Improvement and Recommendations:
1. Final OSCE
Concerns were raised regarding the criteria set for failing the OSCE examination. Students must pass 6
of 10 stations and achieve a total score of 60% or greater. ESAC was concerned with the low standards
required for passing, given that ASCM II builds much of the foundation for clerkship. The numeric
scale used to equate a mark from the 5-point Likert scale creates a very narrow range of marks. The
lowest possible score is 55% and maximum 91%. Given that Likert ratings of 1/5 or 2/5 are equated to
55% and 64% respectively, it is difficult for students to fail a station (achieve <60% overall).
Additionally, students can be designated as failing a station via the examiner's impression, but may still
achieve >60% and thereby pass the station. Overall, ESAC questioned whether stricter criteria for
passing this important exam should be developed.
Recommendations: Change the numerical weighting of the Likert scale to ensure students are
appropriately identified if performing below standard. Specifically, widen the lower end of the scale.
Adjust the requirements needed to pass the OSCE to ensure that weaker students are identified and
provided with the opportunity to complete extra work or remediation prior to commencing clerkship.
We recommend that students not be permitted more than two failed stations in order to pass the OSCE.
2. Midyear Observed History and Physical Examination
There is a 7-week variability in the timing of the midyear observed history and physical exam, which
may affect student performance. There are also situations when the exam is performed on standardized
patients rather than real patients and this may affect validity.
ESAC's major concern with the midyear exam was that it is conducted by the student's own core tutor.
This may introduce bias, as the exam is graded by core tutors with whom the students have interacted
extensively before the evaluation. Examination by the students' core tutors may also lead to mark
inflation (mean test score 82.49, 76.7% achieving honours). In addition, there is little correlation
between the midyear exam and the final OSCE (r=0.05 in 2007-08). Although the OSCE and observed
history and physical assess different qualities it is possible that the feedback received from the midyear
test may not be useful in preparation for the OSCE. ESAC was also concerned that students are
receiving most of their formative feedback from their core tutor (one individual) and may be observed
for the first time by others during their OSCE exam.
Recommendations: Switch examiners for the midyear observed history and physical examination such
that the examiner is not the student's core tutor. Although ESAC recognizes that this may affect the
students' perceived quality of the feedback they receive, it will likely improve the examination's
objectivity.
Further standardization of the observed history and physical exam may be required, such that all students
interact with real patients and all exams are conducted within a narrow window of time.
ESAC recognizes that given the complexity of the ASCM II scheduling considerable work may be needed
to implement these changes.
3. Transparency of the Evaluation System to Students
ESAC was concerned with the lack of transparency in the assessment system. It is not clear to ESAC
how borderline students are dealt with, nor is it made clear to the students (e.g., via an explicit
description in the course handbook).
Recommendations: Improve transparency to students in how marks are determined and how students
performing below standard will be dealt with. Clearly describe the management of borderline and
failing students in the written material (i.e. course manual) provided to them.
4. Timely Feedback
There is variable timeliness in the provision of feedback. Students receive regular formative feedback
throughout the course but as grades are calculated centrally via MedSIS, the availability of final marks
for assignments/exams is unfortunately slow. This translates to a sense by some students that they don't
know how they are performing in ASCM until late in the year.
ESAC was also concerned that the OSCE summary feedback form was not being sent to students until
many months after the completion of the course. The delayed receipt of this feedback decreases its
impact and effectiveness. It was also noted that students who took the exam in 2006-07 did not receive
any written feedback. This was confirmed by the course director who explained that extenuating
circumstances led to the oversight.
Recommendations: Timely, effective feedback should be provided to the students. ASCM II should work with
MedSIS and faculty to develop a system for more expedient mark calculation. Written feedback from the
ASCM II OSCE should be made available to students in a timely manner (preferably within one month
of the examination).
Conclusion of Review
Based on the CRICES report presented and the opinion of the ESAC committee members, it is our
conclusion that:
Continued improvement is encouraged . . . . . . Full review next cycle (approx. 3 years)
___________________________
Richard Pittini, ESAC Chair
______3/23/2009_____________
Date
cc: Preclerkship Coordinator, Vice-Dean, UME
Faculty of Medicine University of Toronto
Undergraduate Medical Education
Examination and Student Assessment Committee (ESAC) Course Review: Brain and Behaviour, BRB 111S
Course Co-Directors: P. Stewart and M. Hohol
Reviewers: R. Pittini and H. Bielawska
This CRICES report was presented to the ESAC committee by Drs. Stewart and Hohol on December 6,
2005.
This first year course consists of eight weeks of lectures, PBL sessions, seminars and laboratory sessions
organized into four blocks. Topics covered in these blocks include neuroanatomy, cell biology, motor
systems, sensory systems, higher cognitive functions and behaviour.
Evaluation in this course consists of two examinations: a mid-term exam worth 40% and a final examination
worth 60%. The midterm examination consists of a practical 55-question 'bell-ringer' component and 50
multiple choice questions; both components are equally weighted. The final examination consists of 60
multiple choice questions and 10-13 short answer questions. The majority of the short answer questions
address the problem based learning process rather than content. Different content is covered by each of the
examinations, and while early concepts are built upon in the later portion of the course, the material covered
on the mid-term examination is not re-examined on the final examination.
Students are evaluated by tutors and receive feedback both at the midterm and end of the course. This
evaluation is not weighted in the course mark. Answers to the multiple choice exam, bell ringer and short
answer questions are provided on the day of the examinations as a source of additional feedback to students.
Students tend to score highly on the course assessments, with class averages of 77, 83, and 82% over
the last three academic years. Very few failures occur (2, 0.5, 0%) and a significant proportion of students
achieve honours (36, 69, 62%). While the evaluations are not cumulative, there is good correlation between
the midterm and final evaluation. There is also a strong correlation with other first year courses. Student
grades in BRB correlate well with grades in subsequent years, though less strongly in third year.
Multiple Choice Questions
There are over one hundred multiple choice questions used each year with most being new contributions.
Examination questions are contributed by lecturers with the majority having been lecturers in this course for
several years. Efforts are made to ensure that new questions are linked to the course objectives. Questions
are not secure as they are provided along with the answers to students following the examination as a means
of immediate feedback. Post-hoc analysis is carried out by the course co-directors but questions are seldom
excluded. Students infrequently request changes. The midterm and final multiple choice question scores
correlate moderately although they tend to be higher on the final examination. The internal consistency of
the examinations is good. The multiple choice question scores do not correlate well with the short answer
questions (r = 0.28), but this may be due to the emphasis on process over content in the majority of short
answer questions.
Bell Ringer Examination
This 55 station examination evaluates neuroanatomy and associated functions. The examination involves
students viewing a specimen or image and identifying structures or answering brief questions about
function. The examination is timed with 1.5 minutes per station. Tutors design the questions and review
them as a group to ensure that the content is appropriate. The marking scheme is determined by an
examination committee. The written answers are marked by tutors with all answers on a given station being
marked by one tutor. Six tutors are responsible for evaluating the entire examination. Students perform
well on this component with a class average of 77, 83, and 76%. More students fail this component of the
course than the multiple choice component, with 5.5, 0.5, and 4% failing in the last three years. Marks
on this component correlate moderately with the multiple choice question scores (0.6-0.7).
Short Answer Questions
There are ten to twelve short answer questions on the final examination. Approximately 75% of these
questions aim to evaluate problem solving skills/process with the other 25% addressing content. The
number of questions is limited by the time available for students to write and by the logistics of marking the
questions. Students feel that there should be increased emphasis on short answer questions. The issue of
subjectivity in marking SA questions is addressed by having one marker grade all answers or, if more than
one marker is required, by dividing the questions rather than the students between the markers. The average
mark on the Short Answer component ranged from 70-83% over the last three years, with between 14%-
71% of students obtaining honours. These wide ranges are felt to result from one specific year‟s exam.
This illustrates the potential impact of exam question selection. Correlation coefficients for SA and MCQ
are low to moderate (0.28-0.59), but this is likely due to the inclusion of both process and content type
questions on the SA exam. Based on limited data, there appears to be only moderate correlation between
SA questions and total grades in Years II and III (0.34-0.44).
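Coefficients like those above are plain Pearson correlations between component marks. A sketch with invented marks for eight hypothetical students (the actual student data are not reproduced in this report):

```python
# Invented component marks; illustrates the kind of SA-vs-MCQ Pearson
# correlation quoted above, nothing more.
import numpy as np

sa_marks = np.array([70, 75, 82, 68, 90, 77, 85, 73], dtype=float)
mcq_marks = np.array([78, 72, 88, 70, 84, 80, 79, 74], dtype=float)

# np.corrcoef returns the 2x2 correlation matrix; [0, 1] is r between the two
r = np.corrcoef(sa_marks, mcq_marks)[0, 1]
print(round(r, 2))
```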
Feedback and Remediation
Students are provided with the exam questions and the answers shortly after they complete the exam. This
is possible because the examinations are not secure, and it serves as the principal means of feedback to students.
Informal feedback also occurs in the context of the PBL sessions but is not formally recorded. Students
seldom contest their marks. Borderline and failing students are identified if they fall more than two standard
deviations below the mean. These students are interviewed by one of the course directors and individualized
remediation plans are implemented, utilizing course tutors if required. Student feedback is collected and has
been effective in altering the evaluation system, with a reduction in the number of MCQs. Questions on the
Bell Ringer are now constructed so that a correct answer on one question is no longer required to answer
subsequent questions.
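The two-standard-deviation trigger described above can be sketched as follows; the student names and marks are invented for illustration:

```python
# Hypothetical sketch of the remediation trigger described above: flag any
# student whose mark falls more than two standard deviations below the
# class mean. All marks below are invented.
import statistics

def flag_borderline(marks):
    """marks: dict of student -> final mark (%); returns flagged students."""
    mean = statistics.mean(marks.values())
    sd = statistics.stdev(marks.values())
    cutoff = mean - 2 * sd
    return [student for student, mark in marks.items() if mark < cutoff]

marks = {"s01": 80, "s02": 82, "s03": 79, "s04": 85, "s05": 81,
         "s06": 83, "s07": 78, "s08": 84, "s09": 80, "s10": 86, "s11": 40}
print(flag_borderline(marks))  # → ['s11']
```

Note that an extreme outlier inflates the standard deviation itself, so in small cohorts this rule flags only pronounced failures; programs often pair it with an absolute mark cutoff.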
Strengths:
1. New exam questions are created for each iteration and are linked to learning objectives
2. Students receive the examination answers as a timely source of feedback
3. Students are evaluated by a variety of different methods and these methods are appropriately
matched to the type of material being examined
4. 'Process' questions are included in addition to 'content' questions, reinforcing the pedagogic
principle of adult learning that 'how' is as important as 'what'
5. The emphasis on 'process' in the PBL sessions reduces student concerns over consistency in
'content' between tutors
Areas for Improvement & Recommendations:
1. Increase the number of Short Answer questions, specifically those addressing 'process'
Proceed with plans to introduce SA questions and a PBL case into the midterm examination.
Collect data to allow for correlations between SA 'content' questions and MCQs, and between
SA 'process' questions and third year marks (preferably ward sub-scores)
2. Student performance in the PBL sessions is not assigned a mark despite close observation by tutors;
feedback regarding these sessions is not formalized
Faculty development sessions for tutors to aid in the systematic identification of borderline
students, consider simplified non-weighted evaluation of PBL performance e.g. satisfactory or
borderline. Develop a clear protocol for how this information would be fed forward to the
course directors in a timely fashion
Actions:
1. Review and disseminate the National Board of Medical Examiners (NBME) guidebook on exam
question writing *
2. Meet with MEDSIS (Knowledge4you) to share your requirements regarding the electronic transfer
of raw scores into spreadsheets *
* items to be facilitated by ESAC
Conclusion of Review
Based on the CRICES report presented and the opinion of the ESAC committee members, it is our conclusion that:
Continued improvement be encouraged .......................Full review next cycle (approx. 3 years)
___________________________
Richard Pittini, ESAC Chair
____April 18, 2006____________
Date
Faculty of Medicine University of Toronto
Undergraduate Medical Education
Examination and Student Assessment Committee (ESAC) Dermatology Review: Vince Bertucci and Maha Haroun, Course Directors
Review by: Dr. R. Pittini & Dr. S. Bernstein
Dermatology (DRM 400Y) is a one week course that takes place in year four of the MD curriculum as part
of the ambulatory and community medicine block. Students attend three half-day seminars and four half-
day clinics. Students spend time with different faculty during their clinics. There is an introductory central
session. The examination occurs on the last day of the week.
Components
The students take a pre-test on the first day of the rotation. This is not included in their evaluation and is
meant to serve as a baseline for their own reference. The course directors report that they typically observe
a 20% improvement between the pre-test and final exam.
Seminar evaluations are weighted 15% and consist of faculty assessment of participation in each of the three
seminars (5% each). The committee is unaware of the format of this evaluation and the criteria used.
Student performance during each of four clinics is evaluated by the faculty and is weighted 10% per clinic
for a total of 40%. The clinics consist of three hours of interaction with faculty. Standard forms ('minicards')
based on the CanMEDS roles are used to complete this evaluation. The ambulatory clinic marks tend to be
high, with an average of between 83-92%; 89-100% of students receive honours and there are no failures. The
weighting of this component has recently been increased from 20 to 40%. Previously these marks were
assigned to an oral examination which consisted of a 30 minute case presentation. Students were asked to
present their history and physical and were then asked non-standardized questions that were both case-based
and generic. This evaluation is no longer being used.
The written examination is a case-based MCQ and uses clearly displayed images with one minute per
question. Students complete the 36-item examination according to the pace of the image presentation. They
are unable to return to previous images. The examination is weighted 45% currently (previously 50%).
Each year 50% of questions are new. There has been an intentional trend towards including easier questions
in response to previous years' relatively low marks (average of 69.0-71.3). The internal consistency of the
MCQ is reasonable, with a typical α-coefficient of 0.46. A relatively high proportion of students (7-15%)
fail this component of the course, but they do not receive remediation as they do well on the remainder of
the course and it is felt that the examination is difficult. Only one student has failed the course overall in the
three years presented.
Feedback regarding performance is limited because of the short duration of the course. The pre-test is
designed to demonstrate the level of difficulty on the final examination that the students can expect. The
students do not receive their grades on this examination. The correct answers are not provided although the
material covered is included in the syllabus. Feedback regarding seminars and clinics is provided at the end
of each session but this is informal and may be in written or verbal format.
Areas of Strength
1. Use of multiple formats maps well to the content covered during this one week course
2. Interaction with faculty in the clinic and in the small group seminar setting provides an opportunity for
direct observation of students by evaluators
3. MCQs provide an opportunity for objective evaluation of students' core knowledge
Areas for Improvement and Recommendations
1. Pre-test utility
a. Students do not optimally benefit from partaking in this exam given its current timing and the
lack of feedback they receive.
Consider moving the exam to later in the week and providing students with their marks as
well as the correct answers; also consider providing benchmarks, which may be helpful in
motivating students to read (consider providing the syllabus at the beginning of the year)
2. Ambulatory Clinics & Seminar Evaluations
a. Performance evaluations lack objective criteria
Develop specific observable criteria for faculty to use when evaluating students in seminars
and during clinics. Consider having evaluations integrated into a final seminar mark by
using a template applied to the session evaluations. If a faculty member evaluates more than
one session with a student, the weighting of their evaluation should be greater (e.g. 1 session
has a weighting of 1, 2 sessions a weighting of 2, etc.). Fewer faculty observing the same students
on more occasions is ideal. Conduct faculty development directed at encouraging faculty to
evaluate students according to the demonstrated skills rather than participation.
3. Multiple Choice Examination
a. Level of difficulty appears to be too high
Too many new questions are included; the typical proportion for most courses is 15%, and 50%
new questions per examination may create inconsistencies in examination difficulty. Increase the
viewing time for each question or utilize a computerized method to allow students to control
their own pace
b. Lack of remediation for students performing poorly
Given the current weighting and objectivity of the MCQ examination, students who perform
poorly on this component should receive feedback and advice for improvement regardless of
their overall standing
Conclusion of Review
Based on the CRICES report presented and the opinion of the ESAC committee members, it is our
conclusion that:
Some issues have been identified, ongoing improvements encouraged . . . . . . . . . . . . . . . . . .
Please provide a letter reviewing the impact of the recent & proposed changes in one year's time
___________________________
Richard Pittini, ESAC Chair
___________________________ ____________________________
Vince Bertucci, Course Director Date
Maha Haroun, Course Director
Distribution: Course Director
ESAC File
Clerkship Coordinator
Vice-Dean, UME
ESAC Review
Course: DOCH II
Course Director: Ian Johnson
Report: April 1, 2008
Lead Reviewers: R. Gupta and J. Wang
Background
This is a year-long course in year 2. The goals of the course are to encourage lifelong learning, and to develop and employ research methods, particularly in community health settings. There are 21 hours of lectures, 19 hours of seminars, and the bulk of the course time is left for individual learning and project work.
Evaluation Components
There are 8 components to the evaluation of students.
1) Librarians mark the library search strategy, and this component is weighted at 10%. Librarians receive standard setting education.
2) The individual learning plan is weighted at 20% and is assessed using a structured evaluation form with a separate page for feedback to the student.
3) Twenty percent of the mark is derived from a 50 item MCQ examination held at midterm. This examination assesses didactic material taught on research methodology.
4) Students complete a progress report on their individual learning project that is weighted at 10%. The progress report is evaluated using a structured form with anchors describing the performance required to achieve specific marks and there is space for feedback to the students.
5) Assessment of the student’s oral presentation of their project is worth 20% of their final mark.
6) The final written report on their project is assessed using clearly defined criteria, and is worth 15% of the final mark.
7) Attendance and participation at the community agency is evaluated by agency representatives and is worth 5% of the final mark.
8) Professionalism is evaluated by the agency but not graded.
Creation and Monitoring of Evaluation Tools
The MCQ is a secure examination and 10% of questions are new each year. Items are reviewed if they perform poorly on statistical review (e.g., too easy or too difficult) or if there were questions regarding the item during the examination. The exam is set and proctored by the course director. The evaluation forms for the individual learning plan, progress report, oral presentation, and written report were developed by the course director with input from other educators and students. The forms have been improved considerably since the last ESAC review. Each tool describes the components of the exercise that are evaluated and the weight of the components, and provides space for detailed feedback to the student.
Analysis of Evaluation Methods
Overall, students perform very well in this course, with class averages of 83-85% over the last 3 years. The vast majority of students obtain an honours grade (78-92%) and only 1 student has failed the course in the last 3 years. The standard deviation of the final marks is small at about 4% over the last 3 years. The vast majority of students receive an honours grade on the independent learning project, MCQ examination, ILP progress report, project presentation, and final written report. A lower proportion of students receive honours on the library search strategy (43-62%). The independent learning project, independent learning project progress report, and the final written report are marked by the same individual. A limitation of all of the written assessments is that there is only one marker for each assignment. However, since the vast majority of students receive an honours grade, there is consistency (i.e., reproducibility) in the marking scheme. The MCQ examination consistently yields a reliability of about 0.65, which is modest for an exam with 50 items. The library search strategy mark is generated by a trained librarian, but a standard marking and feedback form is not used. With respect to validity, there is a positive correlation between the component marks and the final mark, although this is expected. The correlation between the exam and presentation marks is 0.1. The correlation coefficients for the various written assignments over the past three years range from 0.37 to 0.61. These three marks pertain to the same domain and are marked by the same person; therefore a relatively strong correlation is expected. The course director hypothesizes that, given the second assignment is a progress report, the grades will not correlate highly as improvement is expected.
He estimates that approximately 10% of students score quite low on the initial assignment but consistently improve substantially by the second assignment. The course director has also convincingly demonstrated face and content validity. The fact that there are no differences between academies adds somewhat to the construct validity of the assessment instruments.
Areas of Strength
1. Multiple testing methods to sample various competencies. Matching of assessment method to the task being assessed.
2. Assessment of the application of knowledge (i.e., completing a research project vs. testing factual knowledge)
3. Provision of an excellent learning experience in professionalism.
4. The course director should be commended for the changes made to the assessment
methods since the last ESAC review.
Areas for Improvement
1. Variability in agency support, librarian support, and supervisor support. The committee
recognizes that there is no way to eliminate this variability.
Recommendations
1. Consider a standardized form for assessing the library search strategy and provision of feedback
2. Consider incorporating the complexity of the project and level of agency support within the marking scheme. One option may be to add a “box” on evaluation forms for the library
search strategy, final write-up and presentation, to remind markers to incorporate these variables into their grades.
3. Encourage supervisors to provide feedback in a more timely fashion, for all assignments. Consider deadlines for the supervisors with a mechanism for identifying overdue feedback. Contact details of assigned faculty advisors should be explicitly noted on the course website at the start of the course.
4. To address the high proportion of students who obtain an honours mark, the MCQ examination items should be reviewed. Consider removing from the pool any items that are answered correctly by more than 90% of the students.
5. Increase the proportion of new items on the MCQ examination from the current 10% to 10-25%.
Conclusion of Review
Based on the CRICES report presented and the opinion of ESAC members, it is our conclusion that:
Continued improvement encouraged ............. Full review next cycle
Respectfully submitted,
R. Gupta
Examination and Student Assessment Committee
Emergency Medicine (EMR400Y)
The course director, Dr. Rick Penciner, presented a comprehensive review of the Emergency Medicine
rotation on April 7, 2009.
Lead Reviewer: Dr. Richard Pittini
Student Reviewer: Ms. Alyse Goldberg
Components of the Student Assessment System:
There are three components that together account for the student's final mark: a written
examination, a global clinical evaluation, and a seminar participation mark.
1. Final written examination - 50%
The final written exam consists of 20 MCQs, 8 SA, and 5 key feature questions. Material evaluated
is drawn from the manual provided to students. Average marks are 79%, 82%, and 78% for the last
three years.
2. Global Clinical Evaluation – 44%
Marks are generated from shift encounter cards that are completed by 3-5 supervising faculty. Marks
are consistent over the last three years with an average of 82%. The standard deviations are narrow
at 3.9 – 4.3.
3. Seminar Participation – 6%
Students receive a mark of 2% for each seminar attended. There is no evaluation tool utilized.
The overall class average has been stable between 80-82% over the last three years with approximately
2/3 of students receiving honours. No students have failed and very few fall into the borderline (60-
70%) range. The various evaluation components consistently demonstrate low correlation (0.15), but
this may reflect the different domains being evaluated.
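The weighting arithmetic behind the final mark is straightforward; as a sketch, using the weights stated above (written exam 50%, clinical evaluation 44%, seminar participation 6%) and invented component marks:

```python
# Illustrative arithmetic only: combining the three components with the
# weights stated above. The component marks are invented.
WEIGHTS = {"written_exam": 0.50,
           "clinical_evaluation": 0.44,
           "seminar_participation": 0.06}

def final_mark(components):
    # sanity check: the weighting scheme must account for 100% of the mark
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9, "weights must sum to 100%"
    return sum(WEIGHTS[name] * mark for name, mark in components.items())

print(round(final_mark({"written_exam": 79.0,
                        "clinical_evaluation": 82.0,
                        "seminar_participation": 100.0}), 2))  # → 81.58
```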
Areas of Strength:
The written examination has excellent face validity with clear mapping between course objectives,
curricular content and evaluations. The thorough review of questions by multiple reviewers promotes
consistently high quality questions. The use of 20% new questions per iteration with frequent updates to the
course manual keeps the questions pertinent.
Students are directly observed by more than one faculty member during the course of their rotation. Clinical
evaluations are structured and faculty receive instructions on how to complete these evaluations.
Evaluations are consistent across sites with no statistically significant differences being noted for the most
recent dataset.
Areas for Improvement and Recommendations:
1. Seminar participation is not evaluated in a structured fashion and marks assigned only reflect
attendance at mandatory sessions. The material covered in the sessions is better evaluated with the
current written examination.
Recommendations: reallocate the marks assigned to seminar participation to either the written
examination or to alternate evaluations. Attendance at x number of seminars could be a pre-requisite
for credit, or lack of attendance can be addressed with the professionalism evaluation.
2. Encounter cards are inconsistently used or under-utilized by faculty as rich sources of formative
feedback to students.
Recommendations: modify the encounter cards to make the sign-off by faculty more specific as to
whether the encounter was reviewed with the student in person. Faculty should be instructed as to how
many such 'reviewed' encounter cards they are expected to complete (set a minimum to ensure
uniformity across sites). Faculty cannot be expected to provide comments on all criteria listed on the
encounter card on every occasion; an "N/A" column should be added to the form. As well, a
standardized method should be developed for balancing the weight of evaluations from staff who had
only a single encounter with a student against those from staff who can evaluate the student's
progression over many shifts.
3. A significant goal of the rotation is to teach technical skills, both during shifts as well as during time
dedicated to learning technical skills. Direct observation occurs by both physicians and nurses and
students are given the opportunity to perform basic technical skills on patients as well as during the
dedicated 'technical skills' half day.
Recommendations: Several options are available for evaluating technical skills during this rotation.
Modified technical encounter cards could be completed by physicians or nurses observing students
performing basic skills. These cards could include detailed checklists for specified procedures and a
global rating for overall performance. A method to standardize exposure would be to use models to
evaluate technical skills; this could be incorporated into the 'technical skills' session. Marks currently
assigned to Seminar Participation could be re-allocated to technical skills.
Conclusion of Review
Based on the CRICES report presented and the opinion of the ESAC committee members, it is our
conclusion that:
Continued improvement is encouraged . . . . . . Full review next cycle (approx. 3 years)
Richard Pittini, ESAC Chair
cc: Vice-Dean, UME
______11/23/2009_____________
Date
Examination and Student Assessment Committee
Family Medicine Clerkship – Review
Clerkship Director: R. Freeman
Report: June 10, 2003
Lead Reviewer: R. Pittini
Student Reviewer: M. Warsi
Background
The Family Medicine CRICES report was presented to the ESAC committee by Dr. Freeman on June 10,
2003. This clerkship rotation is a four week rotation in the third year of the undergraduate medical
curriculum. Students receive an orientation at the beginning of their rotation and are provided with an
elaborate list of learning objectives. The same course committee members are responsible for both the
development of objectives and examinations. Students are assigned to faculty and benefit from a low ratio
of faculty to students. Students participate in patient care under the supervision of faculty.
Students are evaluated using an OSCE examination, clinical/ward evaluation, and a written assignment.
Students are required to complete a self-assessment and to meet with their supervising faculty to receive
feedback. A mark is awarded for completing this course requirement. A log of patient encounters is
required but is not assigned a grade. Professionalism is evaluated throughout the course using the
undergraduate medical education professionalism form. Professionalism is a requirement of the course but
does not contribute to the final course grade. Students are required to pass each individual component of the
course in order to pass the course.
The weighting of the various components is distributed as:
OSCE – 42.5%
Clinical evaluation – 45%
Written assignment – 12.5%
Feedback – 5%
Students perform well overall with recent class averages of 78%, 77%, 77%, and 79%. The proportion of
students receiving honours is approximately 30% for the most recent three years.
Student input suggests that the course is perceived as fair but that the OSCE component is considered
difficult. The average for the OSCE is standardized at 74% in order to control for differences in difficulty
among various exam questions. The OSCE average is therefore slightly lower than the overall class average
and is in keeping with student perceptions.
The OSCE consists of a five station examination which includes post-encounter probes at each station.
Stations are selected from a large pool of secure scenarios. Stations are developed by case writers with the
assistance of a guide book. The stations are field tested and standardized patient portrayals are videotaped
to ensure consistency. All previously used stations are reviewed and modified as required prior to re-use.
Approximately 40% of stations are new for each year.
The average score for the OSCE component is standardized to 74% in order to adjust for the variability in
exam question difficulty. This typically results in a 4-7% upward adjustment. Attempts to use other
methods for adjusting marks were less effective (e.g. mean borderline standard setting). The current
proportion of students failing is 1.6-2.2%, with 14-19% receiving honours. This method of adjustment
appears to be able to discriminate between students in academic difficulty and those above average.
Students who score less than 65% or receive a global rating of borderline or not competent on any two of
the five stations are deemed to be in academic difficulty. Four of the five stations must be passed to pass the
OSCE.
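The mean-standardization described above amounts to shifting every score by the same amount. A sketch under the assumption of a simple additive adjustment (the report does not specify the exact method), with invented raw scores:

```python
# Sketch only: shift raw OSCE scores so the class mean lands at the 74%
# target, compensating for year-to-year differences in station difficulty.
# Assumes a simple additive adjustment; raw scores are invented.
TARGET_MEAN = 74.0

def standardize(raw_scores):
    adjustment = TARGET_MEAN - sum(raw_scores) / len(raw_scores)
    return [score + adjustment for score in raw_scores]

raw = [72.0, 65.0, 70.0, 61.0, 77.0]   # raw class mean = 69.0
print(standardize(raw))                # → [77.0, 70.0, 75.0, 66.0, 82.0]
```

With this raw mean the shift is +5.0, in line with the 4-7% upward adjustments the report describes; the shift preserves the rank order and spread of the scores.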
The psychometric properties of the OSCE are good with acceptable internal consistencies of between .66
and .72. Concurrent validity has been assessed as a research study which indicated that there was
reasonable correlation between different courses particularly when using global ratings. The current OSCE
marking scheme is weighted toward the global score. A second study has shown that the residents in family
medicine perform better than clinical clerks on the OSCE suggesting good construct validity.
The clinical evaluation of students is based on the completion of ward assessment forms. Ward assessments
are completed with contributions from faculty, residents and nurses. A template is then employed to convert
completed forms into numeric grades. The adoption of a template following academic year 1998-9 resulted
in a decrease in the proportion of students receiving honours from 70% to approximately 50% which has
been sustained. There has only been one student who failed this component in the four years of data
presented. This is disproportionate to the objective scores on the OSCE and may represent the reluctance of
faculty to use the left end of the scale. Data regarding the inter-rater reliability of these evaluations are not
presented but would be of interest, especially with regards to the effect of adopting a template for
determining grades. The ward assessment appears to have good content validity based on the mapping of
objectives to evaluations.
The academic project consists of a written component and an oral presentation. The written component
consists of an abstract and is worth approximately one third of the grade. The abstract is evaluated using a
guide with explicit descriptors. The presentation is also evaluated using a marking guide with elaborate
descriptors for each criterion. The class averages for this component tend to be higher (>80%) than other
components, with a larger proportion of students obtaining honours (59-73%) and no students failing. A
significant decrease in marks was observed following a faculty development session in 1998-9, with a
subsequent upward drift. This effect may represent an under-utilization of descriptors over time and may
require ongoing faculty development.
Student feedback is provided in informal verbal and formal written formats. Formal feedback using the
Clinical Encounter Feedback Exercise forms requires students to self-assess. Both of these types of
feedback are formative and are based on close observation, taking advantage of the low student to faculty
ratio. Close supervision and feedback were cited as strengths of the rotation by students. Students are
provided with a formal mid-rotation feedback session with their hospital program director. Summative
feedback is provided to students including narrative evaluations for both the OSCE and the Academic
Project. Students are provided with a ranking for individual OSCE stations. This information is presented
only for their reference. They also receive written comments from the examiners on each station.
Dr. Risa Freeman and her course committee are to be commended on an exemplary course. The specific
areas of strength and areas for improvement with recommendations are described as follows:
Areas of Strength
1. OSCE quality
Your utilization of the OSCE examination for the assessment of clinical skills is to be considered
exemplary. The use of an examination committee, the systematic process of station development and
revision, and the size of the examination pool are ideal. The psychometric properties of the examination are good
and the relative weighting in the course appears to be appropriate. The use of the adjustment factor to
account for variation in question difficulty is functioning well and should be retained. Student anxiety may
be reduced if they are made aware of this adjustment factor.
2. Feedback
This course provides a variety of formative and summative feedback. The close supervision of students by
faculty and the low student-to-faculty ratio facilitate accurate, timely and apparently well-received
feedback. The use of forms to guide feedback ensures quality feedback is provided. The provision of
written summative feedback is beneficial to students and likely confers an educational benefit on the OSCE
examination. The use of norm-referenced outcomes (e.g. OSCE rankings by 3rds) is not permitted in
the grading policy, but as a means of feedback it provides students with supplemental information regarding
their individual performance and should be continued so long as it is welcomed by students.
3. Mapping of objectives to evaluations
The use of common committee members for the development of objectives and examinations helps to
ensure linkage between course specific objectives and course examinations. The objectives for this rotation
are clearly linked to criteria for evaluation within each of the examination components. This course should
be well prepared to provide a mapping of these linkages to the program effectiveness committee.
Areas for Improvement
1. OSCE Sub-component analysis
The weighting of individual components within the OSCE needs to be analyzed further to determine whether it
is optimal. While emphasis on the global score may optimize concurrent validity, increasing the checklist score
weighting may reduce the need for a consistent upward adjustment.
2. Evaluation of Feedback
The assignment of 5% contingent upon completing the required feedback session is not a valid form of
evaluation. In order to assign a weighting of 5% to the feedback aspect of this course, it would be necessary
to evaluate the quality of feedback. As the feedback is directed toward the student and comes from the
faculty, the evaluation of its quality is somewhat problematic. There is an opportunity to evaluate the
students' ability to self-assess. The 5% should not be used solely as a means of motivating students and
faculty to comply with a specific course requirement.
3. Grade Inflation for Academic Project
Despite utilization of a marking guide with specific descriptors for each criterion, there appears to be a trend
toward mark inflation. It is important to ensure that the quality of the presentations and abstracts is being
evaluated in addition to the amount of effort put into them. It may be necessary to continue to provide
faculty with instruction on how they should make use of the descriptors. While you report that the primary
intent of including the Academic Project in the evaluation scheme is not to discriminate between students,
this remains an important aspect. Given that the OSCE and Clinical Evaluation are assessing similar
domains of clinical competence, it is very important that the evaluation of the academic project be retained
and in fact emphasized, as it evaluates different and important aspects of student competence.
Recommendations
1. Analyze individual sub-components of the OSCE (e.g. Checklist, Global, PEP) to determine how closely
they agree with each other and what the effect would be of increasing the weighting of the checklist
(well suited for evaluation of novices).
2. Evaluate individual ward assessment forms from individual faculty, residents, and nurses prior to
consensus in order to assess the inter-rater reliability of the form in this course.
3. Compare inter-rater reliability before vs. after adopting the template, if data are available, in order to
determine the effect of using a template to derive grades.
4. Eliminate the 5% mark given for completing the required feedback forms.
5. Incorporate a faculty development session on completing the Academic Project evaluation each year of
the course.
6. Consider increasing the weighting of the Academic Project evaluation to emphasize the competencies of
the students outside of clinical skills.
7. Continue to evaluate the psychometric properties of the Academic Project evaluation and strive to
improve evaluation so as to reduce 'noise' in the final grade while maintaining a balanced course
evaluation.
8. Continue to collect student feedback and act upon it in the exemplary fashion you have to date.
Specifically ensure that students wish to know their relative ranking on the OSCE stations.
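Recommendations 2 and 3 above concern inter-rater reliability. One common approach for two raters on a categorical scale is Cohen's kappa; the sketch below uses invented ratings and is illustrative only (it is not part of the course's actual analysis):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical ratings.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and
    p_e is agreement expected by chance from each rater's marginal frequencies.
    """
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] / n * freq_b[c] / n for c in set(rater_a) | set(rater_b))
    return (p_o - p_e) / (1 - p_e)

# Hypothetical five-point ward ratings (1 = Unsatisfactory ... 5 = Outstanding)
# for ten students, scored independently by a faculty member and a resident.
faculty  = [4, 5, 3, 4, 4, 5, 3, 4, 5, 4]
resident = [4, 5, 4, 4, 3, 5, 3, 4, 5, 5]
print(round(cohens_kappa(faculty, resident), 2))
```

Values near 1 indicate agreement beyond chance; values near 0 indicate agreement no better than chance.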
Specific Requests
With regards to your request for support of statistical analysis:
1. The recruitment of a full-time Evaluation Coordinator may not only facilitate further analysis within
courses but also allow the coordination of cross-course analysis.
2. Consider following University of Toronto graduates who enter the Family Medicine residency at the
University of Toronto as an opportunity to analyze predictive validity.
With regards to your request for ongoing support from ESAC for the maintenance of OSCE examinations:
Your course models the ideal of objective clinical assessment and in our opinion warrants ongoing
support, not restricted to this rotation but extended to all rotations.
Conclusion of Review
Based on the CRISES report presented, the accompanying appendices and the opinion of the ESAC
committee members, it is our conclusion that:
Continued improvement encouraged .................... Full review next cycle (approx. 3 years)
Richard Pittini, ESAC Chair
Risa Freeman, Clerkship Director
Examination and Student Assessment Committee Medicine Phase I (MED300Y) Clerkship Review
At the ESAC Meeting on November 6 2007, Dr. Danny Panisko, the Department of Medicine
Undergraduate Education Director, and Dr. Rajesh Gupta presented a comprehensive CRISES report on the
Medicine Phase I Clerkship system of student assessment.
A. Components of the Student Assessment System:
The three components of the assessment system are outlined below:
1. Multiple Choice Written Examination (30% weighting)
The MCQ exam consists of 75 questions and is 2.5 hours in length. The exam blueprint is based on
the course objectives, with 6-9 questions representing each of nine content/specialty domains. A
small number of questions is included from seven other areas (e.g., ECG, chest x-rays). On each
exam, 10-20% of questions are new.
2. Oral/Clinical Skills Examination (20%)
The format and marking of this exam are standardized. The clinical case will vary from student to
student. Aspects of the clinical skills exam process include:
a. The patient/clinical case is selected by the site-coordinator/delegate.
b. The student conducts a history and physical exam without observation by an examiner (90
minutes).
c. The student is assessed on presentation, diagnosis and investigative plan in a structured oral
exam format by one examiner. The evaluation form includes two components: a checklist
(44 items) and global ratings (9 items).
d. The student is also asked to perform three maneuvers randomly chosen a priori by the site
coordinator from a bank of ten maneuvers. Each maneuver has a separate evaluation form
including a checklist and global rating scale.
e. The evaluation session with the examiner can take 30 to 45 minutes.
3. Ward Evaluation (50%)
The ward evaluation form is a standard design for all faculty clerkship rotations with a common five-
point rating scale (Unsatisfactory to Outstanding), specified weights assigned to each level of
performance, and 18 performance criteria listed according to the CanMEDS Roles.
For the Phase I Medicine evaluation form, the rating on each of the 18 performance criteria is
weighted in the calculation of the ward evaluation mark. The specific weight of each criterion is
assigned by the course.
The completion of the ward evaluation was described as being a consensus process involving the site
coordinator with the residents and staff who had supervised the student.
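As a rough sketch of how such a form converts to a numeric mark: the level-to-percent mapping and per-criterion weights below are hypothetical placeholders (the actual weights are assigned by the course and are not reproduced in this report), and only three of the eighteen criteria are shown for brevity:

```python
# Hypothetical mapping from the five-point rating scale to a percentage score.
LEVEL_SCORE = {1: 50, 2: 65, 3: 75, 4: 85, 5: 95}  # Unsatisfactory .. Outstanding

def ward_mark(ratings, weights):
    """Weighted mean of level scores over the performance criteria."""
    assert len(ratings) == len(weights)
    total_weight = sum(weights)
    return sum(w * LEVEL_SCORE[r] for r, w in zip(ratings, weights)) / total_weight

# Consensus ratings on three criteria, with course-assigned criterion weights.
ratings = [4, 5, 3]
weights = [2.0, 1.0, 1.0]
print(ward_mark(ratings, weights))  # → 85.0
```

The mark is thus sensitive both to the rating distribution and to the course's choice of criterion weights.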
B. Strengths of the Assessment System:
For Medicine Phase I, the strengths of the student assessment system as described in the CRISES Report
include:
1. The variety of evaluation methods that allows sampling of many CanMEDS skills;
2. The commitment to feedback in which each evaluation method includes a feedback system; and
3. The approach to borderline students, provision of extra work/remediation and follow-up.
C. Observations from the Assessment Data:
Course level grade statistics reported for academic years 2004-05 to 2006-07 were highly consistent in
general. At the level of teaching site and rotation block, detailed information presented for 2006-07
indicated some variations by site or rotation block.
1. Final Course Grades:
1.1 At the course level, mean Final Grades were consistent over time with a range of 0.6% (80.4 - 81.0).
The decrease in standard deviation (4.2 to 3.5) indicated grades were clustering more around the
mean. For 2006-07, the percentage of honours grades decreased by eleven percent (65 to 54).
1.2 Final Grades by Teaching Site and Rotation:
The range was 3.2% in mean final grades for the five teaching sites (78.4 - 81.6) and six rotation
blocks (78.7 - 81.9) for 2006-07.
2. Multiple Choice Question Exam:
2.1 At the course level, mean MCQ Exam marks were consistent over time with a range of 1.9% (76.8 -
78.7). For 2006-07, the percentage of honours grades decreased by eleven percent (49 to 38).
2.2 Reliability:
The mean reliability alphas for six rotation exams per year were: .73, .64 and .60. As stated in the
report, the MCQ exam tests a variety of disciplines and subspecialties and more uniform consistency
of responses would not be expected across the varied content domains.
2.3 Construct Validity:
For 2006-07, there was a range of 8.3% in the mean MCQ mark by rotation (72.2 - 80.5 for
Rotations 1 and 2, respectively); as a raw score, this represented a difference between 54 and 60 out
of 75. The report suggested that the lowest mean mark might have been "a first rotation
phenomenon"; however, the mean mark for Rotation 6 was the second lowest, at 74.0.
3. Oral/Clinical Skills Exam:
3.1 At the course level, mean Oral/Clinical Skills Exam marks were consistent over time with a range of
0.8% (79.3 to 80.1). The percentage of honours was stable (54 - 55).
3.2 Reliability:
For the six rotation exams in 2006-07, internal consistency alphas for the Oral Exam were high for
both components of the structured oral: Checklist items (.75 - .91) and Global ratings (.87 - .93).
3.3 Oral/Clinical Skills Exam Marks by Teaching Site:
The range was 5.3% (76.9 - 82.2) in mean oral/clinical skills exam mark by teaching site in 2006-07.
4. Ward Evaluation:
4.1 At the course level, mean Ward Marks were consistent over time with a range of 0.5% (82.6 - 83.1).
The percentage of honours was stable (79 - 81). The decrease in standard deviation (3.7 to 3.2)
indicated grades were clustering more around the mean.
4.2 Ward Marks by Teaching Site:
The range was 3.9% (80.9 - 84.8) in mean ward marks by teaching site in 2006-07.
4.3 Individual Ward Performance Criterion Ratings:
From the 2006-07 UMEO summary bar graphs of mean ratings on the 18 criteria for the course
overall and by academy, observations include:
a. At the course level, four criteria have a relatively high mean rating (~ 4.5 out of 5.0) which
indicates that a large percentage of students received an "Outstanding" rating.
b. At the academy level, an additional five criteria have a relatively high mean rating (~ 4.5 out
of 5.0) at one academy in comparison to the two other academies.
D. Discussion:
The Phase I Medicine CRISES Report is a comprehensive overview of the system of student assessment.
The Appendix to the report included further statistics and breakdowns to inform the ESAC review.
Following the presentation, ESAC requested several points of clarification on the Oral Exam and Ward
Evaluation which were provided at a subsequent ESAC meeting.
Course level grade statistics for academic years 2004-05 to 2006-07 indicated results were consistent over
time with respect to mean final grades and grade components. Detailed results presented for 2006-07
indicated some variability by teaching site and rotation block.
Although the variation between sites was addressed, the test used in the analysis (Kruskal-Wallis) is based
on rank ordering and may not address the key issues. Based on observation of the data, the differences
between sites may be educationally significant regardless of whether they reach statistical significance, and
it is therefore worthwhile to continue collecting meaningful data that will allow for between-site
comparisons. In the future, student marks on the ward evaluation should be compared using an ANOVA.
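A minimal sketch of the recommended comparison, in pure Python with invented site marks (in practice the resulting F statistic would be compared against the F distribution with k−1 and N−k degrees of freedom, e.g. via a statistics package):

```python
def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA across independent groups.

    F = (between-group mean square) / (within-group mean square).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Hypothetical ward-evaluation marks at three teaching sites (illustrative only,
# not the actual course data).
site_a = [81, 83, 85, 82, 84]
site_b = [79, 80, 82, 81, 78]
site_c = [84, 86, 85, 83, 87]
print(round(one_way_anova_f([site_a, site_b, site_c]), 2))
```

Unlike the rank-based Kruskal-Wallis test, ANOVA compares the means directly, which matches the question of whether mean marks differ by site.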
Concluding comments on each assessment component are presented below.
1. Multiple Choice Question Exam:
1.1 Construct Validity:
With respect to the variation in mean scores across rotation exams in 2006-07, the introduction of
the new MCQ questions developed in 2006 may have been a factor for two reasons: (a) the difficulty
level of the new items may have varied across exams, and (b) exams with a larger proportion of new
items (ranging from 10 to 20% on each exam, or 7 to 15 items) may have generated inconsistent
mean scores. To further investigate the variation in mean scores by rotation, an analysis by
content/specialty domain should indicate whether differences across rotation exams were specific to
a content area or generalized across content areas.
1.2 Exam Blueprint:
For the small number of questions included from seven areas outside the nine content/specialty
domains, there should be some consideration as to whether the MCQ format is adequately assessing
the knowledge base in these areas (e.g., ECG, chest x-rays). An option might be to assess these
areas through a section on diagnostic test interpretation on the Oral/Clinical Skills Exam.
2. Oral/Clinical Skills Examination:
2.1 Sub-components of the Oral/Clinical Skills Exam:
The following additional information was provided on request:
a. For the calculation of the mark, three sub-component scores are weighted as follows:
Structured Oral Exam Checklist score (.50) and Global ratings (.40), and score on three exam
maneuvers (.10).
b. With respect to the correlations between sub-components, correlations were considered to be
good and had been the topic of an earlier research study.
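The weighting in (a) amounts to a simple weighted sum. A minimal sketch, with invented sub-component scores (the weights are those reported above):

```python
# Sub-component weights reported for the oral/clinical skills exam mark.
WEIGHTS = {"checklist": 0.50, "global": 0.40, "maneuvers": 0.10}

def oral_exam_mark(scores):
    """Weighted combination of the three sub-component scores (each in %)."""
    assert set(scores) == set(WEIGHTS)
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

# Hypothetical student scores on each sub-component.
mark = oral_exam_mark({"checklist": 80.0, "global": 90.0, "maneuvers": 70.0})
print(mark)  # → 83.0
```

With this weighting, a student's checklist performance dominates the mark, followed by the global ratings.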
2.2 Oral/Clinical Exam Marks by Teaching Site:
For 2006-07, the mean exam marks by site ranged by 5.3%. To investigate site differences, an
analysis by each sub-component (checklist, global ratings, exam manoeuvers) could identify whether site
differences are component-specific. Another factor for investigation might relate to characteristics of the
patients/clinical cases for the exam as a potential variation by teaching site.
3. Ward Evaluation:
3.1 The completion of the ward evaluation was described as a consensus process. Further information
indicated that while the ward evaluation process is standardized across sites in many aspects (e.g.,
formal feedback, input from staff and resident teachers), the consensus process may be variable by
site due to adaptation to site-specific conditions.
3.2 Ward Marks by Site:
For 2006-07, the mean ward marks by site ranged by 3.9%. The site-specific consensus process
could be a factor in site differences in overall ward marks and rating distributions of the individual
performance criteria contributing to the overall ward mark calculation.
To investigate differences in ward marks by site, a review process might be established to examine
the marks and performance ratings by site and determine whether further action would be required,
e.g., to increase consistency across sites, review the process at a system level and/or continue to
monitor the ward assessment ratings and statistics.
E. Suggestions and Recommendations:
1. Multiple Choice Question Examination:
1.1 Continue to monitor mark statistics across rotation exams, in particular when new test items are
being introduced.
1.2 Maintain the same proportion of new test items on each rotation exam.
1.3 In reviewing item statistics, consider screening items for different levels of difficulty to assess
whether each exam represents a similar balance of easy, moderate, difficult items.
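One way to implement this screening is to compute a difficulty index (proportion correct) per item and bucket the items; the cut-points and response data below are illustrative assumptions only:

```python
def difficulty_index(responses):
    """Proportion of students answering the item correctly (0.0-1.0)."""
    return sum(responses) / len(responses)

def classify_item(p):
    """Bucket an item by difficulty index; the cut-points are illustrative only."""
    if p >= 0.80:
        return "easy"
    if p >= 0.50:
        return "moderate"
    return "difficult"

# Hypothetical item-response data: 1 = correct, 0 = incorrect, per student.
items = {
    "Q1": [1, 1, 1, 1, 0, 1, 1, 1],   # p = 0.875
    "Q2": [1, 0, 1, 1, 0, 1, 0, 1],   # p = 0.625
    "Q3": [0, 0, 1, 0, 1, 0, 0, 0],   # p = 0.25
}
balance = {q: classify_item(difficulty_index(r)) for q, r in items.items()}
print(balance)
```

Comparing the resulting easy/moderate/difficult counts across rotation exams would indicate whether each exam carries a similar difficulty balance.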
2. Oral/Clinical Skills Examination:
2.1 Consider including a section on diagnostic test interpretation to incorporate areas that might be better
evaluated in this setting than the Multiple Choice Question format.
2.2 Review mark statistics at the level of sub-components (checklist, global and manoeuvers) to assess
whether site differences are component-specific.
3. Ward Evaluation:
3.1 Review the Medicine Phase I Ward Evaluation Form with department clerkship representatives and
site coordinators for all teaching sites.
3.2 Review each performance criterion and respective rating statistics to identify the level of information
being provided and the differences in rating patterns by teaching site.
3.3 Develop consensus on each performance criterion with respect to the purpose and function of each
criterion for the ward evaluation form.
3.4 Develop a standardized approach for completing the ward rating scale, communicate the approach to
teaching sites, and continue to monitor the implementation.
Recommendations for the ward evaluation to address grade inflation:
3.5 Promote reasonable standards for the ward rating scale.
3.6 Discourage or prevent the use of the highest rating of "Outstanding" as the default category.
3.7 Review the individual criterion weights for the calculation of the ward grade and the impact of each
criterion rating on the grade calculation to determine whether the weighting is optimal or requires
adjustment.
Conclusion of Review
Based on the CRISES report presented and the opinion of the ESAC committee members, it is our
conclusion that:
Continued improvement is encouraged . . . . . . Full review next cycle (approx. 3 years)
___________________________
Richard Pittini, ESAC Chair
cc: Vice-Dean, UME
______10/02/12_____________
Date
Examination and Student Assessment Committee Medicine Phase II (MED400Y) Clerkship Review
At the ESAC Meeting on December 4 2007, Dr. Danny Panisko, the Department of Medicine
Undergraduate Education Director, and Dr. Rajesh Gupta presented a comprehensive CRISES report on the
Medicine Phase II Clerkship student assessment system.
A. Components of the Student Assessment System:
The components of the assessment system are outlined below:
1. Short Answer Written Examination (30% weighting)
The Short Answer Exam consists of 16-17 questions and is 2.0 hours in length. For each exam, 2
questions represent each major subspecialty in medicine, including ethics and clinical pharmacology.
The exam blueprint is based on the course objectives. The exam content is taken directly from the
"orange booklet" which lists common and life-threatening problems and diseases that a Phase II
student is expected to know.
2. Objective Structured Clinical Examination (20%)
The OSCE consists of 8 stations and is 90 minutes in length. There is one examiner per station. Each
exam incorporates nine specialty areas and reflects the major content areas expected of a Phase II
Medicine student. Six stations are patient-based and two are written stations selected from three
areas: chest x-ray, ECG or clinical case.
3. Ward Evaluation (CTU) (25%)
The ward evaluation is based on the student's 2 to 3 week rotation on a Clinical Teaching Unit. This
evaluation is completed through a consensus process by the site coordinator with the residents and
staff who had supervised the student. The ward evaluation form is a standard design for all faculty
clerkship rotations with a common five-point rating scale (Unsatisfactory to Outstanding), specified
weights assigned to each level of performance, and 18 performance criteria listed according to the
CanMEDS Roles.
For the Phase II Medicine ward evaluation form, the rating on each of the 18 performance criteria is
weighted in the calculation of the ward evaluation mark. The specific weight of each criterion is
assigned by the course.
The completion of the ward evaluation was described as being a consensus process involving the site
coordinator with the residents and staff who had supervised the student.
4. Ambulatory Clinic Evaluations (15%)
This performance evaluation is based on the student's 2 to 3 week rotation on ambulatory clinics.
The evaluation form includes skills relevant to the ambulatory care of patients.
5. Written Assignments (10%)
There are two assignments each weighted 5%: a reflection exercise and a topic review.
B. Strengths of the Assessment System:
For Medicine Phase II, the strengths of the student assessment system as described in the CRISES Report
include:
1. The variety of evaluation methods that allows sampling of many CanMEDS skills;
2. The commitment to feedback in which each evaluation method includes a feedback system; and
3. The approach to borderline students, provision of extra work/remediation and follow-up.
C. Observations from the Assessment Data:
Course level grade statistics reported for academic years 2004-05 to 2006-07 were consistent in general. At
the level of teaching site and rotation block, detailed information presented for 2006-07 indicated some
variations by site or rotation block.
1. Final Course Grades:
1.1 At the course level, the mean Final Grades were consistent over time with a range of 0.7% (80.6
- 81.3). The percentage of honours grades increased by fifteen percent in 2005-06 and remained at this
level for 2006-07 (i.e., 52, 67 and 68).
1.2 Final Grades by Teaching Site and Rotation:
For 2006-07, the range was 2.7% in mean final grades for the five teaching sites (79.7 - 82.4). For
the five rotation block means, the range was 3.1% (80.1 to 83.2).
2. Short Answer Written Examination:
2.1 At the course level, the mean Short Answer Exam marks were consistent over time with a range of
1.5% (77.4 - 78.9).
2.2 Short Answer Exam Marks by Teaching Site and Rotation:
For 2006-07, there was a range of 6.5% in the mean exam mark by rotation (75.6 - 82.1 for
Rotations 2 and 1, respectively).
3. Objective Structured Clinical Examination:
3.1 At the course level, OSCE mark statistics were consistent over time with respect to the mean mark
(76.4 - 77.3), percentage honours (26 - 31) and standard deviation (5.2 - 6.0).
3.2 OSCE Marks by Teaching Site and Rotation:
For 2006-07, the range was 2.6% in the mean OSCE mark by teaching site (75.0 - 77.6).
By rotation, the range was 3.6% (74.6 to 78.2).
3.3 Reliability:
Evidence of reliability of the OSCE Stations was indicated by the following analyses:
a. Correlations between Station Checklist and Global Scores were .37 to .70 (i.e., for 7 stations
in 2004-05, 8 stations in 2005-06)
b. For Global Ratings on all exam stations by academic year, the mean Cronbach's alpha was
.75 (2005-06) and .76 (2004-05, 2006-07).
c. For Global Ratings on each station in 2006-07, Cronbach's alpha reliabilities ranged from .55
to .83, a measure of the internal consistency of the ratings comprising the Global Score for
each station.
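For reference, Cronbach's alpha can be computed directly from the item and total-score variances. A minimal sketch with an invented ratings matrix (not the course data):

```python
def cronbach_alpha(ratings):
    """Cronbach's alpha for a students x items matrix of ratings.

    alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores)),
    using population variances throughout.
    """
    def pvar(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    k = len(ratings[0])                     # number of rating items
    items = list(zip(*ratings))             # column-wise item scores
    totals = [sum(row) for row in ratings]  # each student's total score
    return k / (k - 1) * (1 - sum(pvar(i) for i in items) / pvar(totals))

# Hypothetical global ratings (1-5) for five students on three rating items.
ratings = [
    [4, 4, 5],
    [3, 3, 3],
    [5, 4, 5],
    [2, 3, 2],
    [4, 5, 4],
]
print(round(cronbach_alpha(ratings), 2))
```

Higher values indicate that the rating items move together, i.e. greater internal consistency of the global score.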
4. Ward Evaluation (CTU):
4.1 At the course level, the mean Ward Evaluation mark increased by about three percent from 2004-05
to 2005-06 and remained at this level for 2006-07 (i.e., 80.3, 83.0, 83.3). In 2005-06, the percentage
of honours grades increased by 20 percent (i.e., 61, 81, 81).
4.2 Variations by Teaching Site and Rotation:
For 2006-07, the range was 2.9% in mean ward marks by teaching site (81.6 - 84.5).
By rotation, the range was 4.1% (82.0 to 86.1).
5. Ambulatory Clinic Evaluations:
5.1 At the course level, Ambulatory Clinic Evaluation mark statistics were consistent over time with
respect to the mean (80 - 81), standard deviation (1.7 - 2.2) and overall mark distribution (9 - 11).
In 2005-06 the percentage of honours grades decreased by 17 percent and remained at this level for
2006-07 (i.e., 73, 56, 56).
5.2 Variations by Teaching Site and Rotation:
For 2006-07, the range was 2.0% in mean ambulatory mark by teaching site (78.8 - 80.8).
By rotation, the range was 1.7% (79.3 - 81.0).
6. Written Assignments:
6.1 Weighted a total of 10%, the written assignments do not represent a major component of the
assessment system. Statistics on the assignments were not required for the CRISES report.
By extrapolation from the grade data for the major components, the mean Assignment mark was
estimated to be about 96%.
D. Discussion:
The Phase II Medicine CRISES Report is a comprehensive overview of the system of student assessment.
The Appendix to the report included further statistics and breakdowns to inform the ESAC review.
Course level grade statistics for academic years 2004-05 to 2006-07 indicated results were consistent over
time with respect to mean final grades and grade components with the exception of the ward evaluation.
Detailed results presented for 2006-07 indicated some variability by teaching site and rotation block.
Validation of evaluation methods requires ongoing collection of meaningful data to facilitate between site
comparisons. In the future, analysis of student marks on the ward evaluation should be compared between
sites using an ANOVA.
Concluding comments on three assessment components are presented below.
1. Ward Evaluation:
1.1 In 2005-06, the Ward Evaluation mean mark increased by 3% and the proportion of honours by
20%. This trend continued in 2006-07. The calculation of the mark is based on the ratings on the
eighteen individual performance criteria and, thus, the distribution of the performance ratings
appears to have changed in some respects.
1.2 The completion of the ward evaluation was described as a consensus process. Further
information indicated that while the ward evaluation process is standardized across sites in many
aspects (e.g., formal feedback, input from staff and resident teachers), the consensus process may be
variable by site due to adaptation to site-specific conditions.
1.3 Ward Marks by Site and Rotation:
For 2006-07, the ward mark data presented by teaching site and rotation showed a difference of 3%
in means by site and 4% in means by rotation block. The site-specific consensus process could be a
factor in site differences in overall ward marks and rating distributions of the individual performance
criteria contributing to the overall ward mark calculation.
To investigate differences in ward marks by site, a review process might be established to examine
the marks and performance ratings by site and determine whether further action would be required,
e.g., to increase consistency across sites, review the process at a system level and/or continue to
monitor the ward assessment ratings and statistics.
2. Ambulatory Clinic Evaluations:
2.1 The Ambulatory Clinic Evaluation mark statistics were very consistent over time, indicating a stable
process. It should be noted that the marks for this assessment component have a narrow mark
distribution and little differentiation between students. It may be useful to review this component to
determine whether the assessment from the clinic setting is providing the intended information.
3. Written Assignments:
3.1 For the two written assignments, the mean mark was estimated at about 95%; thus, it appears
that this mark is a 'bonus' 10% for most students. If this mark were removed, the mean grades would be
lower; however, the value of these assignments for student learning may outweigh the problem of
contributing to grade inflation. Student and teacher feedback on the assignments may be useful in
reviewing the value of this component of the student assessment system.
Suggestions and Recommendations:
1. Ward Evaluation (CTU):
1.1 Review the Medicine Phase II Ward Evaluation Form with department clerkship representatives/site
coordinators for the different teaching sites.
1.2 Review each performance criterion and respective rating statistics to identify the level of information
being provided and the differences in rating patterns by teaching site.
1.3 Develop consensus on each performance criterion with respect to the purpose and function of each
criterion for the ward evaluation form.
1.4 Develop a standardized approach for completing the ward rating scale, communicate this approach
to teaching sites, and continue to monitor the implementation.
Recommendations for the ward evaluation to address grade inflation:
1.5 Promote reasonable standards for the ward rating scale.
1.6 Discourage or prevent the use of the highest rating of "Outstanding" as the default category.
1.7 Review the individual criterion weights for the calculation of the ward grade and the impact of each
criterion rating on the grade calculation to determine whether the weighting is optimal or requires
adjustment.
2. Ambulatory Clinic Evaluations:
2.1 Consider a review of the Ambulatory Clinic Evaluation component to ensure there is consistency
between the objectives of the evaluation, the process and the criteria.
3. Written Assignments:
3.1 Review the value/strengths of the written assignment component as implemented now.
3.2 If it is confirmed that most students receive full marks (10%), consider determining the effect on the
overall grade statistics and reassess whether changes are required e.g., to adjust the marking system,
replace the component, or reassign the 10% to the other components of the assessment system.
Conclusion of Review
Based on the CRISES report presented and the opinion of the ESAC committee members, it is our
conclusion that:
Continued improvement is encouraged . . . . . . Full review next cycle (approx. 3 years)
___________________________
Richard Pittini, ESAC Chair
cc: Vice-Dean, UME
______10/02/12_____________
Date
Faculty of Medicine University of Toronto
Undergraduate Medical Education
Examination and Student Assessment Committee (ESAC) Course Review: OBS/GYN (OBS 300)
Course Director: Dr. F Meffe
Reviewers: Dr. P. J. Morgan and Ms. K. Hershenfield
The course director, Dr. F. Meffe, presented the review of the OBS 300 course on Tuesday, May 3, 2005.
OBS 300 is a 6-week course in Phase I of clerkship. Students spend six weeks as a member of a clinical
team taking part in the care and study of women who present to one of the teaching hospitals. Students are
expected to build upon their obstetrics and gynecology knowledge from 'Foundations of Medical Practice' to
understand, appreciate, and apply the knowledge, skills and attitudes required for residency.
There are several methods of evaluation used in this course. The evaluations consist of a written
examination worth 33.3% of the mark, an oral examination worth 33.3% of the mark, and a ward evaluation
worth 33.3% of the student‟s mark.
The written examination consists of a multiple choice question component, worth 25% of the examination,
and a short answer component, worth 75% of the examination. The exam consists of 20 multiple-choice
questions and 30 short answer questions. The multiple-choice component is computer-scored and the short
answer component is marked by only one rater at each site. The written examination class average in the
past academic year was 81.81% with 65.8% of students receiving an honours grade. On the multiple-choice
portion, the class average in the past academic year was 81.30% (73.2% of students receiving an honours
grade). For the short answer component, the class average in the past academic year was 81.96% (65.3% of
students receiving an honours grade).
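The two-part written examination weighting described above can be sketched as a simple calculation. This is an illustrative sketch only: the helper name and the assumption that each short answer is marked out of 1 are invented; only the 25/75 component weights and question counts come from the report.

```python
# Sketch: combining the OBS 300 written examination components using the
# 25% MCQ / 75% short answer weighting described in the report. The raw
# score scales are hypothetical assumptions; only the weights and question
# counts (20 MCQ, 30 short answer) come from the report.

def written_exam_mark(mcq_correct: int, sa_score: float) -> float:
    """Return the written exam percentage from 20 MCQs and 30 short answers."""
    mcq_pct = mcq_correct / 20 * 100        # 20 multiple-choice questions
    sa_pct = sa_score / 30 * 100            # assume each short answer marked out of 1
    return 0.25 * mcq_pct + 0.75 * sa_pct   # component weights from the report

# Example: 17/20 on the MCQs and 24.6/30 on the short answers
mark = written_exam_mark(17, 24.6)
print(round(mark, 2))  # 82.75
```

Note how the 75% short answer weight dominates the result, which is consistent with the report's observation that the short answer component average tracks the overall written average closely.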
The oral examination consists of 4 history/physical stations. Students rotate through the stations in a 60-
minute period. There is one rater for each oral examination question per site. The rater uses provided
guidelines to mark candidates for a particular scenario. The class average in the past academic year was
82.63% with 71.6% of students receiving an honours grade.
The ward rating is based on participants' daily clerkship encounter forms. Students are expected to submit
approximately 10 daily encounter forms throughout the rotation. Only one rater for any one clinical
encounter completes the daily clerkship encounter form. Raters can include residents and/or staff/faculty.
The daily clerkship encounter form was modified for the 2004-2005 academic year, such that raters now do
not record an actual percentage mark, but rate a competency on a scale from unsatisfactory to outstanding.
The information provided on the encounter forms is used by the clerkship coordinator or the student's
mentor to complete the final clerkship evaluation. The final evaluation is a consensus of the individual
clerkship encounter forms. Therefore, the ward rating includes input from at least 7-10 raters. The class
average on the ward evaluation in the past academic year was 82.40%, with 80.0% of students receiving an
honours grade.
The overall class averages for the past 3 years were 83.36% in 2001-02, 82.9% in 2002-03, and 82.16% in
2003-04. The proportion of students receiving an honours grade fell below 80% only in the 2003-04
academic year, at 73.2%.
Areas of Strength
This course has multiple methods of evaluation which are weighted towards the final mark.
1. Written Examination
The written examination has good content validity, as both MCQ and short answer questions are
chosen to reflect seminar objectives and syllabi and are representative of the wide spectrum of course
content. The 20-item MCQ portion has high internal consistency, and since 2003-04 the exam has been
centrally computer-scored. There is also a concerted effort to reduce the percentage of overlap from one
examination to the next.
2. Oral Examination
It was felt that the oral examination was a good assessment tool since the questions focused on
management issues which the students felt to be valuable.
3. Daily Clerkship Encounter Form
The daily encounter forms allow multiple staff/residents to contribute to the final evaluation. The
form was revised in 2004-2005 to better reflect the final clerkship evaluation form and the competencies
expected of students.
Areas for Improvement and Recommendations:
1. Written examination
a. Dr. Meffe identified a shortage of adequate questions in the database. Some suggestions
for solving this issue included holding a workshop for question
generation or contacting the Society of Obstetricians & Gynecologists of Canada (SOGC) for
the possible development of a large scale database for questions. Another option would be to
eliminate MCQs altogether and focus only on short answer questions.
The course director might consider limiting the number of raters marking the short answer questions to
improve the reliability of the marking system. Alternatively, divide the examination by question rather than site
for the purpose of marking. Having all students take the written examination at one site might facilitate
this approach. Predictive validity of the MCQ exam could be determined by comparing the results with
the MCCQE data.
2. Oral Examination
a. There are multiple raters at each site and there is no evidence about the reliability of the
marking system. The idea of developing an OSCE for this course was discussed but the cost
was felt to be prohibitive.
A formal assessment of the inter-rater reliability of the oral examination markers might be useful.
Consideration of including both a checklist and global rating score for the oral exam was suggested.
Central marking of this format of examination would be feasible.
3. Daily Encounter Forms
a. While it was generally felt that these were useful evaluation tools, a few concerns arose. It
appears that there can be a wide range of what skills are actually assessed on these daily
encounter forms and that the students generally have free rein to select the encounters that
they wish to have assessed. There also may be a variable number of encounter forms
submitted (i.e., not all students will necessarily submit 10 forms). It is also possible that no
technical skills would actually be assessed. It was also mentioned that some of the encounter
forms were "weighted"; it was unclear how or why this would occur.
A new clinical encounter form with guidelines as to the minimum number of expected clinical skills as
well as the development of a marking template would enhance this evaluation tool. The development of
a marking template may facilitate easier transfer of the evaluations onto the final clerkship evaluation
form. This should be kept in mind during the development of this template. Final grades obtained from
the template could be adjusted using narrative comments.
4. Feedback
a. The students feel that they are receiving feedback to a large extent from residents, especially
at the midway point.
Achieve the goal of formal midpoint feedback by faculty by ensuring that all site coordinators
comply with this requirement. Explicitly highlighting for students that feedback is being given
will ensure that they are aware it has occurred.
Conclusion of Review
Based on the CRICES report presented and the opinion of the ESAC committee members, it is our conclusion that:
Continued improvement be encouraged .......................Full review next cycle (approx. 3 years)
___________________________ Richard Pittini, ESAC Chair
___________________________
Date
Faculty of Medicine University of Toronto
Undergraduate Medical Education
Examination and Student Assessment Committee (ESAC) Ophthalmology Review: Catherine Birt MD, Course Director
Review by: Dr. P. J. Morgan
Ophthalmology is a 1-week course that shares a 6-week block with ENT (1 week) and Family Medicine (4
weeks) in Phase I of the clerkship.
The course director, Dr. C. Birt presented the review of the Ophthalmology course on Tuesday, November
4, 2003. Dr. Birt presented an overview of the course. The didactic component of the course is composed of
8 lectures, which are given at each hospital (Mount Sinai, Toronto Western, St. Michael's and Sunnybrook),
as well as a ½ day at the Hospital for Sick Children. The audiovisual component of the lectures is available
on the Ophthalmology website. The lectures cover different topics which are used as material for the
examination. On the 2nd Friday of the 2-week ENT/Ophthalmology rotation, students take a practical
examination where pairs demonstrate certain skills, a list of which is found in the course manual. On
Friday afternoon, all students take a written examination held at the Medical Sciences Building. The written
examination comprises 65% of the final mark, the demonstration of clinical skills is worth 25% of the final
mark, and the ward assessment comprises 10% of the final mark.
The written examination is composed of 8 short answer questions worth 5 marks each for a total of 40
marks. Of the 8 short answer questions, 3 are based on slides which are projected for about 60 secs and
cannot be retrieved for repeat viewing. There are 10 multiple choice questions each worth 1 mark. The
composition of the written examination is therefore weighted 80% for short answer (40 marks) and 20%
multiple choice questions (10 marks).
Six undergraduate ophthalmology committee members create the examination database and update
questions. One committee member creates each examination and all are reviewed by the course director who
assesses the face validity of the questions. There is a standardized layout of the examination and each
examination has 10-20% new questions. The examination is marked by one member of the committee. An
overall mark of 60% is required to pass the examination and a combined overall mark of 60% is required to
pass the course.
OSCE
The OSCE is comprised of five 5-minute stations, which may differ from site to site, with each hospital
examiner determining which skills will be evaluated. Students are given a list of what skills/maneuvers they
may be asked to perform. The OSCE score is based on faculty opinion of how the skill was performed and
does not have a formal scoring template. Two or three faculty per site oversee the OSCEs, which contribute
25% of the student's final mark.
Ward assessments form 10% of the final mark. Students are evaluated by 1-3 housestaff and/or faculty who
assign marks based on informal assessments of the students' performances while in the clinic. Essentially,
students usually get 80% if they have attended the clinic component of the course. There is a variable
marking system with no associated algorithm or template for marking, since it is difficult for housestaff and
faculty to get to know students in such a brief period of time.
The ophthalmology and ENT exams are done on the same day in a 1-hour period.
There is no determination of item statistics for any evaluation component. No student has ever failed the
OSCE but failure of this component would not necessarily mean failure of the course. Students who have
received <70% on 2 components of the course are brought to the attention of the Clerkship Director and
ultimately the Clerkship Committee.
Marks are posted on the ListServ with each student receiving a designation of HPF for each component and
an overall grade. Course and faculty evaluations are completed by the students at the time of the written
examination. One to two students request remarking of their written examination per year.
The overall class average for the course has remained relatively stable over a 5 year period with an
increasing percentage of students receiving an honours mark. Overall, 76% of students received honours in
the course in the 2002-2003 academic year. There were no failures over the 5-year period, and the average
mark ranged from 81 to 84. The clinical skills marks mirror the overall mark results, with one failure in the
2000-2001 year.
No data on inter-rater reliability were presented. There did not appear to be a difference in average marks
between the components for each rotation. There are no analyses of internal consistency. Again, the
comparison of marks between academies lists the average mark for the clinical skills component. There are
no data presented with respect to construct or predictive validity for the clinical skills component. Content
validity is assessed by the course director. The class average for the OSCE component is between 83 and 85
for the past three academic years with nearly 90% of students achieving honours in this component in 2002-
2003. There was a wide range of marks with the low mark ranging from 60-68. Comparison of marks
between academies is presented but again, the analysis has not been identified.
With respect to feedback, it is given by sending out marks on the ListServ. Students may go to the Course
Director if they wish to discuss their evaluation. Weak students are informed of their performance and are
encouraged to do an elective in the subject. There is no midpoint feedback since the course is only 1 week
in duration.
The appendix outlines correlation coefficients for the varying components. There is little to no correlation
between the clinical skills components and other evaluation methods used. There was significant correlation
between both the OSCE and written examination and the final mark. Histograms of the various components
demonstrate a fairly normal distribution pattern.
Areas of Strength:
1. Written Examination
The course has developed a secure database that presents 10-20% new questions per examination.
2. Component Evaluation
All component evaluations have a range of marks with a normal distribution on histograms.
3. Course Evaluation
There is a good method of attaining both faculty and course evaluation feedback.
Areas for Improvement and Recommendations:
1. Written Examination
There appears to be some concern from the students that the slides that are presented remain projected for a
short period of time only.
Recommendations:
The course director could pursue alternate methods of projection that may allow students longer access
to the slide.
The committee had some comments on the need for both MCQ and short answer questions.
Recommendations:
There is a need to determine the item statistics for this evaluation component. The Course committee
should also consider including more short answer questions.
2. Ward Assessment
Since the students generally receive 80% for “just showing up”, the ward assessment is of relatively limited
discriminatory value.
Recommendations
Make the clinical time a mandatory part of the course but not a component that receives a grade.
Increase the weighting of the clinical skills component by 10%
3. OSCE Marks
There is a very high percentage of students obtaining honours in this course. This may reflect the relatively
subjective ward assessment and the limited number of skills that can be presented in the OSCE component.
Since ophthalmology is a short course, it is difficult to develop a large selection of skills/maneuvers that can
be tested.
Recommendations
Develop standardized checklists or a marking template to ensure consistent marking between sites for
this component.
4. Feedback
Due to the brevity of the course, it is difficult to give feedback to students.
Recommendations
Some suggestions to consider would be: to include comments along with the ListServ-distributed mark
indicating whether the student's performance was below expectations, meets expectations or above
expectations; focused faculty development to emphasize the importance of daily informal oral feedback;
and provision of written comments on a tear-off sheet at the end of the exam.
Conclusion of Review
Based on the CRICES report presented and the opinion of the ESAC committee members, it is our
conclusion that:
Written submission regarding additional data analysis (item-total correlations for a sampling of written exams,
correlation coefficients between component marks between academies) and response to above
recommendations required ..........................................................................Interim Review in 6 months' time
___________________________
Richard Pittini, ESAC Chair
___________________________ ____________________________
Catherine Birt, Course Director Date
Distribution: Course Director
ESAC File
Clerkship Coordinator
UME-CC
Faculty of Medicine
University of Toronto
Undergraduate Medical Education
Examination and Student Assessment Committee (ESAC) Addendum to the Report on Ophthalmology
September 5, 2006

This addendum completes the ESAC review of the student assessment system in Ophthalmology, a one-week rotation taken as part of the six-week block with Family and Community Medicine (four weeks) and ENT (one week) during the Phase 1 Clerkship. The Course Director for Ophthalmology, Dr. C. Birt, completed the CRICES form (i.e., Criteria for Review of Individual Course Evaluation Systems) and presented the course report at the ESAC meeting of November 4, 2003. A review for Ophthalmology was prepared by the lead reviewer, Dr. P. Morgan.

The written examination question database is maintained and updated by members of the undergraduate course committee. A new examination is developed for each of the six rotation exams per year, and the content validity of the exams is assessed by the course director. For each rotation exam, all papers are marked by a single marker. In the CRICES report, grade statistics were reported for the overall final grade and the three grade components. In the appendices were tables of correlation coefficients and breakdowns of mean component grades by rotation and teaching site.

This addendum was prepared in response to the review recommendation to examine item statistics for the written exam. The Course Director and UMEO professional educator arranged for an analysis to be conducted on all exam papers for one rotation block selected at random. Normally the written exam results are recorded as aggregate scores without the individual item data. In ophthalmology, individual item data were not stored electronically in the written exam results database. To conduct an item analysis, the item data had to be retrieved from the original exam papers. Due to time and cost constraints, the analysis was limited to the exam for one rotation block that was randomly selected. A data file was created with the mark assigned per question for each of the 33 exam papers.
The written examination includes three question formats: (i) three short answer questions based on projected slides (15 marks), (ii) five short answer questions without slides (25 marks), and (iii) ten multiple choice questions (10 marks). Each format was reviewed separately. In general, the analysis found that for each format the items ranged in difficulty and resulted in a large distribution of marks. For the short answer formats, item means ranged from 3.1 to 4.1 out of 5 (slide questions) and from 3.7 to 4.6 (no slides), and total scores correlated highly with the overall format score. For the multiple choice questions, item difficulty ranged from 27 to 100% and correlation coefficients for 7 items ranged from .17 to .61. Item statistics for each format are presented in Tables 1 to 3. Overall, each item format resulted in mean scores from 73 to 84 percent. Summary statistics at the level of each format are presented in Table 4.
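The item analysis described above (per-item difficulty and item-total correlations computed from a matrix of per-question marks) can be sketched as follows. The function names and the sample data are invented for illustration; they do not reproduce the actual exam papers or the tabled results.

```python
# Sketch of a classical item analysis: per-item difficulty (mean score as a
# fraction of the item maximum) and corrected item-total correlation (item
# vs. total-minus-item), computed from per-question marks. Data are invented.
from statistics import mean, pstdev

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    mx, my = mean(xs), mean(ys)
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx, sy = pstdev(xs), pstdev(ys)
    return cov / (sx * sy) if sx and sy else 0.0

def item_stats(scores, max_mark):
    """scores: one list per student, one mark per item; max_mark per item."""
    n_items = len(scores[0])
    totals = [sum(row) for row in scores]
    stats = []
    for i in range(n_items):
        item = [row[i] for row in scores]
        rest = [t - it for t, it in zip(totals, item)]  # total minus this item
        stats.append({"difficulty": mean(item) / max_mark,
                      "item_total_r": pearson(item, rest)})
    return stats

# Five hypothetical students, three short answer items each marked out of 5
marks = [[5, 4, 3], [4, 4, 2], [5, 5, 4], [3, 2, 1], [4, 3, 3]]
for s in item_stats(marks, 5):
    print(round(s["difficulty"], 2), round(s["item_total_r"], 2))
```

An item answered perfectly by every student would show a difficulty of 1.0 and an undefined (here, zero) item-total correlation, which is why the addendum flags items with 100% correct responses on a short exam.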
Our interpretation of these data is in the context of a small sample size, and our intent is to examine for trends or general issues rather than to make specific conclusions. It appears that the item-total correlations are acceptable. A relatively high proportion of the questions were answered correctly by 100% of students. This is not a problem when there are a large number of questions, but on a relatively short examination it can affect final grades. We would not propose that you change the examination based on this limited review, but rather we suggest you develop and maintain a database which will allow you to do this type of analysis on an ongoing basis. Resources are available to assist you with the development of such a database.

Conclusion of Analysis

Based on the CRICES report presented, the information contained in this addendum and the opinion of the ESAC committee members, the ESAC review of Ophthalmology was concluded with:

Continued improvement be encouraged .............................. Full review next cycle (approx. 3 years)

___________________________
Richard Pittini, ESAC Chair

___________________________
Date
Faculty of Medicine University of Toronto
Undergraduate Medical Education
Examination and Student Assessment Committee (ESAC) Otolaryngology Review: Paolo Campisi MD, Course Director
Review by: Dr. R. Pittini & Dr. S. Bernstein
Otolaryngology is a 1-week course that shares a 6-week block with Ophthalmology (1 week) and Family
Medicine (4 weeks) in Phase I of the clerkship.
The course director, Dr. P. Campisi, presented the review of the Otolaryngology course on September 4,
2007. On the 2nd Friday of the 2-week Otolaryngology/Ophthalmology rotation, students take an
examination. The ophthalmology and otolaryngology exams occur on the same day.
Students are evaluated with a short answer written examination, a two station OSCE examination and an
ambulatory assessment based on direct observation by faculty during six half-day clinics.
Components
The written examination is composed of 20 short answer questions worth 60% of the final grade. The
Otolaryngology committee members create the secure examination database, which was last updated two to
five years ago. The course director reviews all questions and assesses their face validity. One member of the
committee marks all of the examinations for one iteration of the exam per year, while the course director
marks two iterations.
The OSCE is comprised of 2 stations, each ten minutes in duration worth a total of 20% of the final grade.
A single examiner examines students for both stations. The students are examined at the same site that they
receive their instruction. Examiners use a detailed checklist to evaluate student performance which the
students may demonstrate or describe. Some examinations include models while others include a
description of what would be done if there were a patient. The OSCE examiner is the same individual who
completes the ambulatory assessment. The OSCE marks are consistently higher than the written exam
marks.
Ambulatory assessments form 20% of the final mark. Site coordinators assign marks based on informal
assessments of the students' performances while in the clinic. Students attend six half-day clinics.
Evaluation information for each clinic is collected on paper-based 'green cards'. The site coordinators do not
always have the opportunity to observe students directly and rely on comments from other teaching faculty.
According to the course director, students essentially get 80% if they have attended the clinic component of
the course. The OSCE is completed prior to the ambulatory assessment.
The overall class average for the course has remained relatively stable over a three-year period with an
increasing percentage of students receiving an honours mark. Overall, approximately 50% of students
received honours in the course. There were no failures in the three-year period. No data on inter-rater
reliability were presented. There did appear to be a difference in average marks between the individual
rotations for 2006-2007. There are no analyses of internal consistency. Content validity is assessed by
the course director.
With respect to feedback, students may petition the course director through the Undergraduate Medical
Education office. Weak students are informed of their performance and are encouraged to do an additional
project in the subject. There is no midpoint feedback since the course is only 1 week in duration. A midpoint
quiz with provision of correct responses is being considered as a feasible method of providing formative
feedback.
Observations:
The number of questions included in the written examination is small relative to the weighting given to the
overall exam. The examination pool needs to be expanded with new questions added on a more regular
basis. Each iteration of the examination should include the same proportion of new questions. Some of the
examination questions are complex and could be simplified into several separate questions. This is an easy
method of increasing the exam pool size.
The examination scores appear to vary according to who marks them. The inconsistency in written
examination scores across different rotations reached statistical significance according to the one-way
ANOVA for the sample year 2006-2007. This may be the result of differences between evaluators and could
be avoided by dividing the examination into sections and having one marker evaluate all questions in one
section. Avoid having the same examiner for all three components, and identify students by student number
only for the purpose of marking.
While additional OSCE stations may be beneficial from a psychometric perspective, feasibility constraints
make the choice of two stations reasonable. Given the small number of stations, the quality of each station
and its evaluation is more critical. The ideal station is standardized with respect to exam content (e.g. use
of mannequins) and examiner. The OSCE as a performance-based evaluation requires that skills be
demonstrated and not simply described. Ensure equal access to the mannequin to be used for evaluation
such that either all or no students have access to it.
The standardized forms define what criteria are to be used to assess students. Site coordinators can
compile the green cards to generate a global evaluation that reflects the opinions of the faculty who directly
observe the students. Faculty need to be discouraged from assigning marks based on attendance alone.
Faculty development should emphasize that the criteria outlined on the 'green cards' be used consistently
for all students at all sites.
The completion of the OSCE evaluation prior to the assignment of an ambulatory mark by the same
individual can potentially lead to skewing of ambulatory marks based on OSCE performance. Using faculty
from different sites to mark OSCEs would eliminate this problem (e.g. have students from St. Michael's go
to HSC for their OSCE).
The brief duration of this course creates a challenge in providing students with meaningful feedback. In
addition to the proposed innovative feedback method, e-Log may provide another form of objective
feedback. E-Log may also be useful in determining which experiences are common to all students; these
and only these should be included in the evaluation. Encounter cards should be reviewed in a systematic
fashion in order to provide students with formative feedback from their tutors.
Areas of Strength:
1. The written examination is well aligned with the course objectives and the topics are evaluated
proportionately to their coverage in the course. Face validity appears to be good.
2. The OSCE provides an opportunity to objectively evaluate student performance in a procedure-
based rotation.
3. The innovative method of feedback proposed (midpoint mini-quiz) is supported by this committee.
Recommendations:
1. The written examination
a. The written examination pool needs to be expanded
i. Add 10-15% new questions per year
ii. Split existing complex questions into simpler components
b. Each examination should be divided into sections that are marked by one faculty
2. OSCE
a. Standardize OSCE across all sites so that mannequins are used by students to demonstrate
skills rather than describe procedures
b. Consider inclusion of three categories on the OSCE checklist (not done, done incorrectly,
done correctly).
c. Consider using one of several available techniques for standard setting among your
evaluators
3. Ward assessments
a. Adjust timing of the completion of this form to ensure that it occurs independent from the
OSCE evaluation
b. Encourage the use of specific criteria when evaluating students
i. Continue to use 'green' encounter cards
ii. Provide faculty development aimed at this
Conclusion of Review
Based on the CRICES report presented and the opinion of the ESAC committee members, it is our
conclusion that:
No major issues, ongoing improvements encouraged ................................Full review in three years time
___________________________
Richard Pittini, ESAC Chair
___________________________ ____________________________
Paolo Campisi, Course Director Date
Distribution: Course Director
ESAC File
Clerkship Coordinator
UME-CC
Examination and Student Assessment Committee
Review of
Paediatrics Clerkship (April 29, 2004)
Background / Context
The paediatrics clerkship is a 6-week rotation in the third year of the undergraduate medical curriculum. As with other
clerkships in the third year, approximately 30-35 students rotate through the clerkship in each of 6 rotations over the
year.
For the last two years, two streams have existed within the rotation, with approximately half of the students in each
rotation participating in each stream. In one stream, the entire 6 week rotation is spent in the context of a single
community setting. In the other stream, the students spend 3 weeks in the Hospital for Sick Children and 3 weeks in a
community setting. For this second stream, half the students are at HSC for the first three weeks and half of the
students are in the community setting for the first three weeks. Other than this systematic variety in clinical setting,
the program for all students is similar and involves one day a week devoted to an academic teaching program at The
Hospital for Sick Children, which includes a half day for a seminar series, and a half day for case-based rounds.
Three forms of evaluation are used to generate the final rotation mark: a written test (worth 40% of the final mark), a
short written assignment related to Project CREATE (worth 10%) and a clinical mark based on performance in the
clinical settings (worth 50% of the final mark). The score generated from this weighted sum is translated into
honours/pass/fail for the purposes of transcription. In addition, it is necessary to pass both aspects of the evaluation
system in order to pass the rotation. There is also a pass/fail requirement to perform a physical examination.
The written examination is administered to all clerks at the end of each 6-week block. This examination is composed
of xx short answer questions. Each question per rotation is marked by a single examiner from a pre-existing marking
template to reduce examiner error in interpretation. No inter-rater reliability or internal consistency statistics are
currently available to assess the reliability of the marking of questions. Inter-rater reliability assessment would require
that two examiners score the same questions, which may be difficult given the constraints on faculty resources.
However, internal consistency measures should be relatively easy to calculate and are strongly recommended. Student
feedback regarding this aspect of the evaluation system was generally quite positive. Reports from the students
suggest that the written exam is perceived to be of appropriate difficulty and representative of the course objectives
and course content. Wording of questions was overall very fair and of good quality, and while some questions were
seen to be repetitive from previous years, students felt that this did not impair their evaluation.
The short written assignment is generally marked by one of the student's clinical supervisors. No inter-rater reliability
analyses are currently available to assess the consistency with which these assignments are marked. However, scores
are generally quite high with little variance, and are worth relatively little in the total score of the students, so
reliability assessment of these scores should be considered a relatively low priority. Student feedback suggests that,
although a sample was provided, the evaluation guidelines for this assignment were somewhat unclear. Some effort to
clarify these guidelines to the students (perhaps by providing the students with the guidelines given to the faculty)
might be helpful.
The clinical mark is generated by the team of supervising clinical faculty at the end of each clinical experience and is
based on daily interactions between students and faculty in the clinical context. For students who are in the single
community setting stream, this results in a single evaluation form being completed that constitutes the student's
clinical mark for the rotation. For students in the two-setting stream, a form is completed at the end of each 3-week
rotation and the clinical mark for the rotation is calculated as an unweighted average of the two forms. The correlation
between the two marks generated for the students with two placements is generally quite low (ranging from .08 to
.34 in the three years under evaluation). Students have reported a general feeling among the student body that the
stream including a rotation at HSC may be disadvantageous because the HSC clinical experience is "more difficult
than the community setting" and one's "clinical mark will suffer." There is certainly no evidence of this phenomenon
in the data from the last two years. In fact, no substantial difference appears to arise in the scores of students who
participate in one stream or the other on either the clinical marks (where the marks were 81.67 in the community
stream vs. 81.76 for the mixed stream in 01/02 and were 81.60 vs. 82.45 respectively in 02/03) or written marks
(79.14 vs. 78.16 and 77.21 vs. 77.94 for the two years). However, the perception exists and mechanisms might be
enacted to counter these "rumours". (Note: there does appear to have been a difference in the 00/01 year; however, it
is in the opposite direction to that suggested by the perceptions of the student body, and there were very few students
in the community stream in that year, limiting the capacity to make reasonable generalizations.) Students also reported
some concern regarding the lack of explicit feedback from the faculty regarding their progress in the clinical setting
over the course of the rotation. This concern was exacerbated in the community setting where no formal mid-rotation
evaluation is provided, which “made the ward evaluation seem unfair (and not as good a learning tool, because can't
improve when only evaluated at the end).” Increasing the level of informal daily feedback from preceptors should
certainly be encouraged; however, it might also be possible to institute a formal evaluation in the 6-week stream to
mirror the mid-rotation evaluation that is an institutionalized aspect of the 3/3-week stream.
Students generally reported being quite happy with the requirement of an observed history and physical, and were
happy that this was evaluated as credit/no-credit, since some supervisors were felt to be far stricter than others in
the implementation of this requirement (for example, some students were observed from start to finish of a patient
history/physical, whereas others were not supervised but simply questioned about their history/physical when they
were finished). Given the perceived lack of standardization around this component of the evaluation, it was felt that
credit/no-credit was an appropriate evaluation mechanism.
Strengths of the Evaluation System
The use of multiple sources of evaluation to generate a mark for the student is appropriate and consistent with the
intent of the University and the recommendations of ESAC. The weighting of the ward mark (at 50%) is somewhat
higher than the value generally recommended by ESAC (40%), especially given that the relatively
low correlation across placements confirms the common finding that these marks are quite unstable, but the
weighting is not unreasonably large.
The students appear generally to be quite happy with the mechanisms of the evaluation system, viewing it as
largely reasonable and fair from the perspective of the scoring and weighting of various aspects of the evaluation
process.
Areas for Potential Concern or Improvement in the Evaluation System
There is relatively little psychometric analysis of the marks generated for the students. While there are clearly
efforts at quality assurance in the production and marking of the written examination, statistical analysis is strongly
recommended in the form of inter-rater reliability and internal consistency measures in order to ensure the quality of
the examination.
There appear to be relatively few opportunities for explicit feedback on the written examination. This is, in part, a
natural consequence of the “closed” nature of the examination, since particular answers to particular questions
cannot be discussed extensively without compromising the validity of future examinations. However, this concern
has been further exacerbated by the current implementation of the honours/pass/fail transcription system, such that
students now also get only a very global sense of their level of performance on the examination. Efforts to increase
the substance of the feedback to students regarding their knowledge level and areas of potential concern are strongly
encouraged. For example, if questions can be sorted into three or four content areas, then areas of strength and
weakness in the individual student's knowledge base might feasibly be discussed. In addition, one course provides a
tear-off sheet at the end of the examination that allows the marker to provide some short written feedback to the
students about areas of strength and weakness and particular misconceptions. These types of comments do not need
to be directly related to the questions (thereby protecting the closed nature of the examination), but could
nonetheless provide useful personal information for the students. These are just a few suggestions, and ESAC would
encourage the course committee to consider developing additional methods as well.
Similarly, there are some concerns raised by the students regarding opportunities for formal feedback in the clinical
rotations, especially in the 6-week community stream where there is no 3-week evaluation. The course committee
might consider a mechanism for a formal mid-rotation evaluation to match that generated in the 3/3 stream, and to
encourage daily feedback to students by the preceptors.
Recommendations
Further analyses examining the inter-rater reliability of the clinical evaluation forms across the two clinical settings
for those who are in the split stream would provide interesting information about the reliability of the clinical mark
that is generated as 50% of the students' marks. It is recommended that these analyses be performed. Members of
ESAC are available for consultation on this recommendation if the course committee feels that such consultation
would be helpful.
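The inter-rater analysis recommended above can be sketched as a simple correlation between the clinical marks assigned at the two settings for split-stream students. The marks below are invented placeholders, not course data, and the function name is illustrative.

```python
# Sketch of the recommended inter-rater analysis: correlate the clinical
# marks assigned to the same students at their two placements.
from statistics import mean

def pearson_r(x, y):
    """Pearson product-moment correlation between two paired mark lists."""
    assert len(x) == len(y) and len(x) > 1
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical marks for the same students at their two placements
site_a = [78, 85, 81, 90, 76, 83]
site_b = [80, 79, 84, 88, 77, 81]
print(round(pearson_r(site_a, site_b), 2))
```

A coefficient near the .08 to .34 range reported above would confirm the instability of the clinical mark; a dedicated statistics package would also supply a significance test.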
Analyses regarding the internal consistency (e.g., Cronbach's alpha) of the written examination are strongly
recommended. The identification of questions that are clearly not psychometrically sound could be given special
attention as part of the examination review process. Also, some effort at inter-rater reliability for the marking of the
written examinations would be recommended.
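For reference, Cronbach's alpha requires only the items-by-students score matrix. The following is a minimal sketch with invented scores; the function name and data layout are assumptions for illustration.

```python
# Illustrative computation of Cronbach's alpha from per-student item scores.
def cronbach_alpha(scores):
    """scores: one list per student, containing that student's item scores."""
    k = len(scores[0])          # number of items
    n = len(scores)             # number of students

    def var(values):            # population variance
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values) / len(values)

    item_vars = [var([s[i] for s in scores]) for i in range(k)]
    total_var = var([sum(s) for s in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Four students x three short-answer items (hypothetical marks)
matrix = [[4, 5, 3], [2, 3, 2], [5, 5, 4], [3, 4, 3]]
print(round(cronbach_alpha(matrix), 2))
```

Values above roughly 0.7 to 0.8 are conventionally taken as acceptable internal consistency for this kind of examination.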
Special attention should be paid to increasing the amount of feedback to students regarding their performance on the
written examination and clinical performance. Recognizing the closed nature of the written exam and the current
interpretation of the policy regarding the release of numeric grades, it is nonetheless important to give students
feedback regarding their relative strengths and weaknesses in the knowledge domain as well as in the clinical
domain.
Mechanisms might profitably be implemented to provide information to students regarding the relative equivalence
of the two streams of clinical training in order to ameliorate concerns regarding disparate marking practices.
Rating
No serious problems are evident, and no serious problems are anticipated to develop in the near future. Next review in
3-4 years.
Note
This evaluation is forwarded to:
Chair of the Undergraduate Medical Education Curriculum Committee
(and Associate Dean Undergraduate Education)
Chair of the Clerkship or Preclerkship Committee
Course Director
and kept on file by ESAC.
____________________________________________________________
Signature of Course Director
____________________________________________________________
Signature of Chair, Examination and Student Assessment Committee
Thank you for participating in the ESAC Review Process!
Faculty of Medicine University of Toronto
Undergraduate Medical Education
Examination and Student Assessment Committee (ESAC)
Pathobiology of Disease (PBD 211F)
Course Director: William Chapman
Report: January 6, 2004 (original date), with revisions November 7, 2006
Lead Reviewers: R. Gupta & R. Pittini

Preamble
The Pathobiology of Disease (PBD) report was presented to ESAC by the Course Director, Dr. William Chapman, on January 6, 2004. The reviewers wish to acknowledge the long delay in finalizing this report. Recommendations to the Course Director do not take into consideration any changes to the course that may have been made in the interim.

Background
This course is a 14-week course that starts at the beginning of second year, and it carries the heaviest caseload in second year at that time. The course consists of 14 problem-based learning sessions, one per week, and 9 seminars in microbiology/immunology/genetics. The objective of the course is to bridge the basic and clinical sciences.

The three examinations during this course are evenly spaced and equally weighted. The first examination is a 50-item multiple choice question examination. The second examination is a combination of short answer and multiple choice questions: there are 34 MCQ items worth one mark each and 9 short answer questions worth 26 marks. Eight of the short answer questions are genetics-based and one concerns an ethical issue. Examination 3 is a 60-item multiple choice question examination. The material tested by each examination is not cumulative. Students regularly make presentations during the course, but the tutor does not evaluate performance on these presentations.

The examinations are set by the Course Director and questions are obtained from the lecturers. New questions are requested each year and there are minor instances of repeat questions. Students have access to previous years' questions, as examination question booklets are not collected at the end of each examination. The exam items are informally reviewed both prior to
and following the examination. Approximately two questions are deleted from each exam based on student feedback.

Overall, students perform very well in this course, with a class average of 84%, 83% and 86% in the years 2003, 2002 and 2001, respectively. The proportion obtaining honours was 81%, 84% and 89% in those years, respectively. There have been no failures in the last three years. The Course Director believes that the grades reflect a high proportion of students attaining mastery of the content, i.e., better than most practicing physicians, and that the evaluations are very representative of what students need to know.

Data on the internal consistency of the multiple choice question examinations are available but were not presented. Short answer examination questions are marked by one individual and therefore assessments of reliability are not pertinent. There is a presumed link between the objectives and the course evaluations, although this is not formally mapped. There is an opportunity to assess the predictive validity of these examinations by correlating marks in this course with those in Foundations of Medical Practice.

Feedback is limited to providing students with examination answers within a few days of writing an examination and a final mark shortly after the course has been completed. There is no formal interim review of progress. Remediation for students scoring between 60% and 70% is not mandatory, but they are invited to meet with the Course Director. Students who score less than 60% are required to meet with the Course Director. Students who are invited to meet with the director do show minimal improvement during the course, but many continue to experience difficulties. According to the Course Director, informally, these students have often had difficulty in other pre-clerkship courses. The nature of the remediation is advice regarding studying and learning; sometimes students are re-evaluated with a take-home assignment or oral examination.

Areas of Strength
The students were very happy with this course. The evaluations are felt to be fair, with a moderate level of difficulty. The students feel the exams were reflective of the material taught.

Areas for Improvement
Multiple Choice Question Examinations: Internal consistency should be calculated for the MCQ items in the examinations. Further data are needed.

High Proportion of Honours: The high proportion of honours may reflect a high proportion of students mastering content, but it may also mean that the questions are too homogeneous and do not discriminate among weak, average and outstanding students. Looking at the distribution of marks would be very helpful. Clearly, it is difficult to discriminate weak from minimally competent students when all the marks are high. There is difficulty in distinguishing between assessment of knowledge and applied knowledge. There are many presentations done during the PBL sessions which have gone unevaluated.
Recommendations
1. A supplemental report is requested and should provide the committee with more data and an accompanying analysis. A psychometrician (K. MacRury) should be consulted. The supplemental report should consist of a histogram of the mark distributions. Descriptive statistics and measures of internal consistency should be provided for each of the written examinations. With your agreement, the necessary data can be collected and analyzed on your behalf. It is recommended that the data included in your original report be supplemented with data from the subsequent two years.
2. Consideration should be given to increasing the range of difficulty of the examination questions to better discriminate the weak from the minimally competent student.
3. All methods of evaluation should be considered for the PBL portion of the course. The Course Director should consider formally evaluating some of the presentations, which could be evaluated in written or oral format using a template such as the one used in DOCH. Furthermore, part of the presentations could be linked to the CanMEDS 2005 roles.

Conclusion of Review
Based on the CRICES report presented and the opinion of ESAC committee members, it is our conclusion that:
The final review of this course will be issued pending the supplementary report.

Respectfully submitted,

____________________________
R. Gupta, MD, FRCPC, MEd

____________________________
R. Pittini, MD, MEd, FRCSC (Chair)
Faculty of Medicine University of Toronto
Undergraduate Medical Education
Examination and Student Assessment Committee (ESAC) Review of Psychiatry Clerkship - PSS 330
Reviewers: Dr. Richard Pittini & Ms. Michelle Porepa
Preamble
Dr. Lofchy with the assistance of Tina Martimianakis presented her report to the ESAC committee on April 6, 2004.
A thirty-two page completed CRICES form was distributed and reviewed by committee members prior to the
presentation. Two student representatives from the course as well as student members on the ESAC committee were
in attendance for the presentation and were interviewed separately following the departure of Dr. Lofchy and Ms.
Martimianakis. Dr. Lofchy has been the course director since 2000-2001. Data for the last four academic years was
reviewed.
Course Background
The psychiatry clerkship rotation consists of a six-week block in the third year of the medical curriculum. Five
university-affiliated hospitals are involved in the teaching and evaluation of students. Some students are assigned to
two three-week blocks divided between two sites. Students are given a choice of which teaching site they would
prefer to attend.
Students receive a large amount of direct supervision during this clerkship rotation. There are numerous observed
interviews. Given the nature of the rotation, physical examination and technical skills are not evaluated. Summative
evaluation occurs in the final week of the rotation with the exception of the first case write-up, which is marked at the
midpoint.
The evaluation system consists of four components: a ward assessment (40%), a written examination (20%), an
OSCE (25%), and two case write-ups (15%). Students are required to pass each component of the evaluation system;
however, final decisions regarding whether a student passes the course are at the discretion of the course director.
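The stated aggregation rule, a weighted total with a pass requirement on every component, can be sketched as follows. The component weights come from the report; the 60% component pass threshold and all names are assumptions for illustration only.

```python
# Sketch of the course's mark aggregation: weighted total plus a
# pass requirement on every component. Weights are from the report;
# the 60% component pass threshold is an assumed value.
WEIGHTS = {"ward": 0.40, "written": 0.20, "osce": 0.25, "write_ups": 0.15}
PASS_MARK = 60  # assumed per-component pass threshold

def final_mark(component_marks):
    """Return (weighted total, whether every component was passed)."""
    total = sum(WEIGHTS[c] * m for c, m in component_marks.items())
    all_passed = all(m >= PASS_MARK for m in component_marks.values())
    return total, all_passed

# Hypothetical student
marks = {"ward": 85, "written": 78, "osce": 72, "write_ups": 90}
total, passed = final_mark(marks)
print(round(total, 1), passed)
```

Note that, as described above, a failed component does not automatically fail the student; the flag would simply trigger the course director's discretionary review.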
The class average has been stable at approximately 81% since the current director was appointed. The proportion of
honours rose from 60% to 75% over the last four years reviewed. These trends are consistent for all components of
the student evaluations utilized in this course.
Ward Assessment
The ward assessment is completed at the end of the clinical rotation. Ward evaluations are completed either by
individual preceptors or by consensus. Students who are assigned to two sites are evaluated at the end of the six-week
rotation by their primary supervisor. There is some variability in how these evaluations are completed, as there is no
clear protocol outlining how they are to be completed. The ward evaluations account for 40% of the total mark and
average approximately 85% with almost ninety percent of students receiving honours. There is some variation in the
ward marks across academic sites but this appears to be diminishing. There has been no formal evaluation of the
variance due to teaching site.
While the ward evaluations correlate best with the case write-ups as might be expected, this correlation remains
relatively weak at 0.40. Both the case write-up and the ward evaluation should reflect the students' capabilities on the
ward without the influence of the time constraints that are imposed in an OSCE evaluation.
The ward evaluations that have been presented here were completed using course specific checklists. These forms
have recently been changed to reflect the new institutional objectives. The impact of this change on the ward
evaluations assigned by supervisors will need to be monitored.
Written Examinations
The written examination component of the course is worth 20% of the grade and it is very similar to an OSCE but
with an emphasis on assessment of knowledge. Students tend to do less well on this component with a class average
of just under 80% and only 50% of students receiving honours. There has been a relatively large amount of variation
in scores within a given year depending on the rotation (> 2SD in the 2000-2001 academic year). While this could be
due to the random assignment of students with different capabilities, it might also reflect variations in the difficulty of
the examination. The alpha coefficients have also been sub-optimal, even negative for some years. This trend seems
to be diminishing. The explanation for this offered by the course director seems to be very reasonable in that it was
associated with several changes to the curriculum including the integration of new objectives. There was an increase
in the number of new examination questions above the usual 20%, which may have contributed to some of the
variability seen.
The data presented regarding examination reliability were of interest despite their limited quantity. The exam appeals
process appears to be an appropriate means of dealing with contested examination results, and it also introduces a
mechanism for helping ensure all examinations are reliable.
OSCE
The OSCE accounts for 25% of the student grade and consists of five stations, each of fifteen minutes duration. Each
station's evaluation consists of a content score, a process score and a global assessment. Raw scores are converted
using an elaborate translation scheme using a "borderline groups method" for determining the appropriate cut-point for
a pass. This mechanism allows for adjustment of scores according to the difficulty of the examination stations used.
Both students and the course director note some stations to be quite difficult but the effect that this has on test scores
is adjusted for during the mark translation. Stations are developed by a committee and involve standardized patients.
Care is taken to field test new OSCE stations before they are implemented. Students must obtain an overall score of
60% to pass and must score borderline or better on at least three of the five stations.
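In its usual textbook form, the borderline groups method sets a station's cut score at the mean checklist score of the candidates whose global rating was "borderline". The sketch below illustrates that standard form only; the course's actual translation scheme is more elaborate, and the data are invented.

```python
# Minimal sketch of the standard borderline groups method: the pass mark
# for a station is the mean checklist score of candidates rated
# "borderline" on the global rating. Data are hypothetical.
from statistics import mean

def borderline_cut_score(station_results):
    """station_results: list of (checklist_score, global_rating) pairs."""
    borderline = [score for score, rating in station_results
                  if rating == "borderline"]
    return mean(borderline)

results = [(72, "pass"), (55, "borderline"), (61, "borderline"),
           (80, "pass"), (48, "fail"), (58, "borderline")]
print(borderline_cut_score(results))  # cut score = mean of 55, 61, 58 = 58
```

Because the cut point is derived from each administration's own borderline group, the method adjusts automatically for station difficulty, which matches the behaviour described above.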
The OSCE marks are the lowest marks consistently among the various components. The marks tend to improve
throughout the academic year suggesting that there is benefit to having the rotation later in the year but this has not
been formally evaluated. The alpha coefficients for these examinations were similar to the written examinations. The
explanation for which is that there were many curriculum changes and that many new questions were introduced to
address the recently adopted objectives. There was variation in scores across academies despite the centralization of
the examination. The magnitude of this variation may or may not be significant but it does raise concerns over the
uniformity of experience students are receiving.
OSCE scores tend to correlate poorly with the case write-up (0.17). This is unexpected, as both evaluations are aimed
at assessing the students' ability to conduct a focused interview. Two distinguishing features of the OSCE are that
it consists of timed interactions and that the "patients" are trained role players rather than actual patients. The OSCEs are
standardized and as such are given more weight in the student's overall mark.
Case Write-ups (CPP)
Students are required to submit two case write-ups. One report is a preliminary report and the second is a final or
progress report. The case write-ups are worth 15% of the final grade and are evaluated according to a template.
Students have access to the template in order to guide them. Student scores tend to be high with very little variation
(5%), raising concerns that students are receiving a grade for having completed the task rather than having the quality
of their submission scrutinized. While the evaluation forms have been assembled with great care, assessing nine to
ten competencies using behaviour anchored rating scales, it is not certain whether those evaluating the students are
properly utilizing these guidelines. Some sites have assigned honours to all but 8 of 140 students over the last four
academic years.
The case write-ups can be marked either in a blinded fashion or by the primary supervisor who is also responsible for
the ward evaluation. This may partially account for the relatively high correlation between CPP and ward evaluations
(.40).
Feedback Mechanisms
Three of the four evaluation components occur at the conclusion of the rotation and as such afford only summative
feedback. This is provided to students in written format and is valued by students as it is based on direct observation.
While there is an opportunity to receive formative feedback at the midpoint this is not a formal component of the
course. Students who are assigned to two sites for their six week rotation do not receive an evaluation from the first
site prior to their move to the second site. Students do receive a formal midpoint evaluation as the CPP Part 1 Case
Report Summary Evaluation.
Students have consistently rated their own evaluation as appropriate over the last six academic years, giving an average
rating of 3.78/5. They feel even more strongly that they receive timely and helpful feedback, giving the course an average
rating of 3.97/5.
The course CRICES report outlines several examples of student comments and the appropriate actions taken to
address these concerns over the last several years.
Areas of Strength
1. The variety of evaluation modalities utilized and the appropriate matching of each evaluation method to that which
was being assessed
2. The amount of direct observation of students
3. The sophisticated post-hoc adjustment of examination scores to compensate for variations in station difficulty
4. The use of a committee to develop, test and review examinations
5. Instruction for evaluators and provision of a template regarding the CPP
6. The provision of timely written feedback following written/OSCE stations
7. Appropriate examination quality monitoring (e.g. alpha coefficients) with thoughtful interpretation of results
and reasonable explanations for deficits
Areas for improvement
1. There is a lack of formal evaluation of the effect of two-site versus single site and the impact of academic site
on student evaluations.
Consider analysis of variance to determine how much of the variance in student scores is attributable to academic
site. Compare scores between two-site students and single site students to determine if there is a significant effect
of site assignment. Should there be significant differences it will remain to be determined whether this is a result
of variations in teaching or evaluation.
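The suggested analysis of variance can be sketched as a one-way partition of marks by site, reporting eta-squared (the share of score variance attributable to site). The marks per site below are invented for illustration, and the function name is an assumption.

```python
# Sketch of the suggested analysis: one-way partition of student marks by
# academic site, reporting eta-squared (share of variance due to site).
from statistics import mean

def eta_squared(groups):
    """groups: dict mapping site name -> list of student marks."""
    all_marks = [m for marks in groups.values() for m in marks]
    grand = mean(all_marks)
    ss_between = sum(len(marks) * (mean(marks) - grand) ** 2
                     for marks in groups.values())
    ss_total = sum((m - grand) ** 2 for m in all_marks)
    return ss_between / ss_total

# Hypothetical ward marks at three sites
sites = {"site_A": [82, 85, 79, 88], "site_B": [75, 78, 80, 77],
         "site_C": [84, 81, 86, 83]}
print(round(eta_squared(sites), 2))
```

A full ANOVA with an F test (e.g., via a statistics package) would additionally indicate whether any site effect is statistically significant rather than merely descriptive.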
2. There is perception of inconsistency in how ward evaluations are completed, especially for students assigned
to two sites
It is important to clarify the way in which final ward evaluations are derived. That is, if a student has two
supervisors, should both the first and second three week evaluations be weighted equally, or should the final mark
reflect the student's ultimate performance at the end of the rotation (thus weighting the second evaluation more
strongly)? Furthermore, it would be important to have uniformity across sites in terms of who has input into final
ward evaluations (nurses, residents, staff supervisors, consensus, weighting of input).
3. While the post-hoc translation of the OSCE marks adjusts for station difficulty this may not address the
impact of difficult questions on subsequent student performance. Students perceive some stations as too
difficult and some of the SPs as unrealistic.
Consider evaluating the impact of examination order to determine whether difficult stations impact students'
subsequent performance. This could be achieved by comparing students who do their written exams prior to the
OSCE with those who do it after. A more elaborate review of the data could examine the impact of OSCE station
order on the overall performance: do students do better on an "easy" station if it occurs before or after a
"difficult" station? Is there a link between SP realism and station difficulty? Collect data from students on SP
realism and perceived difficulty.
4. Lack of formative feedback at midpoint
Midpoint feedback to discuss progress to date is often overlooked or not completed. Perhaps more formative
feedback re: global performance could be linked to CPP evaluation. (CPP feedback inevitably occurs at this time
due to specific midcourse deadlines.). Ongoing evaluation of ward performance could be achieved through
completion of encounter forms. These forms could be the basis for formative written and verbal feedback at the
midpoint and could be forwarded to the final evaluator especially for students assigned to two-sites.
5. Lack of correlation between CPP and ward evaluations, inflation of CPP marks
Consider implementing a very specific guideline for how both of these components are completed so that they are
completed consistently across all sites regardless of whether it is a two-site or single site assignment.
Re-marking a sample of the CPP assignments, in a fashion similar to the student appeals process utilized for
the written examination, would allow determination of whether there is a bias effect (e.g., second markers from
different sites, or primary supervisor versus blinded marker).
Ongoing faculty training to encourage adherence to marking templates for both ward evaluations and CPP
6. Lack of a standardized approach for the application of course director discretion regarding final pass/fail
decisions. This may pose difficulties at the time of the next director changeover and may account for some of
the variation seen at the time of the last changeover (1999-2000).
Consider setting a policy/guideline for how these decisions will be made, conditions where this might apply, the
range of this discretion, a means for quantifying how often it is required and a method for assessing how effective
a mechanism it is.
Recommendations
1. Seek psychometric expertise to complete the above-suggested analyses prior to the next course review.
2. Develop a formal formative feedback session at the midpoint of the rotation.
3. Correlate course written exam scores with MCCQE Part 1 psychiatry scores in order to determine the
predictive validity of the written examination.
Conclusion of Review
Based on the CRICES report presented and the opinion of the ESAC committee members, it is our conclusion that:
Continued improvement be encouraged .......................Full review next cycle (approx. 3 years)
___________________________
Richard Pittini, ESAC Chair
___________________________ ____________________________
Jodi Lofchy, Course Director Date
Faculty of Medicine University of Toronto
Undergraduate Medical Education
Examination and Student Assessment Committee (ESAC) Course Review: Structure and Function, STF111F
Course Director: Ian Taylor
Reviewers: Raj Gupta, Dominik Podbielski, Nicolae Petrescu
The CRICES report was presented to the committee on December 7, 2006 by Dr. Ian Taylor
Background
Structure and Function is a 20-week course occurring in the first five months of the first year. There are a
total of 636 hours in the course: 32% of course hours are spent in lecture, 20% in the laboratory, 9% in
seminars and tutorials, and 37% are allotted for study time. The objective of the course is to
provide a clinically relevant foundation in basic sciences.
Examinations Overview
There are a total of seven examinations in the course. Examinations 1 and 2 occur on the same day.
Examination 1 is a short answer, performance-based examination in histology. The images are computer-
based. Examination 2 is a performance-based examination in anatomy, radiology and embryology. There
are 180 markable items including specimens, x-rays and clinical scenarios. Examination 1 is worth 4% of
the final mark and Examination 2 is worth 20% of the final mark. Examination 3 is worth 10% and occurs
about two weeks after Examinations 1 and 2. This is a written MCQ examination in embryology. There is a
take home component of this examination, worth 1.2% of the final grade, which is an essay on an ethical
issue. Examination 4 is a performance-based examination in anatomy, radiology and embryology, and takes
place approximately two weeks after the last examination. It includes 180 markable items including
specimens, x-rays and clinical scenarios. In mid-December, students undergo Examination 5, which is
"Integrated Exam 1", based on weeks 11-16 (examination material is not cumulative). It is worth 23% of
the final mark and is a 70 item MCQ examination. Examination 6 occurs in the New Year, and it is a
performance-based examination in histology with computer-based images, and is worth 5%. One day later,
Integrated Examination 2 (Examination 7) takes place. It comprises short answer questions worth 30
marks and 40 MCQs.
Examination Development
Examinations 1 and 7 are devised by the principal lecturer in histology. Exams 2 and 4 (gross anatomy,
radiology and embryology) are devised by a group of 9 people and are reviewed by the Course Director.
For Examination 3, two MCQ questions are developed from each of the 22 lectures. The questions
are created equally by the two lecturers in embryology. The ethics essay scenario is created by the Ethics
Coordinator. Examinations 5 and 6 are devised by a variety of lecturers in the various disciplines that make
up Section B of Structure and Function.
Three years' worth of Exams 2, 3 and 4 are given to the students at the start of the course. For Section B,
the students received only the most recent versions of the exams due to a recent change in teaching
personnel. Questions may be repeated over the years. Each examination was reviewed by at least one
individual, except for Examinations 5 and 7.
Performance Standards
Students require a minimum overall average of 60% to pass the course. Students must pass both Section A
and Section B to pass the course. Students were also assessed on professional behaviour but it is unclear if
and how lapses in professionalism are documented or managed. Students receiving a mark of less than 60%
on any examination are invited for an interview with the Course Director. Some students who receive a
mark between 60% and 70% are also invited for an interview. Students often self-refer when their
performance is weak. Weak students are also discussed at monthly pre-clerkship meetings. Students are
given remediation that is individually tailored. Remediation success is determined objectively via written or oral examinations given by one or more examiners.
Examination Statistics
Overall
Over the last three years, the class average has remained between 80% and 82%. The range is 62% to 94%.
Last year, 59% received honours, a decrease from 2003/2004 when 70% of the class received honours. No
students have officially failed the course, but some have taken leave for personal reasons and others have
remediated prior to Board of Examiners meetings.
Examination 1:
Over the last two years the class average has increased from 73% to 79%. The proportion failing has dropped from 10% to 5%, and the proportion receiving honours has increased from 31% to 56%.
Examination 2:
The class average has decreased over the last three years from 81% to 75%. The proportion failing has
increased from 2% to 6%, and the proportion receiving honours has dropped from 58% to 30%. The Course Director attributes this to weaker student performance rather than to more difficult examinations. Students in the Biomedical Communications degree program form a control group against which the medical students can be compared, and the Course Director reports that those students have achieved more stable grades over the last three years.
Examination 3:
The class average has remained stable over the last three years, and it ranges from 82% to 85%. The
proportion failing is between 0% and 1.5%. The proportion receiving honours is down somewhat this last
year, at 63%, from 76% in 2003/2004.
Examination 4:
The class average in 2005/2006 is 83%, a marked increase from the 75% received in Examination 2.
The proportion failing has dropped dramatically to 0% in the last academic year. There is marked
improvement in performance in gross anatomy compared with Examination 2. This improvement is thought
to be due to improved performance amongst students who do poorly in Examination 2.
Examination 5:
The class average has dropped from 79% in 2004/2005 to 74% in 2005/2006. The proportion of students receiving honours has also dropped, from 50% to 32%. The proportion failing is now 8% compared with 2.5% in
the year prior. The Course Director feels that this is due to weak performance particularly in biochemistry.
The Course Director notes that subsequently increasing the proportion of class time in biochemistry has led
to a better performance in the integrated examination in January.
Examination 6:
In the second integrated examination, the class average is 83%, up from 74% in the first integrated
examination six weeks prior. The proportion failing is only 0.5% this past academic year compared with
4.1% in 2003/2004.
Examination 7:
The examination statistics are relatively stable. The class average ranged from 81% to 84% in the last three
years with the proportion failing ranging from 2.5% to 5%.
Reliability Of The Examinations
Unfortunately, there is no information regarding inter-rater reliability; however, a single person marks any given portion of an examination. Cronbach's alpha for Examinations 3, 5 and 7 (the MCQ examinations) ranges from 0.64 to 0.79.
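Although the report gives only the resulting alpha values, the computation itself is straightforward. Below is a minimal illustrative sketch of Cronbach's alpha computed from a student-by-item score matrix; the function name and the 0/1 item marks are assumptions for illustration, not data from the report.

```python
# Illustrative sketch: Cronbach's alpha for an MCQ examination.
# Input: one row per student, one 0/1 mark per item (assumed data shape).

def cronbach_alpha(scores):
    k = len(scores[0])  # number of items

    def var(xs):        # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_vars = [var([row[i] for row in scores]) for i in range(k)]
    total_var = var([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)
```

By common rules of thumb, values around 0.7 and above are considered acceptable for in-course examinations, so the reported range of 0.64 to 0.79 is reasonable.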
Validity Of The Examinations
Gross anatomy examination testing time is directly proportional to the time spent in class in each area.
Further evidence, presented informally, is that students who do well overall have done well in all sections of the course, and vice versa. The Course Director believes that students with no background in biochemistry or histology do have difficulty with these parts of the course. Students who do poorly in Structure and Function also do poorly in other courses. The majority of failures occur in a small cohort of students.
Feedback
Examinations are returned to the students within one working week of the examination. The final
examination in January is not returned, but students have the opportunity to review their paper with the
principal lecturer. The ethics essay is returned to students. It includes comments and the template for
marking. The Ethics Coordinator interviews every student whose essay is below the accepted standard. The
marking scheme for the MCQ examinations is posted so that students can compare their performance against the template. There is no formal mid-term feedback, but students are aware that they may go to their tutors, principal lecturers or course directors for further discussion of their evaluations.
In the past, the Course Director has made many changes to the course based on student feedback. In
particular, the Course Director has remedied the deficits identified in the May 2000 ESAC review.
Areas Of Strength
Students are generally very satisfied with the evaluations used in the course. The evaluations are felt to be
fair with a moderate level of difficulty. Examination results are generally stable over the years, and where
they are not, the course director has explained the phenomenon and made alterations to the course to
improve areas of weak performance. MCQ examination statistics are good. Feedback on MCQ
performance by posting the marking template is a strength.
Areas For Improvement & Recommendations
Consideration should be given to increasing the weight of the ethics examination to reflect the time spent on
the assignment. Consideration should also be given to altering the schedule of the evaluations to more
evenly distribute the amount of material per examination.
Conclusion
Based on the CRICES report presented and the opinion of the ESAC committee members, it is our
conclusion that:
Continued improvement be encouraged ............... Full review next cycle (approx. 3 years)
___________________________
Richard Pittini, ESAC Chair
___________________________
Date
October 28, 2002
Examination and Student Assessment Committee
Phase II Surgery Clerkship
Reviewer: Anita Rachlis
The Phase ll Surgery Clerkship CRICES Report was presented to ESAC on June 11, 2002 by the Course
Director, Dr. Ted Ross and Education Consultant, Dr. Stan Hamstra. The course is a 6-week block taken as
three 2-week blocks chosen by the student from the specialties of general surgery, orthopedic surgery,
neurosurgery, urology, plastic surgery, cardiovascular surgery, vascular surgery, thoracic surgery, pediatric
surgery and transplantation. The educational experience includes both central and hospital based seminars
(two hours per week). The central seminar program consists of 4 key topics: trauma, cardiovascular surgery,
neurosurgery and pediatric surgery. The clinical clerk admits and follows patients and attends appropriate
operating room and ambulatory clinics of his/her assigned staff surgeon for each two-week rotation. At the
time of this CRICES report the evaluation consisted of 3 components: ward assessment, written MCQ
examination and OSCE at the end of the combined Medicine/Surgery 12-week rotation (currently at the end
of each 6-week rotation). A passing grade in the ward assessment is required before the student is permitted
to sit the written examination. The grade is determined as follows: a combination of the surgical components of the written (MCQ) examination and the OSCE constitutes the 'factual average'. Provided the 'factual average' is a passing grade, the final surgery grade is the average of the ward (1/3), written (1/3) and OSCE (1/3) grades. If the 'factual average' is not a passing grade, it stands as the final grade.
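The grading rule just described can be expressed compactly. The sketch below assumes a passing threshold of 60% and an equally weighted factual average; neither figure is stated explicitly in the report.

```python
# Sketch of the Phase II Surgery grading rule described above.
# PASS_MARK and the 50/50 weighting of the factual average are assumptions.

PASS_MARK = 60.0

def final_surgery_grade(ward, written, osce):
    """All inputs are percentages; written/osce are the surgical components."""
    factual_average = (written + osce) / 2
    if factual_average >= PASS_MARK:
        # ward, written and OSCE each contribute one third
        return (ward + written + osce) / 3
    # a failing factual average stands as the final grade
    return factual_average
```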
The evaluation of the students is based on several different methods: a descriptive clinical performance
(ward assessment) using the forms provided for each of the clerkships but with descriptors specific for
surgery, and two objective examinations, an MCQ examination and OSCE. Students receive feedback at the
end of each 2-week block and informal mid-rotation feedback.
Data are provided for the academic years 1998-2001. The overall class average and proportion of honors have not changed significantly over that time period. It appears that over the past three years only one student has failed.
Methods of evaluation and Observations from the CRICES report:
1. Ward evaluation
The ward grade is based on an average of the three separate ward assessments provided by the three
different 2-week subspecialty rotations; at least two assessments are used to calculate the grade. In 2000-2001, a template was used to derive the mark from the criteria checked by the supervisors, so supervisors did not assign a specific grade themselves.
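The ward-grade calculation can be sketched as follows; representing a missing assessment as `None` is an assumption made for illustration.

```python
# Sketch: ward grade as the mean of up to three 2-week assessments,
# tolerating a missing one (the report's two-assessment minimum).

def ward_grade(assessments):
    """assessments: three marks (percent), any of which may be None if missing."""
    present = [a for a in assessments if a is not None]
    if len(present) < 2:
        raise ValueError("at least two of the three assessments are required")
    return sum(present) / len(present)
```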
The class average has been in the honors range, with a high percentage of students achieving honors in 1998-99 and 1999-2000. This may have been due to supervisors assigning specific grades in those years. In contrast, an algorithmic approach was used in 2000-2001, perhaps accounting for the lower class average and lower percentage achieving honors that year.
An analysis of internal consistency of the ward assessments over the three rotations in 2000-2001 suggested high correlations, but it was not clear what was actually being measured in this analysis. A better measure of internal consistency would be the correlation of individual students' scores across the three 2-week blocks. Another analysis should examine the consistency of faculty grading, both between faculty members and within the same subspecialty. There were no statistical differences in student grades
among academies for 1999-2000 and 2000-2001. Hospital specific data were not provided nor were there
analyses by subspecialty.
With respect to the validity of this evaluation: content validity is supported by the fact that the evaluation is
based on the course objectives. Correlations of the ward evaluation with the written examination and the OSCE are low, indicating either a genuine lack of agreement between the assessments or that the assessments are measuring different constructs. One analysis of construct validity that could be done would be
to assess performance at different times during the year, such that students doing the rotation early in the
year might perform less well than those doing the rotation in subsequent time periods. Predictive validity is
not included but an analysis of performance related to Phase 1 results and on the Medical Council
examinations could be attempted.
Feedback is given at the end of each block but it is not clear how mid-block feedback is provided and
documented given the short rotations of only 2 weeks.
2. Written examination
The written examination is given as an MCQ examination of 60 questions. A multidisciplinary surgical
committee created the question bank. 15% of the questions are new each year. Item statistics are used to
decide upon modification and deletion of questions.
There has been an increase in overall class average during 1999/2000 and 2000/2001, with a concomitant
increase in the proportion with honors. Reliability of the examination is variable with alpha coefficients
ranging from .11 to .60 in the academic year 2000-2001. There was no statistical difference among
academies in the last two academic years. Content validity is supported by the fact that the evaluation is based on the course objectives in the seminar syllabus; although the examination emphasizes the Phase II curriculum, Phase 1 content is incorporated as well. As with the ward assessment, concurrent validity is low
between the written examination and the ward evaluation or OSCE. This again may be a function of the fact
that each evaluation is assessing different parameters. A measure of construct validity was not provided but
again could be an analysis of performance over the academic year. A second approach would be to examine
the scores of students who did specific subspecialty rotations and the performance on subspecialty-specific
examination questions, though this may be limited by the small number of such specific questions included
in the examination. Predictive validity was not provided, but correlations of performance in Phase 1 surgery with Phase II surgery could be used to assess this, as could performance on the Medical Council examinations.
Students do not receive specific feedback from the written examination other than the score as this is a
secure examination and students do not review the answers to the specific questions.
3. OSCE
The OSCE has been a combined Medicine/Surgery assessment at the end of the 12 weeks of the Medicine
and Surgery rotations. The Surgery component includes eight stations, four 10-minute and four 5/5-minute
stations. The stations are drawn from a bank that is currently being updated. Consultation with the
Medicine clerkship OSCE coordinator occurs so that station content does not overlap. Examiner feedback
on the stations is used for future station development.
OSCE grades have been consistent over the three academic years reported upon with a low percentage of
students obtaining an honors score. Reliability of the examination is acceptable with an alpha coefficient of
.51 for the February 14, 2001 examination. There was no statistical difference in examination marks across
academies in the last two academic years. Content validity is again supported by the fact that the examination is based on course objectives and incorporates both Phase 1 and Phase II content. Correlations between the OSCE and the other evaluations are again low, suggesting that the evaluations may be assessing
different aspects of student learning. Analysis of construct validity was not provided but could include
performance based on time of rotation during the academic year. A second approach would be to examine whether performance was better when the OSCE was taken immediately after the Surgery rotation than when it was taken six weeks later, with the Medicine clerkship intervening. Predictive validity was not provided but could include comparison of performance in the Phase
2 Surgery OSCE with either their scores in Phase 1 Surgery or the Medical Council examination.
Feedback on the OSCE is provided via a form that lists the station content, performance on a scale from 1 to
5 on history taking, physical examination, organization, knowledge and communication for each station and
on content (checklist score) as being below the passing standard or at or above the passing standard for each
station.
Areas of Strength:
1. The evaluation of student performance in the Phase II Surgery Clerkship is based on several different
assessments measuring factual knowledge (written MCQ examination), clinical performance (ward
assessment) and clinical skills and knowledge (OSCE).
2. The OSCE examination has provided consistent grades over the past three academic years and a formal
feedback process has been instituted more recently.
3. The ward assessment has suffered from grade inflation over time, but with the recent institution of an algorithm to calculate the score this has become less of a concern.
4. The course director has indicated that the test banks both of the MCQ examination and the OSCE are
undergoing revision and renewal.
Areas for Improvement:
1. Although the ward assessment utilizes an algorithm to calculate the score this may be based on only 2
out of 3 assessments. The number of students for whom the final ward assessment is based on only two-
thirds of the evaluations was not provided. This would be of interest. This practice should clearly be
minimized. A correlation of the scores for individual students obtained on each of the rotations would also
be of interest. The current form does not provide descriptors for each of the cells: these should be
considered in future revisions.
2. Internal consistency of the written (MCQ) examination appears to be less than ideal. It has been
suggested that the examination is less secure than expected. A departmental examination committee to
review, revise and renew the current bank of questions could help to improve the reliability of the
examination and monitor the statistical properties of the examination.
3. Currently there is no formal feedback to the students on their performance on the written (MCQ)
examination other than the final score.
4. An analysis of construct and predictive validity of each of the component assessments may be helpful to
ensure that the assessments are valid and reflect student performance.
Recommendations:
1. Ward assessment
a) The ward assessment should ensure that all three evaluations (each 2-week block) are included in the
calculation of the final score.
b) The ward assessment form should be reviewed to include descriptors in each of the categories, and these should be sufficiently distinct to discriminate across the scale.
c) An analysis should be carried out to determine the correlations across each of the evaluations to
determine consistency, including consistency across the sub-specialties and preceptors.
2. Written examination
a) A departmental examination committee should be established to generate new examination questions, review and
blueprint each examination and monitor statistical properties of each examination and over the academic
year. The department should consider paying question authors a stipend if current difficulties in obtaining
sufficient new questions persist.
b) A process should be developed to provide students with feedback on their performance on the examination, such as sub-specialty performance reported as H/P/F and relative to other students.
3. OSCE
a) A departmental examination committee should be established to generate new stations, review and
blueprint each examination and monitor the statistical properties of each examination and during the
academic year.
4. Analysis of validity
a) An analysis should be performed to provide evidence of validity particularly construct and predictive
validity of each of the components of the assessment. Correlations with Phase 1 grades and with
performance on the Medical Council examination can be calculated to complete this component of the
CRICES report.
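Several of the recommendations above call for correlation analyses (across 2-week blocks, against Phase 1 grades, against Medical Council results). A minimal Pearson-correlation sketch, using invented marks rather than any data from this report, might look like:

```python
# Pearson correlation of paired student marks (illustrative only).

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)
```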
Conclusion of Review
Based on the CRICES report presented and the opinion of the ESAC committee members, it is our
conclusion that:
Multiple revisions as per committee suggestions ............... review in 1-2 years
Richard Pittini, ESAC Chair
Ted Ross, Course Director