EXAMINATION AND STUDENT ASSESSMENT COMMITTEE
REPORT ON AMBULATORY COMMUNITY EXPERIENCE STUDENT ASSESSMENT FROM
2002-2005
M. Schreiber, M.D.
INTRODUCTION
The Ambulatory Community Experience (ACE) course takes place in year 4. It spans either 4 weeks (pre-
CARMS) or 3 weeks (post-CARMS). It involves students being placed in a variety of ambulatory settings,
either community or hospital-based, one placement per student. Evaluation consists of two modalities:
1. A performance-based evaluation. This consists of a grid, with the five rows consisting of
competencies related to the course objectives, and the columns indicating the level of ability
achieved by the student. This counts for 60% of the course grade.
2. A case write-up. This includes a reflective component. This counts for 40% of the course grade.
Two observations are noteworthy in the appraisal of these evaluation instruments:
a) The grades in this course do not appear on the student transcript that is submitted for the CARMS
process.
b) The performance evaluation form in use beginning in 2005/2006 is significantly different from that
in use from 2002-2005. The new form is similar to the form used by all courses in the clerkship, and
includes a considerably expanded matrix with 14 competencies assessed, five possible levels of
ability for each, and detailed descriptors provided for each level of each competency.
With these changes, however, the straightforward correspondence between the stated course objectives and the
evaluation form has been lost, and so the ACE course director has requested advice from ESAC with
respect to how to match the competencies assessed by the new evaluation form to the course
objectives. The ESAC report will of necessity be based on the data submitted for academic years
2002-2005.
AN APPRAISAL OF THE ASSESSMENT MODALITIES
1. Performance evaluation form
This is completed by the supervisor at the site where the student completes the rotation. As noted above,
five general competencies are assessed:
i. Clinical problem-solving skills.
ii. Patient management skills
iii. Health promotion and disease prevention
iv. Professional behaviours
v. Community impact on patient care
These are rated on a 5-point scale from “unsatisfactory” up to “above expectations”. Descriptors are
provided for the highest, middle and lower levels. This is worth 60% of the final grade.
Feasibility
The form is certainly straightforward and should present no difficulty for the supervising clinician to
complete.
Validity
Content validity is evident inasmuch as the form is clearly and explicitly linked to the course objectives. In
this reviewer's judgment, the form has excellent face validity.
There is no evidence available on predictive validity. To obtain this would require, as stated by the course
director, a considerable research project to study the clinical competence of graduates and then correlate
scores in residency to score on ACE. This is not likely to happen.
Concurrent validity is addressed by noting a very weak, albeit positive, correlation with the case write-up
assignment. Further data on concurrent validity could be fairly easily obtained by studying the degree to
which scores on the ACE form correlate with scores on the performance evaluation in other fourth year
clerkship courses (medicine, surgery, emergency medicine and anesthesia). I would recommend this be
pursued.
Reliability
It is important to know whether raters at various sites are using the form in a similar manner. There is,
however, little evidence provided about this. The only data suggesting that this may be the case are noted on
page 4 of the report, where it is stated that ratings at community sites are similar to ratings at hospital-based sites. It
would be reassuring to know that average grades are roughly similar at the various sites over a sufficiently
long period of time so that several students would have been evaluated at each site. This should be carried
out to identify “outliers”, i.e., supervisors who mark either very leniently or very harshly.
It might be helpful to provide some explicit guidance to raters as to the likely expected performance level of
most fourth year students: e.g., “Most students should be in the meets expectations category”.
Inter-rater reliability cannot be assessed since only a single rater at each site evaluates the student. In some
cases, students interact with more than one clinician, and it would be feasible in those cases to have each of
the clinicians complete the form, and then have the supervisor “average” the ratings on the different forms
in generating the final evaluation. In situations where there is only one supervisor, this would of course not
be feasible.
There are positive correlations noted between the scores of each of the five competencies and the overall
scores. This is of course not surprising since there are only five contributing elements to the final score, and
one would expect scores on this small number of elements to correlate with the final overall score.
Grades achieved
The document does not indicate how the checkmarks on the performance evaluation form are converted into
grades.
Over the three years sampled, the average grades have been very stable, with a mean grade of close to 84%,
and a standard deviation close to 6%. This generates a proportion of honours ranging from 75 to 80%. This
is in the same range as is typically seen for performance evaluation in other clerkship courses, and likely is
attributable to a leniency bias of raters. Assuming that an honours grade reflects a predominance of “above
expectations” ratings, then either the students are exceptionally good or expectations are somewhat low.
Since these grades are not part of the CARMS form, this grade inflation is likely not of major consequence.
Feedback
The procedures in place to provide feedback to the student about her/his performance seem appropriate,
since there is a structure in place for both ongoing regular feedback as well as more formal feedback
midway through the rotation and at its conclusion.
2. Case write-up
A single case write-up is submitted at the end of the rotation. This is worth 40% of the grade. The write-up
is up to 8 pages long, double-spaced, with very clear expectations as to structure, outlined in appendix 8.
Feasibility
The write-up is graded by members of the ACE course committee. It seems to be a reasonable task for these
individuals to be completing.
Validity
The face validity of this exercise in my judgement is very high.
The content validity is supported by the close connection between the course objectives and the components
of the case write-up assessed on the evaluation form. Each of the five competencies is explicitly evaluated.
Examples of issues to be addressed are provided to the students as outlined in appendix 8 and presumably
the same examples and guidelines are available to the graders.
Concurrent validity is indicated by the weak albeit positive correlations with the performance evaluation
scores in the ACE rotation. It would be feasible to search for correlations between the scores on this
assignment and scores on similar exercises, including the DOCH-4 assignment, the reflective write-ups used
in the year 4 medicine rotation, and perhaps case write-up exercises in other clerkships.
Predictive validity is likely not feasible for the same reasons cited above in the appraisal of the performance
evaluation form.
Reliability
Each case write-up is marked by one marker. Accordingly, inter-rater reliability cannot be determined. It is
appropriate that a small number of markers is used, since they can be trained to grade the write-ups in a
consistent manner. It would be useful, however, to verify that on average each of the markers assigns
comparable grades. This should be feasible. Also, it would be reassuring from time to time to check that
markers are working consistently by having a small number of papers marked by each marker. This would
identify hawks and doves. More explicit guidelines on what markers are to look for in each of the domains
would be appropriate. Specifically, the form should indicate what constitutes “insight” into the case, and
what constitutes a “thoughtful” analysis.
Data on internal consistency is not provided. An internal consistency score could be calculated easily
enough if the data have been captured as to exactly what score each student achieved on each competency.
Presumably, one would expect students who score well on one aspect of the write-up to do well on other
aspects.
Actual grades achieved
Interestingly, scores on the case write-ups are modestly but definitely and consistently lower than on the
performance evaluation. The mean scores ranged between 79 and 80%. Equally interestingly, the spread of
scores is much wider, with the standard deviation between 11 and 12%. This means that, assuming a
normal distribution, around 16% of students would score below 70%, which is quite a significant number of
students scoring at a fairly low level.
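The estimate above can be checked with a short calculation (the mean and standard deviation used here are mid-range values taken from the figures reported above; the exact yearly figures vary):

```python
# Sketch: estimate the proportion of case write-up scores below 70%,
# assuming a normal distribution with a mean of ~79.5% and a standard
# deviation of ~11.5% (mid-points of the reported ranges).
from statistics import NormalDist

mean, sd = 79.5, 11.5
below_70 = NormalDist(mean, sd).cdf(70)

print(f"Estimated proportion scoring below 70%: {below_70:.1%}")
```

With these assumed values, 70% sits a little less than one standard deviation below the mean, so the estimate comes out near one student in five; the "around 16%" figure corresponds to treating 70% as exactly one standard deviation below the mean.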
3. Grades as a whole
In the course as a whole, mean grades were close to 82% each year, with standard deviation of 6% and a
proportion of honours ranging from 64 to 70%.
This is not too different from several other clerkship courses. The same comments apply here as were made
in relation to the performance evaluation form: this seems to be quite a high proportion of students to be
designated at the honours level.
RECOMMENDATIONS
1. In order to study the concurrent validity of the performance evaluation form and of the case
write-up, it would be appropriate to correlate scores on these with scores from comparable
assessments in other clerkship rotations.
2. In order to reassure ESAC that raters are using the performance form in a reasonably
consistent manner across sites, it would be appropriate to provide data on how scores have
averaged across these sites over the years.
3. At sites where multiple clinicians interact with the student, each clinician should
complete the evaluation form and the supervisor can then average the ratings.
4. There should be more direction given to raters on the expected proportion of students
achieving at each level on the performance evaluation form.
5. A sample of case write-ups should be marked by all raters to ensure they are each marking at
a reasonably similar level of expectations.
6. Consideration should be given to a second written report to be handed in during the first half
of the rotation, or alternatively an oral presentation. If the course director finds this proposal
helpful, then resources should be made available to support the marking of a second written
report.
7. Provide mid-rotation feedback to students on their performance in the rotation up to that
point, as is done in other clerkships, so that there is time to demonstrate improvement.
8. In response to the course director's question about linking the new evaluation form to the
course objectives, I would suggest the following:
All of the ACE objectives relate to the scholar (self-directed learning) role. The other objectives might map
to the new competencies as follows:
Clinical problem-solving skills
This relates most closely to the first four items in the medical expert/skilled clinician domain (history-
taking, physical examination, diagnostic test interpretation, and the problem formulation). The
communicator role is also relevant.
Patient management skills
This relates most closely to competencies in the medical expert/skilled clinician cluster (problem
formulation and management plan; use of evidence-based medicine), and to the three competencies in the
communicator/doctor-patient relationship cluster (communication with patients/families/community; written
records; patient education).
Health promotion and disease prevention
This relates most closely to the health advocate cluster (recognition of important determinants of health and
principles of disease prevention; patient advocacy).
Professional behaviours
This is relevant to all the competencies, and is also captured on the professionalism form. The collaborator
role is particularly relevant here.
Community impact on patient care
This is most relevant to the manager role (awareness of and appropriate use of healthcare resources) and to a
degree the collaborator role (team participation, provision of patient care in collaboration with all health
care providers).
Respectfully submitted,
Martin Schreiber, M.D.
ESAC Committee Course Review
Anesthesiology (ANS400Y) December 2, 2008
Course Director: Isabella Devito
Lead Reviewer: Richard Pittini
Student Reviewer: Nicolae Petrescu

Course Summary:
Anesthesiology is a two-week clinical rotation in the fourth year of the medical curriculum. The rotation
consists of 8-9 days of clinical placement with one-to-one faculty supervision. There is one day of
simulation-based teaching per rotation, which typically occurs during the first week. Students are assigned
to between 4-6 faculty for their clinical experience and are supervised primarily by a respiratory therapist
and anaesthesiology residents during their simulation day. The course objectives are reviewed by the course
director, and the evaluation methods map closely to the objectives. Student performance is evaluated using
two separate evaluations: a written examination worth 60% and a clinical evaluation worth 40%. Students
receive formal feedback at the midpoint and informal feedback following individual clinical encounters.
Overall, students perform well in the course, with a class average of between 77 and 80% over the last three
academic years. The proportion of students who fail the course is 0 to 0.5%, while the proportion who
receive honours (>80%) is between 29 and 53%. Only a total of 20 students received borderline grades
(60-69%) over the last three years.

Evaluation Components:

Written Examination:
The written examination is a ten-question short-answer examination that consists of 40% new questions per
iteration. The questions are created or selected from a secure pool and reviewed by the course director and
one site coordinator prior to inclusion. The number of subsections per question varies, but the overall
quantity of information required to answer each question is uniform. The questions are selected to cover all
content areas of the curriculum. Questions are also reviewed post hoc and are revised if <60% or >90% of
students answer them correctly. The examination is administered centrally and is computer-based. Students
are allowed to move back and forth between questions.

Two examiners are responsible for grading each written examination. The examinations are divided for
marking such that one examiner marks half of the students for question 1 and the other marks the other half
of the students for the same question. They then alternate questions, so that students have the benefit of
two markers for the entire examination. None of the questions is graded by more than one faculty member,
and no single question is marked entirely by one faculty member. The benefit of this design, according to
the course director, is to facilitate discussion between the two faculty as to what answers are acceptable. If
additional answers are accepted, previously marked papers are re-marked according to the revised marking
scheme.

Students perform relatively well on the written examination, with average scores of 76-81% over the last
three years. The marks are normally distributed, with a reasonable proportion of students scoring in the
borderline category. The marks are consistent across academy sites, suggesting they have construct validity;
however, there is greater variation between rotations, suggesting that not all examinations may be equally
challenging. The variation in scores between blocks is less than 2 standard deviations and is not likely of
significance.

Clinical Evaluation:
During each day of the rotation, students are assigned to a faculty member. The students provide the faculty
member with a clinical encounter card to complete, which is submitted in a drop-box. This card evaluates
11 criteria. The rating scale used is a five-point Likert scale with behaviourally anchored ratings. The scale
is weighted as follows: unsatisfactory = 0%, below expectations = 65%, meets expectations = 75%, exceeds
expectations = 80%, outstanding = 90%. The weighting of each criterion is determined by the course
director and reflects the curricular content. No criterion is worth more than 15%. Individual encounter
cards are reviewed by site coordinators, who then transform these evaluations into a mark for the clinical
evaluation. Site coordinators are given the latitude to decide whether to include marks from "hawks" or
"doves" if they are out of keeping with the remainder of the evaluations.

The class average on the clinical evaluation has been stable for the last three years at 77-78%, with very
little variation between academies or between blocks. The standard deviation was as low as 1.7-1.9 for one
academy over the last three years. No student received an unsatisfactory rating on any criterion at any site
in the last academic year. There is a clear pattern of marks, with one site having the highest mark for 10 of
11 criteria and another site having the lowest for 8 of 11 criteria. The number of students assigned to each
of these two sites is small.

Feedback:
Students are directly observed by faculty throughout their rotation and are engaged in discussion on a
regular basis. They receive feedback informally on a daily basis, and there is formal written feedback at the
midpoint. A new form has been introduced this year to facilitate this feedback; it includes not only areas for
improvement but also an action plan. There is no formal feedback at the end of the rotation, although
students are able to "disagree" with the clinical evaluation and may subsequently discuss it with either the
site coordinator or the course director. Students provide feedback to the course director via course
evaluations, and in the past some students have indicated a preference for fewer faculty observers.

Observations:

Written
1. The proportion of new questions is high (40%) and may lead to larger fluctuations in the written examination scores
2. The distribution of marks is broad, suggesting that the current evaluation methodology adequately captures all levels of competence, including those in the borderline category
Clinical
1. Direct observation by multiple faculty facilitates accurate evaluations.
2. The issue of having "too many" faculty supervisors may be more pertinent to those students seeking a letter of recommendation.
3. The process for integrating clinical evaluations is not consistent across the 4-6 individuals contributing to a student's mark.
4. The standard deviation for clinical evaluations appears to be low; this may be the result of under-utilization of the lower end of the rating scale.
5. The weighting of the unsatisfactory rating is too low (no effect apparent due to the infrequency of its use).
6. 10-15% of the rotation consists of a simulation component, but this component is not evaluated.
Feedback
7. The feedback at the end of the course is limited to the provision of component marks and is of limited value to students
Recommendations:
1. Introduce no more than 15% new questions per examination, and ensure the question databank is sufficiently large to sustain this approach.
2. Divide questions such that one marker grades all students on a given question. The markers can consult each other regarding whether the answer key should be adjusted, but this does not necessitate the current method of dividing the examination questions.
3. Develop a consistent approach for how faculty input is integrated by site coordinators and disseminate this approach via faculty development
a. Consider adjusting marks rather than omitting them; if the patterns are consistent, previous years' data could be used to accomplish this objectively.
b. Ensure that all evaluators use the same approach to compensating for inexperience, i.e., either all are more lenient at the beginning of the rotation, or all rate performance as observed and the site coordinator takes the date on the encounter card into account.
4. Revise the clinical encounter cards to include a not applicable column and a checkbox for “this evaluation was discussed with the student”
5. Adjust the clinical evaluation scale such that unsatisfactory is weighted 55% and encourage faculty to utilize the full range of the scale as appropriate
6. Consider developing a method for evaluating the skills demonstrated by students during the simulation day; a weighting of no more than 15% is recommended for such a component
7. Provide students with a breakdown of the areas in which they did not perform well on the written examination
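The effect of recommendation 5 can be sketched numerically. The rating-to-percentage anchors below are the ones stated in the report; the per-criterion weights and the sample ratings are hypothetical, chosen only to illustrate how raising the unsatisfactory anchor from 0% to the proposed 55% changes a student's clinical mark:

```python
# Sketch: converting encounter-card ratings into a clinical evaluation mark.
# Rating anchors are from the report; the proposed scale raises
# "unsatisfactory" from 0% to 55% as per recommendation 5.
CURRENT_SCALE = {"unsat": 0, "below": 65, "meets": 75, "exceeds": 80, "outstanding": 90}
PROPOSED_SCALE = {**CURRENT_SCALE, "unsat": 55}

# Hypothetical criterion weights (11 criteria, none above 15%, summing to 100).
weights = [15, 15, 10, 10, 10, 10, 8, 8, 8, 3, 3]

# Hypothetical ratings for one student, including a single "unsat".
ratings = ["meets", "meets", "exceeds", "meets", "unsat",
           "meets", "below", "meets", "exceeds", "meets", "meets"]

def clinical_mark(scale):
    """Weighted average of the rating percentages across all criteria."""
    return sum(w * scale[r] for w, r in zip(weights, ratings)) / sum(weights)

print(f"current scale:  {clinical_mark(CURRENT_SCALE):.1f}%")
print(f"proposed scale: {clinical_mark(PROPOSED_SCALE):.1f}%")
```

Under these assumed ratings, the single unsatisfactory rating costs far less on the proposed scale (the mark rises from 67.6% to 73.1%), which is the distortion the recommendation is meant to address.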
Conclusion of Review
Based on the CRICES report presented and the opinion of the ESAC committee members, it is our
conclusion that:
No major issues; ongoing improvements encouraged . . . . . . Full review in three years' time
___________________________________
Richard Pittini, ESAC Chair
___________________________________
Isabella Devito, Course Director
___________________________
Date
Examination and Student Assessment Committee
Arts and Science of Clinical Medicine II
The course director, Dr. Jacqueline James, presented a comprehensive review of the Arts and Science of
Clinical Medicine II (ASCM II) on November 4, 2008.
Lead Reviewer: Dr. Dara Maker
Components of the Student Assessment System:
There are four components that together account for the student's final mark. They include a midyear
observed history and physical exam, oral presentations, written assignments and a final OSCE. Students are
also required to complete a Log Book and are evaluated on their professionalism; however, these are not
included in the final mark calculation.
1. Final OSCE - 50%
The final exam consists of a ten station summative OSCE that covers all major clinical subspecialty
areas covered in ASCM II. The exam is created by the Curriculum Subcommittee of the ASCM II
committee, and the bank of questions (approximately 32) has undergone substantial revision in the past
7 years, including updated scripts and checklists. In the previous academic year, the class average was
78.43% with 40.3% of the students receiving honours.
2. Midyear Observed History and Physical Examination – 20%
The midyear evaluation is both a summative and formative assessment. Students perform a focused
history and 3 physical exam maneuvers and are rated using Likert scales to evaluate process and content.
Two global ratings are also used. In the past academic year the class average was 82.59% with 76.7% of
the students receiving honours.
3. Oral Presentations – 15%
Two oral presentations are given, each worth 7.5%. One is evaluated by the core tutor and the other by
the pediatrics tutor. Students are graded on five components of the presentation based on global ratings
out of 5.
4. Written Assignments – 15%
Two case reports are completed, each worth 7.5%. One full case report is marked by the geriatrics tutor
and is calculated based on five criteria. The psychiatry tutor marks a mental status examination write-up
that is based on three criteria.
The class mean for the composite scores for all in-course assignments (two oral presentations and two
written assignments) in 2007-2008 was 83.11%, with 88.8% of the students receiving honours.
The overall class average in 2007-2008 was 80.66%, with 73.1% of the students receiving an honours in
the course. The class average has been stable over the past three academic years and is consistent with
class averages from other courses. The 2007-8 range of marks was narrow (70.2-86.0%) and the
standard deviation was small (2.42). This has also remained stable over the past three years.
Areas of Strength:
The ASCM II course demonstrates a number of strengths of the student assessment system. A significant
asset of ASCM II is that it uses a number of different evaluation methods to allow students to be examined
across a variety of domains.
The course is noted for its constant development of its evaluation system via ongoing revisions and
responsiveness to feedback. The opinions of both students and consulting faculty are considered when
making changes to the course. The addition of the non-evaluated observed history and physical in
2008-9 to provide students with specific feedback in preparation for their mid-year exam is an excellent
example of the continuous improvements being made to the course.
ASCM II is also noted for its commitment to feedback. Students receive considerable formative feedback
throughout the course via their in-course assignments and exams. Additional methods for ensuring
feedback include the mandatory completion of the skills log-book. Although not calculated for marks, it
ensures regular observation of physical exam maneuvers and continual feedback.
1. Final OSCE:
The ASCM II OSCE includes a large bank of questions (32) that is continuously updated and revised
by numerous methods. At the time of the OSCE, feedback is solicited from examiners and standardized
patients, which is then used to improve upon the station in the future. Past student performance and
comments from experts in the field are also reviewed prior to re-using stations. Lastly, the course
director reviews all checklists to ensure the relevance of the physical exam maneuvers tested.
The exam reflects all major specialty areas taught. Committee members who are experts in the field
generally create new OSCE stations. The means and standard deviations for each station are analyzed
and compared to determine the validity and reliability of the station. Specific criteria for removal of
poorly done items are used.
ESAC was impressed with the reliability coefficients for the total score on the ten station ASCM II
OSCE in 2007-08 which were 0.64 (testing day 1) and 0.65 (testing day 2). The reliability coefficients
were slightly higher in 2007-8 than in previous years; however, all the scores within the past three years
have been above 0.5. In 2007-08 there was no difference in reliability between testing days one and
two. The reliability coefficients for the checklist scores and global rating scores were similarly high, and
the correlation coefficients between those two scores were 0.76 and 0.73 for testing days one and two
respectively, indicating that they are both marking the same domains. There was also no difference in
mean scores across the three academies.
2. Midyear Observed History and Physical Examination
The midyear observed history and physical exam is performed on real patients, therefore test questions
are always “novel”. Detailed descriptors have been developed to help examiners evaluate students more
objectively. The observed history and physical exam provides students with considerable feedback on
their clinical skills at the mid-year point, allowing them ample opportunity to improve upon their skills
prior to the final OSCE. Students receive immediate verbal feedback on their performance as well as a
copy of their evaluation form including written strengths and weaknesses. Students appreciate the one
on one time they receive with their core tutor who is responsible for administering and evaluating the
exam. Students scoring 73% or below (global rating 3/5) are invited to meet with the course director to
review their performance.
3. Oral and Written Presentations
Both oral presentations and written reports are important communication skills in medicine and
consequently are prudent to evaluate. Students are assessed by four different tutors, giving students the
benefit of multiple observer assessments and minimizing assessment bias. The advantage of two of
each type of assignment is that students have the opportunity to incorporate the feedback they received
from their first written or oral assignment to enhance their performance prior to the second.
Areas for Improvement and Recommendations:
1. Final OSCE
Concerns were raised regarding the criteria set for failing the OSCE examination. Students must pass 6
of 10 stations and achieve a total score of 60% or greater. ESAC was concerned with the low standards
required for passing, given that ASCM II builds much of the foundation for clerkship. The numeric
scale used to equate a mark from the 5-point Likert scale creates a very narrow range of marks. The
lowest possible score is 55% and maximum 91%. Given that Likert ratings of 1/5 or 2/5 are equated to
55% and 64% respectively, it is difficult for students to fail a station (achieve <60% overall).
Additionally, students can be designated as failing a station via the examiner's impression, but may still
achieve >60% and thereby pass the station. Overall, ESAC questioned whether stricter criteria for
passing this important exam should be developed.
Recommendations: Change the numerical weighting of the Likert scale to ensure students are
appropriately identified if performing below standard. Specifically, widen the lower end of the scale.
Adjust the requirements needed to pass the OSCE to ensure that weaker students are identified and
provided with the opportunity to complete extra work or remediation prior to commencing clerkship.
We recommend that students not be permitted more than two failed stations in order to pass the OSCE.
2. Midyear Observed History and Physical Examination
There is a 7-week variability in the timing of the midyear observed history and physical exam, which
may affect student performance. There are also situations when the exam is performed on standardized
patients rather than real patients and this may affect validity.
ESAC's major concern with the midyear exam was that it is conducted by the student's own core tutor.
This may introduce bias, as the exam is graded by core tutors with whom the students have interacted
extensively before the evaluation. Examination by the students' core tutors may also lead to mark
inflation (mean test score 82.49, 76.7% achieving honours). In addition, there is little correlation
between the midyear exam and the final OSCE (r=0.05 in 2007-08). Although the OSCE and observed
history and physical assess different qualities it is possible that the feedback received from the midyear
test may not be useful in preparation for the OSCE. ESAC was also concerned that students are
receiving most of their formative feedback from their core tutor (one individual) and may be observed
for the first time by others during their OSCE exam.
Recommendations: Switch examiners for the midyear observed history and physical examination such
that the examiner is not the student's core tutor. Although ESAC recognizes that this may affect the
students' perceived quality of the feedback they receive, it will likely improve the examination's
objectivity.
Further standardization of the observed history and physical exam may be required, such that all students
interact with real patients and all exams are conducted within a narrow window of time.
ESAC recognizes that given the complexity of the ASCM II scheduling considerable work may be needed
to implement these changes.
3. Transparency of the Evaluation System to Students
ESAC was concerned with the lack of transparency in the assessment system. It is not clear to ESAC
how borderline students are dealt with, nor is it made clear to the students (e.g., via an explicit
description in the course handbook).
Recommendations: Improve transparency to students in how marks are determined and how students
performing below standard will be dealt with. Clearly describe the management of borderline and
failing students in the written material (i.e. course manual) provided to them.
4. Timely Feedback
There is variable timeliness in the provision of feedback. Students receive regular formative feedback
throughout the course but as grades are calculated centrally via MedSIS, the availability of final marks
for assignments/exams is unfortunately slow. This translates to a sense by some students that they don't
know how they are performing in ASCM until late in the year.
ESAC was also concerned that the OSCE summary feedback form was not being sent to students until
many months after the completion of the course. The delayed receipt of this feedback decreases its
impact and effectiveness. It was also noted that students who took the exam in 2006-07 did not receive
any written feedback. This was confirmed by the course director who explained that extenuating
circumstances led to the oversight.
Recommendations: Timely, effective feedback should be provided to the students. ASCM II should work with
MedSIS and faculty to develop a system for more expedient mark calculation. Written feedback from the
ASCM II OSCE should be made available to students in a timely manner (preferably within one month
of the examination).
Conclusion of Review
Based on the CRICES report presented and the opinion of the ESAC committee members, it is our
conclusion that:
Continued improvement is encouraged . . . . . . Full review next cycle (approx. 3 years)
___________________________
Richard Pittini, ESAC Chair
______3/23/2009_____________
Date
cc: Preclerkship Coordinator, Vice-Dean, UME
Faculty of Medicine University of Toronto
Undergraduate Medical Education
Examination and Student Assessment Committee (ESAC) Course Review: Brain and Behaviour, BRB 111S
Course Co-Directors: P. Stewart and M. Hohol
Reviewers: R. Pittini and H. Bielawska
This CRICES report was presented to the ESAC committee by Drs. Stewart and Hohol on December 6,
2005.
This first year course consists of eight weeks of lectures, PBL sessions, seminars and laboratory sessions
organized into four blocks. Topics covered in these blocks include neuroanatomy, cell biology, motor
systems, sensory systems, higher cognitive functions and behaviour.
Evaluation in this course consists of two examinations: a mid-term exam worth 40% and a final examination
worth 60%. The midterm examination consists of a practical 55-question 'bell-ringer' component and 50
multiple choice questions; both components are equally weighted. The final examination consists of 60
multiple choice questions and 10-13 short answer questions. The majority of the short answer questions
address the problem based learning process rather than content. Different content is covered by each of the
examinations, and while early concepts are built upon in the later portion of the course, the material covered
on the mid-term examination is not re-examined on the final examination.
Students are evaluated by tutors and receive feedback both at the midterm and end of the course. This
evaluation is not weighted in the course mark. Answers to the multiple choice exam, bell ringer and short
answer questions are provided on the day of the examinations as a source of additional feedback to students.
Students tend to score highly on the course assessments, with class averages of 77, 83, and 82% over
the last three academic years. Very few failures occur (2, 0.5, 0%) and a significant proportion of students
achieve honours (36, 69, 62%). While the evaluations are not cumulative, there is good correlation between
the midterm and final evaluation. There is also a strong correlation with other first year courses. Student
grades in BRB correlate well with grades in subsequent years, though less strongly in third year.
Multiple Choice Questions
There are over one hundred multiple choice questions used each year with most being new contributions.
Examination questions are contributed by lecturers with the majority having been lecturers in this course for
several years. Efforts are made to ensure that new questions are linked to the course objectives. Questions
are not secure as they are provided along with the answers to students following the examination as a means
of immediate feedback. Post-hoc analysis is carried out by the course co-directors but questions are seldom
excluded. Students infrequently request changes. The midterm and final multiple choice question scores
correlate moderately although they tend to be higher on the final examination. The internal consistency of
the examinations is good. The multiple choice question scores do not correlate well with the short answer
questions (r = 0.28), but this may be due to the emphasis on process over content in the majority of short
answer questions.
Bell Ringer Examination
This 55 station examination evaluates neuroanatomy and associated functions. The examination involves
students viewing a specimen or image and identifying structures or answering brief questions about
function. The examination is timed with 1.5 minutes per station. Tutors design the questions and review
them as a group to ensure that the content is appropriate. The marking scheme is determined by an
examination committee. The written answers are marked by tutors with all answers on a given station being
marked by one tutor. Six tutors are responsible for evaluating the entire examination. Students perform
well on this component with a class average of 77, 83, and 76%. More students fail this component of the
course than the multiple choice component, with 5.5, 0.5, and 4% failing in the last three years. Marks
on this component correlate moderately with the multiple choice question scores (0.6-0.7).
Short Answer Questions
There are ten to twelve short answer questions on the final examination. Approximately 75% of these
questions aim to evaluate problem solving skills/process with the other 25% addressing content. The
number of questions is limited by the time available for students to write and by the logistics of marking the
questions. Students feel that there should be increased emphasis on short answer questions. The issue of
subjectivity in marking SA questions is addressed by having one marker grade all answers or, if more than
one marker is required, by dividing the questions rather than the students between the markers. The average
mark on the Short Answer component ranged from 70-83% over the last three years, with between 14%-
71% of students obtaining honours. These wide ranges are felt to result from one specific year‟s exam.
This illustrates the potential impact of exam question selection. Correlation coefficients for SA and MCQ
are low to moderate (0.28-0.59), but this is likely due to the inclusion of both process and content type
questions on the SA exam. Based on limited data, there appears to be only moderate correlation between
SA questions and total grades in Years II and III (0.34-0.44).
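Coefficients like those above are plain Pearson correlations between component marks. A sketch with invented marks for eight hypothetical students (the actual student data are not reproduced in this report):

```python
# Invented component marks; illustrates the kind of SA-vs-MCQ Pearson
# correlation quoted above, nothing more.
import numpy as np

sa_marks = np.array([70, 75, 82, 68, 90, 77, 85, 73], dtype=float)
mcq_marks = np.array([78, 72, 88, 70, 84, 80, 79, 74], dtype=float)

# np.corrcoef returns the 2x2 correlation matrix; [0, 1] is r between the two
r = np.corrcoef(sa_marks, mcq_marks)[0, 1]
print(round(r, 2))
```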
Feedback and Remediation
Students are provided with the exam questions and the answers shortly after they complete the exam. This
is possible because the examinations are not secure, and it serves as the principal means of feedback to students.
Informal feedback also occurs in the context of the PBL sessions but is not formally recorded. Students
seldom contest their marks. Borderline and failing students are identified if they fall more than two standard
deviations below the mean. These students are interviewed by one of the course directors and individualized
remediation plans are implemented, utilizing course tutors if required. Student feedback is collected and has
been effective in altering the evaluation system, with a reduction in the number of MCQs. Questions on the
Bell Ringer are now constructed so that a correct answer on one question is no longer required to answer
subsequent questions.
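The two-standard-deviation trigger described above can be sketched as follows; the student names and marks are invented for illustration:

```python
# Hypothetical sketch of the remediation trigger described above: flag any
# student whose mark falls more than two standard deviations below the
# class mean. All marks below are invented.
import statistics

def flag_borderline(marks):
    """marks: dict of student -> final mark (%); returns flagged students."""
    mean = statistics.mean(marks.values())
    sd = statistics.stdev(marks.values())
    cutoff = mean - 2 * sd
    return [student for student, mark in marks.items() if mark < cutoff]

marks = {"s01": 80, "s02": 82, "s03": 79, "s04": 85, "s05": 81,
         "s06": 83, "s07": 78, "s08": 84, "s09": 80, "s10": 86, "s11": 40}
print(flag_borderline(marks))  # → ['s11']
```

Note that an extreme outlier inflates the standard deviation itself, so in small cohorts this rule flags only pronounced failures; programs often pair it with an absolute mark cutoff.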
Strengths:
1. New exam questions are created for each iteration and are linked to learning objectives
2. Students receive the examination answers as a timely source of feedback
3. Students are evaluated by a variety of different methods and these methods are appropriately
matched to the type of material being examined
4. 'Process' questions are included in addition to 'content' questions, reinforcing the pedagogic
principle of adult learning that 'how' is as important as 'what'
5. The emphasis on 'process' in the PBL sessions reduces student concerns over consistency in
'content' between tutors
Areas for Improvement & Recommendations:
1. Increase the number of Short Answer questions, specifically those addressing 'process'
Proceed with plans to introduce SA questions and a PBL case into the midterm examination.
Collect data to allow for correlations between SA 'content' questions and MCQs, and between
SA 'process' questions and third year marks (preferably ward sub-scores)
2. Student performance in the PBL sessions is not assigned a mark despite close observation by tutors;
feedback regarding these sessions is not formalized
Faculty development sessions for tutors to aid in the systematic identification of borderline
students, consider simplified non-weighted evaluation of PBL performance e.g. satisfactory or
borderline. Develop a clear protocol for how this information would be fed forward to the
course directors in a timely fashion
Actions:
1. Review and disseminate the National Board of Medical Examiners (NBME) guidebook on exam
question writing *
2. Meet with MEDSIS (Knowledge4you) to share your requirements regarding the electronic transfer
of raw scores into spreadsheets *
* items to be facilitated by ESAC
Conclusion of Review
Based on the CRICES report presented and the opinion of the ESAC committee members, it is our conclusion that:
Continued improvement be encouraged .......................Full review next cycle (approx. 3 years)
___________________________
Richard Pittini, ESAC Chair
____April 18, 2006____________
Date
Faculty of Medicine University of Toronto
Undergraduate Medical Education
Examination and Student Assessment Committee (ESAC) Dermatology Review: Vince Bertucci and Maha Haroun, Course Directors
Review by: Dr. R. Pittini & Dr. S. Bernstein
Dermatology (DRM 400Y) is a one week course that takes place in year four of the MD curriculum as part
of the ambulatory and community medicine block. Students attend three half-day seminars and four half-
day clinics. Students spend time with different faculty during their clinics. There is an introductory central
session. The examination occurs on the last day of the week.
Components
The students take a pre-test on the first day of the rotation. This is not included in their evaluation and is
meant to serve as a baseline for their own reference. The course directors report that they typically observe
a 20% improvement between the pre-test and final exam.
Seminar evaluations are weighted 15% and consist of faculty assessment of participation in each of the three
seminars (5% each). The committee is unaware of the format of this evaluation and the criteria used.
Student performance during each of four clinics is evaluated by the faculty and is weighted 10% per clinic
for a total of 40%. The clinics consist of three hours of interaction with faculty. Standard forms ('minicards')
based on the CanMEDS roles are used to complete this evaluation. The ambulatory clinic marks tend to be
high, with an average of between 83-92%; 89-100% of students receive honours and there are no failures. The
weighting of this component has recently been increased from 20 to 40%. Previously these marks were
assigned to an oral examination which consisted of a 30 minute case presentation. Students were asked to
present their history and physical and were then asked non-standardized questions that were both case-based
and generic. This evaluation is no longer being used.
The written examination is a case-based MCQ and uses clearly displayed images with one minute per
question. Students complete the 36-item examination according to the pace of the image presentation. They
are unable to return to previous images. The examination is weighted 45% currently (previously 50%).
Each year 50% of questions are new. There has been an intentional trend towards including easier questions
in response to previous years' relatively low marks (average of 69.0-71.3). The internal consistency of the
MCQ is reasonable, with a typical α-coefficient of 0.46. A relatively high proportion of students (7-15%)
fail this component of the course, but they do not receive remediation as they do well on the remainder of
the course and it is felt that the examination is difficult. Only one student has failed the course overall in the
three years presented.
Feedback regarding performance is limited because of the short duration of the course. The pre-test is
designed to demonstrate the level of difficulty on the final examination that the students can expect. The
students do not receive their grades on this examination. The correct answers are not provided although the
material covered is included in the syllabus. Feedback regarding seminars and clinics is provided at the end
of each session but this is informal and may be in written or verbal format.
Areas of Strength
1. Use of multiple formats maps well to the content covered during this one week course
2. Interaction with faculty in the clinic and in the small group seminar setting provides an opportunity for
direct observation of students by evaluators
3. MCQs provide an opportunity for objective evaluation of students' core knowledge
Areas for Improvement and Recommendations
1. Pre-test utility
a. Students do not optimally benefit from partaking in this exam given its current timing and the
lack of feedback they receive.
Consider moving the exam to later in the week and providing students with their marks as
well as the correct answers; also consider providing benchmarks, which may be helpful in
motivating students to read (consider providing the syllabus at the beginning of the year)
2. Ambulatory Clinics & Seminar Evaluations
a. Performance evaluations lack objective criteria
Develop specific observable criteria for faculty to use when evaluating students in seminars
and during clinics. Consider having evaluations integrated into a final seminar mark by
using a template applied to the session evaluations. If a faculty member evaluates more than
one session with a student, the weighting of their evaluation should be greater (e.g. 1 session
has a weighting of 1, 2 sessions a weighting of 2, etc.). Fewer faculty observing the same students
on more occasions is ideal. Conduct faculty development directed at encouraging faculty to
evaluate students according to the demonstrated skills rather than participation.
3. Multiple Choice Examination
a. Level of difficulty appears to be too high
Too many new questions are included; the typical proportion for most courses is 15%, and 50%
new questions per examination may create inconsistencies in examination difficulty. Increase the
viewing time for each question or utilize a computerized method to allow students to control
their own pace
b. Lack of remediation for students performing poorly
Given the current weighting and objectivity of the MCQ examination, students who perform
poorly on this component should receive feedback and advice for improvement regardless of
their overall standing
Conclusion of Review
Based on the CRICES report presented and the opinion of the ESAC committee members, it is our
conclusion that:
Some issues have been identified, ongoing improvements encouraged . . . . . . . . . . . . . . . . . .
Please provide a letter reviewing the impact of the recent & proposed changes in one year's time
___________________________
Richard Pittini, ESAC Chair
___________________________ ____________________________
Vince Bertucci, Course Director Date
Maha Haroun, Course Director
Distribution: Course Director
ESAC File
Clerkship Coordinator
Vice-Dean, UME
ESAC Review
Course: DOCH II
Course Director: Ian Johnson
Report: April 1, 2008
Lead Reviewers: R. Gupta and J. Wang
Background
This is a year-long course in year 2. The goals of the course are to encourage lifelong learning, and to develop and employ research methods, particularly in community health settings. There are 21 hours of lectures, 19 hours of seminars, and the bulk of the course time is left for individual learning and project work.
Evaluation Components
There are 8 components to the evaluation of students.
1) Librarians mark the library search strategy, and this component is weighted at 10%. Librarians receive standard setting education.
2) The individual learning plan is weighted at 20% and is assessed using a structured evaluation form with a separate page for feedback to the student.
3) Twenty percent of the mark is derived from a 50 item MCQ examination held at midterm. This examination assesses didactic material taught on research methodology.
4) Students complete a progress report on their individual learning project that is weighted at 10%. The progress report is evaluated using a structured form with anchors describing the performance required to achieve specific marks and there is space for feedback to the students.
5) Assessment of the student’s oral presentation of their project is worth 20% of their final mark.
6) The final written report on their project is assessed using clearly defined criteria, and is worth 15% of the final mark.
7) Attendance and participation at the community agency is evaluated by agency representatives and is worth 5% of the final mark.
8) Professionalism is evaluated by the agency but not graded.
Creation and Monitoring of Evaluation Tools
The MCQ is a secure examination and 10% of questions are new each year. Items are reviewed if they perform poorly on statistical review (e.g., too easy or too difficult) or if there were questions regarding the item during the examination. The exam is set and proctored by the course director. The evaluation forms for the individual learning plan, progress report, oral presentation, and written report were developed by the course director with input from other educators and students. The forms have been improved considerably since the last ESAC review. Each tool describes the components of the exercise that are evaluated and the weight of the components, and provides space for detailed feedback to the student.
Analysis of Evaluation Methods
Overall, students perform very well in this course, with class averages of 83-85% over the last 3 years. The vast majority of students obtain an honours grade (78-92%) and only 1 student has failed the course in the last 3 years. The standard deviation of the final marks is small at about 4% over the last 3 years. The vast majority of students receive an honours grade on the independent learning project, MCQ examination, ILP progress report, project presentation, and final written report. A lower proportion of students receive honours on the library search strategy (43-62%). The independent learning project, independent learning project progress report, and the final written report are marked by the same individual. A limitation of all of the written assessments is that there is only one marker for each assignment. However, since the vast majority of students receive an honours grade, there is consistency (i.e., reproducibility) in the marking scheme. The MCQ examination consistently yields a reliability of about 0.65, which is modest for an exam with 50 items. The library search strategy mark is generated by a trained librarian, but a standard marking and feedback form is not used. With respect to validity, there is a positive correlation between the component marks and the final mark, although this is expected. The correlation between the exam and presentation marks is 0.1. The correlation coefficients for the various written assignments over the past three years range from 0.37 to 0.61. These three marks pertain to the same domain and are marked by the same person; therefore a relatively strong correlation is expected. The course director hypothesizes that, given the second assignment is a progress report, the grades will not correlate highly as improvement is expected.
He estimates that approximately 10% of students score quite low on the initial assignment but consistently improve substantially by the second assignment. The course director has also convincingly demonstrated face and content validity. The fact that there are no differences between academies adds somewhat to the construct validity of the assessment instruments.
Areas of Strength
1. Multiple testing methods to sample various competencies. Matching of assessment method to the task being assessed.
2. Assessment of the application of knowledge (i.e., completing a research project vs. testing factual knowledge)
3. Provision of an excellent learning experience in professionalism.
4. The course director should be commended for the changes made to the assessment
methods since the last ESAC review.
Areas for Improvement
1. Variability in agency support, librarian support, and supervisor support. The committee
recognizes that there is no way to eliminate this variability.
Recommendations
1. Consider a standardized form for assessing the library search strategy and provision of feedback
2. Consider incorporating the complexity of the project and level of agency support within the marking scheme. One option may be to add a “box” on evaluation forms for the library
search strategy, final write-up and presentation, to remind markers to incorporate these variables into their grades.
3. Encourage supervisors to provide feedback in a more timely fashion, for all assignments. Consider deadlines for the supervisors with a mechanism for identifying overdue feedback. Contact details of assigned faculty advisors should be explicitly noted on the course website at the start of the course.
4. To address the high proportion of students who obtain an honours mark, the MCQ examination items should be reviewed. Consider removing from the pool any items that are answered correctly by more than 90% of the students.
5. Increase the proportion of new items on the MCQ examination from the current 10% to 10-25%.
Conclusion of Review
Based on the CRICES report presented and the opinion of ESAC members, it is our conclusion that:
Continued improvement encouraged ............. Full review next cycle
Respectfully submitted,
R. Gupta
Examination and Student Assessment Committee
Emergency Medicine (EMR400Y)
The course director, Dr. Rick Penciner, presented a comprehensive review of the Emergency Medicine
rotation on April 7, 2009.
Lead Reviewer: Dr. Richard Pittini
Student Reviewer: Ms. Alyse Goldberg
Components of the Student Assessment System:
There are three components that together account for the student's final mark: a written
examination, a global clinical evaluation, and a seminar participation mark.
1. Final written examination - 50%
The final written exam consists of 20 MCQs, 8 SA, and 5 key feature questions. Material evaluated
is drawn from the manual provided to students. Average marks are 79%, 82%, and 78% for the last
three years.
2. Global Clinical Evaluation – 44%
Marks are generated from shift encounter cards that are completed by 3-5 supervising faculty. Marks
are consistent over the last three years with an average of 82%. The standard deviations are narrow
at 3.9 – 4.3.
3. Seminar Participation – 6%
Students receive a mark of 2% for each seminar attended. There is no evaluation tool utilized.
The overall class average has been stable between 80-82% over the last three years with approximately
2/3 of students receiving honours. No students have failed and very few fall into the borderline (60-
70%) range. The various evaluation components consistently demonstrate low correlation (0.15), but
this may reflect the different domains being evaluated.
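The weighting arithmetic behind the final mark is straightforward; as a sketch, using the weights stated above (written exam 50%, clinical evaluation 44%, seminar participation 6%) and invented component marks:

```python
# Illustrative arithmetic only: combining the three components with the
# weights stated above. The component marks are invented.
WEIGHTS = {"written_exam": 0.50,
           "clinical_evaluation": 0.44,
           "seminar_participation": 0.06}

def final_mark(components):
    # sanity check: the weighting scheme must account for 100% of the mark
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9, "weights must sum to 100%"
    return sum(WEIGHTS[name] * mark for name, mark in components.items())

print(round(final_mark({"written_exam": 79.0,
                        "clinical_evaluation": 82.0,
                        "seminar_participation": 100.0}), 2))  # → 81.58
```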
Areas of Strength:
The written examination has excellent face validity with clear mapping between course objectives,
curricular content and evaluations. The thorough review of questions by multiple reviewers promotes
consistently high quality questions. The use of 20% new questions per iteration with frequent updates to the
course manual keeps the questions pertinent.
Students are directly observed by more than one faculty member during the course of their rotation. Clinical
evaluations are structured and faculty receive instructions on how to complete these evaluations.
Evaluations are consistent across sites with no statistically significant differences being noted for the most
recent dataset.
Areas for Improvement and Recommendations:
1. Seminar participation is not evaluated in a structured fashion and marks assigned only reflect
attendance at mandatory sessions. The material covered in the sessions is better evaluated with the
current written examination.
Recommendations: reallocate the marks assigned to seminar participation to either the written
examination or to alternate evaluations. Attendance at x number of seminars could be a pre-requisite
for credit, or lack of attendance can be addressed with the professionalism evaluation.
2. Encounter cards are inconsistently used or under-utilized by faculty as rich sources of formative
feedback to students.
Recommendations: modify the encounter cards to make the sign-off by faculty more specific as to
whether the encounter was reviewed with the student in person. Faculty should be instructed as to how
many such 'reviewed' encounter cards they are expected to complete (set a minimum to ensure
uniformity across sites). Faculty cannot be expected to provide comments on all criteria listed on the
encounter card on every occasion; an "N/A" column should be added to the form. As well, a
standardized method should be developed for balancing the weight of evaluations from staff who had
only a single encounter with a student against those from staff who can evaluate the student's
progression over many shifts.
3. A significant goal of the rotation is to teach technical skills, both during shifts as well as during time
dedicated to learning technical skills. Direct observation occurs by both physicians and nurses and
students are given the opportunity to perform basic technical skills on patients as well as during the
dedicated 'technical skills' half day.
Recommendations: Several options are available for evaluating technical skills during this rotation.
Modified technical encounter cards could be completed by physicians or nurses observing students
performing basic skills. These cards could include detailed checklists for specified procedures and a
global rating for overall performance. A method to standardize exposure would be to use models to
evaluate technical skills; this could be incorporated into the 'technical skills' session. Marks currently
assigned to Seminar Participation could be re-allocated to technical skills.
Conclusion of Review
Based on the CRICES report presented and the opinion of the ESAC committee members, it is our
conclusion that:
Continued improvement is encouraged . . . . . . Full review next cycle (approx. 3 years)
Richard Pittini, ESAC Chair
cc: Vice-Dean, UME
______11/23/2009_____________
Date
Examination and Student Assessment Committee
Family Medicine Clerkship – Review
Clerkship Director: R. Freeman
Report: June 10, 2003
Lead Reviewer: R. Pittini
Student Reviewer: M. Warsi
Background
The Family Medicine CRICES report was presented to the ESAC committee by Dr. Freeman on June 10,
2003. This clerkship rotation is a four week rotation in the third year of the undergraduate medical
curriculum. Students receive an orientation at the beginning of their rotation and are provided with an
elaborate list of learning objectives. The same course committee members are responsible for both the
development of objectives and examinations. Students are assigned to faculty and benefit from a low ratio
of faculty to students. Students participate in patient care under the supervision of faculty.
Students are evaluated using an OSCE examination, clinical/ward evaluation, and a written assignment.
Students are required to complete a self-assessment and to meet with their supervising faculty to receive
feedback. A mark is awarded for completing this course requirement. A log of patient encounters is
required but is not assigned a grade. Professionalism is evaluated throughout the course using the
undergraduate medical education professionalism form. Professionalism is a requirement of the course but
does not contribute to the final course grade. Students are required to pass each individual component of the
course in order to pass the course.
The weighting of the various components is distributed as:
OSCE – 42.5%
Clinical evaluation – 45%
Written assignment – 12.5%
Feedback – 5%
Students perform well overall with recent class averages of 78%, 77%, 77%, and 79%. The proportion of
students receiving honours is approximately 30% for the most recent three years.
Student input suggests that the course is perceived as fair but that the OSCE component is considered
difficult. The average for the OSCE is standardized at 74% in order to control for differences in difficulty
among various exam questions. The OSCE average is therefore slightly lower than the overall class average
and is in keeping with student perceptions.
The OSCE consists of a five station examination which includes post-encounter probes at each station.
Stations are selected from a large pool of secure scenarios. Stations are developed by case writers with the
assistance of a guide book. The stations are field tested and standardized patient portrayals are videotaped
to ensure consistency. All previously used stations are reviewed and modified as required prior to re-use.
Approximately 40% of stations are new for each year.
The average score for the OSCE component is standardized to 74% in order to adjust for the variability in
exam question difficulty. This typically results in a 4-7% upward adjustment. Attempts to use other
methods for adjusting marks were less effective (e.g. mean borderline standard setting). The current
proportion of students failing is 1.6-2.2%, with 14-19% receiving honours. This method of adjustment
appears to be able to discriminate between students in academic difficulty and those above average.
Students who score less than 65% or receive a global rating of borderline or not competent on any two of
the five stations are deemed to be in academic difficulty. Four of the five stations must be passed to pass the
OSCE.
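The mean-standardization described above amounts to shifting every score by the same amount. A sketch under the assumption of a simple additive adjustment (the report does not specify the exact method), with invented raw scores:

```python
# Sketch only: shift raw OSCE scores so the class mean lands at the 74%
# target, compensating for year-to-year differences in station difficulty.
# Assumes a simple additive adjustment; raw scores are invented.
TARGET_MEAN = 74.0

def standardize(raw_scores):
    adjustment = TARGET_MEAN - sum(raw_scores) / len(raw_scores)
    return [score + adjustment for score in raw_scores]

raw = [72.0, 65.0, 70.0, 61.0, 77.0]   # raw class mean = 69.0
print(standardize(raw))                # → [77.0, 70.0, 75.0, 66.0, 82.0]
```

With this raw mean the shift is +5.0, in line with the 4-7% upward adjustments the report describes; the shift preserves the rank order and spread of the scores.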
The psychometric properties of the OSCE are good with acceptable internal consistencies of between .66
and .72. Concurrent validity has been assessed as a research study which indicated that there was
reasonable correlation between different courses particularly when using global ratings. The current OSCE
marking scheme is weighted toward the global score. A second study has shown that the residents in family
medicine perform better than clinical clerks on the OSCE suggesting good construct validity.
The clinical evaluation of students is based on the completion of ward assessment forms. Ward assessments
are completed with contributions from faculty, residents and nurses. A template is then employed to convert
completed forms into numeric grades. The adoption of a template following academic year 1998-9 resulted
in a decrease in the proportion of students receiving honours from 70% to approximately 50% which has
been sustained. There has only been one student who failed this component in the four years of data
presented. This is disproportionate to the objective scores on the OSCE and may represent the reluctance of
faculty to use the left end of the scale. Data regarding the inter-rater reliability of these evaluations are not
presented but would be of interest, especially with regards to the effect of adopting a template for
determining grades. The ward assessment appears to have good content validity based on the mapping of
objectives to evaluations.
The academic project consists of a written component and an oral presentation. The written component
consists of an abstract and is worth approximately one third of the grade. The abstract is evaluated using a
guide with explicit descriptors. The presentation is also evaluated using a marking guide with elaborate
descriptors for each criterion. The class averages for this component tend to be higher (>80%) than other
components, with a larger proportion of students obtaining honours (59-73%) and no students failing. A
significant decrease in marks was observed following a faculty development session in 1998-9, with a
subsequent upward drift. This effect may represent an under-utilization of descriptors over time and may
require ongoing faculty development.
Student feedback is provided in informal verbal and formal written formats. Formal feedback using the
Clinical Encounter Feedback Exercise forms requires students to self-assess. Both of these types of
feedback are formative and are based on close observation, taking advantage of the low student to faculty
ratio. Close supervision and feedback were cited as strengths of the rotation by students. Students are
provided with a formal mid-rotation feedback session with their hospital program director. Summative
feedback is provided to students including narrative evaluations for both the OSCE and the Academic
Project. Students are provided with a ranking for individual OSCE stations. This information is presented
only for their reference. They also receive written comments from the examiners on each station.
Dr. Risa Freeman and her course committee are to be commended on an exemplary course. The specific
areas of strength and areas for improvement with recommendations are described as follows:
Areas of Strength
1. OSCE quality
Your utilization of the OSCE examination for the assessment of clinical skills is to be considered
exemplary. The use of an examination committee, the systematic process of station development and
revision, and the size of the examination pool are ideal. The psychometric properties of the examination are good
and the relative weighting in the course appears to be appropriate. The use of the adjustment factor to
account for variation in question difficulty is functioning well and should be retained. Student anxiety may
be reduced if they are made aware of this adjustment factor.
2. Feedback
This course provides a variety of formative and summative feedback. The close supervision of students by
faculty and the low student-to-faculty ratio facilitate accurate, timely and apparently well-received
feedback. The use of forms to guide feedback ensures quality feedback is provided. The provision of
written summative feedback is beneficial to students and likely confers an educational benefit on the OSCE
examination. The use of norm-referenced outcomes (e.g. OSCE rankings by 3rds) is not permitted in
the grading policy, but as a means of feedback it provides students with supplemental information regarding
their individual performance and should be continued so long as it is welcomed by students.
3. Mapping of objectives to evaluations
The use of common committee members for the development of objectives and examinations helps to
ensure linkage between course specific objectives and course examinations. The objectives for this rotation
are clearly linked to criteria for evaluation within each of the examination components. This course should
be well prepared to provide a mapping of these linkages to the program effectiveness committee.
Areas for Improvement
1. OSCE Sub-component analysis
The weighting of individual components within the OSCE needs to be analyzed further to determine whether it
is optimal. While emphasis on the global score may optimize concurrent validity, increasing the checklist score
weighting may reduce the need for a consistent upward adjustment.
2. Evaluation of Feedback
The assignment of 5% contingent upon completing the required feedback session is not a valid form of
evaluation. In order to assign a weighting of 5% to the feedback aspect of this course, it would be necessary
to evaluate the quality of feedback. As the feedback is directed toward the student and comes from the
faculty, the evaluation of its quality is somewhat problematic. There is an opportunity to evaluate the
students' ability to self-assess. The 5% should not be used solely as a means of motivating students and
faculty to comply with a specific course requirement.
3. Grade Inflation for Academic Project
Despite utilization of a marking guide with specific descriptors for each criterion, there appears to be a trend
toward mark inflation. It is important to ensure that the quality of the presentations and abstracts is being
evaluated in addition to the amount of effort put into them. It may be necessary to continue to provide
faculty with instruction on how they should make use of the descriptors. While you report that the primary
intent of including the Academic Project in the evaluation scheme is not to discriminate between students,
this remains an important aspect. Given that the OSCE and Clinical Evaluation are assessing similar
domains of clinical competence, it is very important that the evaluation of the academic project be retained
and in fact emphasized, as it evaluates different and important aspects of student competence.
Recommendations
1. Analyze individual sub-components of the OSCE (e.g. Checklist, Global, PEP) to determine how closely
they agree with each other and what the effect would be of increasing the weighting of the checklist
(well suited for evaluation of novices).
2. Evaluate individual ward assessment forms from individual faculty, residents, and nurses prior to
consensus in order to assess the inter-rater reliability of the form in this course.
3. Compare inter-rater reliability before vs. after adopting the template, if data are available, in order to
determine the effect of using a template to derive grades.
4. Eliminate the 5% mark given for completing the required feedback forms.
5. Incorporate a faculty development session on completing the Academic Project evaluation each year of
the course.
6. Consider increasing the weighting of the Academic Project evaluation to emphasize the competencies of
the students outside of clinical skills.
7. Continue to evaluate the psychometric properties of the Academic Project evaluation and strive to
improve evaluation so as to reduce 'noise' in the final grade while maintaining a balanced course
evaluation.
8. Continue to collect student feedback and act upon it in the exemplary fashion you have to date.
Specifically ensure that students wish to know their relative ranking on the OSCE stations.
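Recommendations 2 and 3 above concern inter-rater reliability. One common approach for two raters on a categorical scale is Cohen's kappa; the sketch below uses invented ratings and is illustrative only (it is not part of the course's actual analysis):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical ratings.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and
    p_e is agreement expected by chance from each rater's marginal frequencies.
    """
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] / n * freq_b[c] / n for c in set(rater_a) | set(rater_b))
    return (p_o - p_e) / (1 - p_e)

# Hypothetical five-point ward ratings (1 = Unsatisfactory ... 5 = Outstanding)
# for ten students, scored independently by a faculty member and a resident.
faculty  = [4, 5, 3, 4, 4, 5, 3, 4, 5, 4]
resident = [4, 5, 4, 4, 3, 5, 3, 4, 5, 5]
print(round(cohens_kappa(faculty, resident), 2))
```

Values near 1 indicate agreement beyond chance; values near 0 indicate agreement no better than chance.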
Specific Requests
With regards to your request for support of statistical analysis:
1. The recruitment of a full-time Evaluation Coordinator may not only facilitate further analysis within
courses but also allow the coordination of cross-course analysis.
2. Consider following University of Toronto graduates who enter the Family Medicine residency at the
University of Toronto as an opportunity to analyze predictive validity.
With regards to your request for ongoing support from ESAC for the maintenance of OSCE examinations:
Your course models the ideal of objective clinical assessment and in our opinion warrants ongoing
support, not restricted to this rotation but extended to all rotations.
Conclusion of Review
Based on the CRISES report presented, the accompanying appendices and the opinion of the ESAC
committee members, it is our conclusion that:
Continued improvement encouraged .................... Full review next cycle (approx. 3 years)
Richard Pittini, ESAC Chair
Risa Freeman, Clerkship Director
Examination and Student Assessment Committee Medicine Phase I (MED300Y) Clerkship Review
At the ESAC Meeting on November 6 2007, Dr. Danny Panisko, the Department of Medicine
Undergraduate Education Director, and Dr. Rajesh Gupta presented a comprehensive CRISES report on the
Medicine Phase I Clerkship system of student assessment.
A. Components of the Student Assessment System:
The three components of the assessment system are outlined below:
1. Multiple Choice Written Examination (30% weighting)
The MCQ exam consists of 75 questions and is 2.5 hours in length. The exam blueprint is based on
the course objectives, with 6-9 questions representing each of nine content/specialty domains. A
small number of questions is included from seven other areas (e.g., ECG, chest x-rays). On each
exam, 10-20% of questions are new.
2. Oral/Clinical Skills Examination (20%)
The format and marking of this exam are standardized. The clinical case will vary from student to
student. Aspects of the clinical skills exam process include:
a. The patient/clinical case is selected by the site-coordinator/delegate.
b. The student conducts a history and physical exam without observation by an examiner (90
minutes).
c. The student is assessed on presentation, diagnosis and investigative plan in a structured oral
exam format by one examiner. The evaluation form includes two components: a checklist
(44 items) and global ratings (9 items).
d. The student is also asked to perform three maneuvers randomly chosen a priori by the site
coordinator from a bank of ten maneuvers. Each maneuver has a separate evaluation form
including a checklist and global rating scale.
e. The evaluation session with the examiner can take 30 to 45 minutes.
3. Ward Evaluation (50%)
The ward evaluation form is a standard design for all faculty clerkship rotations with a common five-
point rating scale (Unsatisfactory to Outstanding), specified weights assigned to each level of
performance, and 18 performance criteria listed according to the CanMEDS Roles.
For the Phase I Medicine evaluation form, the rating on each of the 18 performance criteria is
weighted in the calculation of the ward evaluation mark. The specific weight of each criterion is
assigned by the course.
The completion of the ward evaluation was described as being a consensus process involving the site
coordinator with the residents and staff who had supervised the student.
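As a rough sketch of how such a form converts to a numeric mark: the level-to-percent mapping and per-criterion weights below are hypothetical placeholders (the actual weights are assigned by the course and are not reproduced in this report), and only three of the eighteen criteria are shown for brevity:

```python
# Hypothetical mapping from the five-point rating scale to a percentage score.
LEVEL_SCORE = {1: 50, 2: 65, 3: 75, 4: 85, 5: 95}  # Unsatisfactory .. Outstanding

def ward_mark(ratings, weights):
    """Weighted mean of level scores over the performance criteria."""
    assert len(ratings) == len(weights)
    total_weight = sum(weights)
    return sum(w * LEVEL_SCORE[r] for r, w in zip(ratings, weights)) / total_weight

# Consensus ratings on three criteria, with course-assigned criterion weights.
ratings = [4, 5, 3]
weights = [2.0, 1.0, 1.0]
print(ward_mark(ratings, weights))  # → 85.0
```

The mark is thus sensitive both to the rating distribution and to the course's choice of criterion weights.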
B. Strengths of the Assessment System:
For Medicine Phase I, the strengths of the student assessment system as described in the CRISES Report
include:
1. The variety of evaluation methods that allows sampling of many CanMEDS skills;
2. The commitment to feedback in which each evaluation method includes a feedback system; and
3. The approach to borderline students, provision of extra work/remediation and follow-up.
C. Observations from the Assessment Data:
Course level grade statistics reported for academic years 2004-05 to 2006-07 were highly consistent in
general. At the level of teaching site and rotation block, detailed information presented for 2006-07
indicated some variations by site or rotation block.
1. Final Course Grades:
1.1 At the course level, mean Final Grades were consistent over time with a range of 0.6% (80.4 - 81.0).
The decrease in standard deviation (4.2 to 3.5) indicated grades were clustering more around the
mean. For 2006-07, the percentage of honours grades decreased by eleven percent (65 to 54).
1.2 Final Grades by Teaching Site and Rotation:
The range was 3.2% in mean final grades for the five teaching sites (78.4 - 81.6) and six rotation
blocks (78.7 - 81.9) for 2006-07.
2. Multiple Choice Question Exam:
2.1 At the course level, mean MCQ Exam marks were consistent over time with a range of 1.9% (76.8 -
78.7). For 2006-07, the percentage of honours grades decreased by eleven percent (49 to 38).
2.2 Reliability:
The mean reliability alphas for six rotation exams per year were: .73, .64 and .60. As stated in the
report, the MCQ exam tests a variety of disciplines and subspecialties and more uniform consistency
of responses would not be expected across the varied content domains.
2.3 Construct Validity:
For 2006-07, there was a range of 8.3% in the mean MCQ mark by rotation (72.2 - 80.5 for
Rotations 1 and 2, respectively); as a raw score, this represented a difference between 54 and 60 out
of 75. The report suggested that the lowest mean mark might have been "a first rotation
phenomenon"; however, the mean mark for Rotation 6 was the second lowest, at 74.0.
3. Oral/Clinical Skills Exam:
3.1 At the course level, mean Oral/Clinical Skills Exam marks were consistent over time with a range of
0.8% (79.3 to 80.1). The percentage of honours was stable (54 - 55).
3.2 Reliability:
For the six rotation exams in 2006-07, internal consistency alphas for the Oral Exam were high for
both components of the structured oral: Checklist items (.75 - .91) and Global ratings (.87 - .93).
3.3 Oral/Clinical Skills Exam Marks by Teaching Site:
The range was 5.3% (76.9 - 82.2) in mean oral/clinical skills exam mark by teaching site in 2006-07.
4. Ward Evaluation:
4.1 At the course level, mean Ward Marks were consistent over time with a range of 0.5% (82.6 - 83.1).
The percentage of honours was stable (79 - 81). The decrease in standard deviation (3.7 to 3.2)
indicated grades were clustering more around the mean.
4.2 Ward Marks by Teaching Site:
The range was 3.9% (80.9 - 84.8) in mean ward marks by teaching site in 2006-07.
4.3 Individual Ward Performance Criterion Ratings:
From the 2006-07 UMEO summary bar graphs of mean ratings on the 18 criteria for the course
overall and by academy, observations include:
a. At the course level, four criteria have a relatively high mean rating (~ 4.5 out of 5.0) which
indicates that a large percentage of students received an "Outstanding" rating.
b. At the academy level, an additional five criteria have a relatively high mean rating (~ 4.5 out
of 5.0) at one academy in comparison to the two other academies.
D. Discussion:
The Phase I Medicine CRISES Report is a comprehensive overview of the system of student assessment.
The Appendix to the report included further statistics and breakdowns to inform the ESAC review.
Following the presentation, ESAC requested several points of clarification on the Oral Exam and Ward
Evaluation which were provided at a subsequent ESAC meeting.
Course level grade statistics for academic years 2004-05 to 2006-07 indicated results were consistent over
time with respect to mean final grades and grade components. Detailed results presented for 2006-07
indicated some variability by teaching site and rotation block.
Although the variation between sites was addressed, the test used in the analysis (Kruskal-Wallis) is based
on rank ordering and may not address the key issues. Based on observation of the data, the differences
between sites may be educationally significant regardless of whether they reach statistical significance, and
it is therefore worthwhile to continue collecting meaningful data that will allow for between-site
comparisons. In the future, student marks on the ward evaluation should be compared using an ANOVA.
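A minimal sketch of the recommended comparison, in pure Python with invented site marks (in practice the resulting F statistic would be compared against the F distribution with k−1 and N−k degrees of freedom, e.g. via a statistics package):

```python
def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA across independent groups.

    F = (between-group mean square) / (within-group mean square).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Hypothetical ward-evaluation marks at three teaching sites (illustrative only,
# not the actual course data).
site_a = [81, 83, 85, 82, 84]
site_b = [79, 80, 82, 81, 78]
site_c = [84, 86, 85, 83, 87]
print(round(one_way_anova_f([site_a, site_b, site_c]), 2))
```

Unlike the rank-based Kruskal-Wallis test, ANOVA compares the means directly, which matches the question of whether mean marks differ by site.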
Concluding comments on each assessment component are presented below.
1. Multiple Choice Question Exam:
1.1 Construct Validity:
With respect to the variation in mean scores across rotation exams in 2006-07, the introduction of
the new MCQ questions developed in 2006 may have been a factor for two reasons: (a) the difficulty
level of the new items may have varied across exams, and (b) exams with a larger proportion of new
items (ranging from 10 to 20% on each exam, or 7 to 15 items) may have generated inconsistent
mean scores. To further investigate the variation in mean scores by rotation, an analysis by
content/specialty domain should indicate whether differences across rotation exams were specific to
a content area or generalized across content areas.
1.2 Exam Blueprint:
For the small number of questions included from seven areas outside the nine content/specialty
domains, there should be some consideration as to whether the MCQ format is adequately assessing
the knowledge base in these areas (e.g., ECG, chest x-rays). An option might be to assess these
areas through a section on diagnostic test interpretation on the Oral/Clinical Skills Exam.
2. Oral/Clinical Skills Examination:
2.1 Sub-components of the Oral/Clinical Skills Exam:
The following additional information was provided on request:
a. For the calculation of the mark, three sub-component scores are weighted as follows:
Structured Oral Exam Checklist score (.50) and Global ratings (.40), and score on three exam
maneuvers (.10).
b. With respect to the correlations between sub-components, correlations were considered to be
good and had been the topic of an earlier research study.
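The weighting in (a) amounts to a simple weighted sum. A minimal sketch, with invented sub-component scores (the weights are those reported above):

```python
# Sub-component weights reported for the oral/clinical skills exam mark.
WEIGHTS = {"checklist": 0.50, "global": 0.40, "maneuvers": 0.10}

def oral_exam_mark(scores):
    """Weighted combination of the three sub-component scores (each in %)."""
    assert set(scores) == set(WEIGHTS)
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

# Hypothetical student scores on each sub-component.
mark = oral_exam_mark({"checklist": 80.0, "global": 90.0, "maneuvers": 70.0})
print(mark)  # → 83.0
```

With this weighting, a student's checklist performance dominates the mark, followed by the global ratings.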
2.2 Oral/Clinical Exam Marks by Teaching Site:
For 2006-07, the mean exam marks by site ranged by 5.3%. To investigate site differences, an
analysis by each sub-component (checklist, global ratings, exam manoeuvers) could identify whether site
differences are component-specific. Another factor for investigation might relate to characteristics of the
patients/clinical cases for the exam as a potential variation by teaching site.
3. Ward Evaluation:
3.1 The completion of the ward evaluation was described as a consensus process. Further information
indicated that while the ward evaluation process is standardized across sites in many aspects (e.g.,
formal feedback, input from staff and resident teachers), the consensus process may be variable by
site due to adaptation to site-specific conditions.
3.2 Ward Marks by Site:
For 2006-07, the mean ward marks by site ranged by 3.9%. The site-specific consensus process
could be a factor in site differences in overall ward marks and rating distributions of the individual
performance criteria contributing to the overall ward mark calculation.
To investigate differences in ward marks by site, a review process might be established to examine
the marks and performance ratings by site and determine whether further action would be required,
e.g., to increase consistency across sites, review the process at a system level and/or continue to
monitor the ward assessment ratings and statistics.
E. Suggestions and Recommendations:
1. Multiple Choice Question Examination:
1.1 Continue to monitor mark statistics across rotation exams, in particular when new test items are
being introduced.
1.2 Maintain the same proportion of new test items on each rotation exam.
1.3 In reviewing item statistics, consider screening items for different levels of difficulty to assess
whether each exam represents a similar balance of easy, moderate, difficult items.
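One way to implement this screening is to compute a difficulty index (proportion correct) per item and bucket the items; the cut-points and response data below are illustrative assumptions only:

```python
def difficulty_index(responses):
    """Proportion of students answering the item correctly (0.0-1.0)."""
    return sum(responses) / len(responses)

def classify_item(p):
    """Bucket an item by difficulty index; the cut-points are illustrative only."""
    if p >= 0.80:
        return "easy"
    if p >= 0.50:
        return "moderate"
    return "difficult"

# Hypothetical item-response data: 1 = correct, 0 = incorrect, per student.
items = {
    "Q1": [1, 1, 1, 1, 0, 1, 1, 1],   # p = 0.875
    "Q2": [1, 0, 1, 1, 0, 1, 0, 1],   # p = 0.625
    "Q3": [0, 0, 1, 0, 1, 0, 0, 0],   # p = 0.25
}
balance = {q: classify_item(difficulty_index(r)) for q, r in items.items()}
print(balance)
```

Comparing the resulting easy/moderate/difficult counts across rotation exams would indicate whether each exam carries a similar difficulty balance.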
2. Oral/Clinical Skills Examination:
2.1 Consider including a section on diagnostic test interpretation to incorporate areas that might be better
evaluated in this setting than the Multiple Choice Question format.
2.2 Review mark statistics at the level of sub-components (checklist, global and manoeuvers) to assess
whether site differences are component-specific.
3. Ward Evaluation:
3.1 Review the Medicine Phase I Ward Evaluation Form with department clerkship representatives and
site coordinators for all teaching sites.
3.2 Review each performance criterion and respective rating statistics to identify the level of information
being provided and the differences in rating patterns by teaching site.
3.3 Develop consensus on each performance criterion with respect to the purpose and function of each
criterion for the ward evaluation form.
3.4 Develop a standardized approach for completing the ward rating scale, communicate the approach to
teaching sites, and continue to monitor the implementation.
Recommendations for the ward evaluation to address grade inflation:
3.5 Promote reasonable standards for the ward rating scale.
3.6 Discourage or prevent the use of the highest rating of "Outstanding" as the default category.
3.7 Review the individual criterion weights for the calculation of the ward grade and the impact of each
criterion rating on the grade calculation to determine whether the weighting is optimal or requires
adjustment.
Conclusion of Review
Based on the CRISES report presented and the opinion of the ESAC committee members, it is our
conclusion that:
Continued improvement is encouraged . . . . . . Full review next cycle (approx. 3 years)
___________________________
Richard Pittini, ESAC Chair
cc: Vice-Dean, UME
______10/02/12_____________
Date
Examination and Student Assessment Committee Medicine Phase II (MED400Y) Clerkship Review
At the ESAC Meeting on December 4 2007, Dr. Danny Panisko, the Department of Medicine
Undergraduate Education Director, and Dr. Rajesh Gupta presented a comprehensive CRISES report on the
Medicine Phase II Clerkship student assessment system.
A. Components of the Student Assessment System:
The components of the assessment system are outlined below:
1. Short Answer Written Examination (30% weighting)
The Short Answer Exam consists of 16-17 questions and is 2.0 hours in length. For each exam, 2
questions represent each major subspecialty in medicine, including ethics and clinical pharmacology.
The exam blueprint is based on the course objectives. The exam content is taken directly from the
"orange booklet" which lists common and life-threatening problems and diseases that a Phase II
student is expected to know.
2. Objective Structured Clinical Examination (20%)
The OSCE consists of 8 stations and is 90 minutes in length. There is one examiner per station. Each
exam incorporates nine specialty areas and reflects the major content areas expected of a Phase II
Medicine student. Six stations are patient-based and two are written stations selected from three
areas: chest x-ray, ECG or clinical case.
3. Ward Evaluation (CTU) (25%)
The ward evaluation is based on the student's 2 to 3 week rotation on a Clinical Teaching Unit. This
evaluation is completed through a consensus process by the site coordinator with the residents and
staff who had supervised the student. The ward evaluation form is a standard design for all faculty
clerkship rotations with a common five-point rating scale (Unsatisfactory to Outstanding), specified
weights assigned to each level of performance, and 18 performance criteria listed according to the
CanMEDS Roles.
For the Phase II Medicine ward evaluation form, the rating on each of the 18 performance criteria is
weighted in the calculation of the ward evaluation mark. The specific weight of each criterion is
assigned by the course.
The completion of the ward evaluation was described as being a consensus process involving the site
coordinator with the residents and staff who had supervised the student.
4. Ambulatory Clinic Evaluations (15%)
This performance evaluation is based on the student's 2 to 3 week rotation on ambulatory clinics.
The evaluation form includes skills relevant to the ambulatory care of patients.
5. Written Assignments (10%)
There are two assignments each weighted 5%: a reflection exercise and a topic review.
B. Strengths of the Assessment System:
For Medicine Phase II, the strengths of the student assessment system as described in the CRISES Report
include:
1. The variety of evaluation methods that allows sampling of many CanMEDS skills;
2. The commitment to feedback in which each evaluation method includes a feedback system; and
3. The approach to borderline students, provision of extra work/remediation and follow-up.
C. Observations from the Assessment Data:
Course level grade statistics reported for academic years 2004-05 to 2006-07 were consistent in general. At
the level of teaching site and rotation block, detailed information presented for 2006-07 indicated some
variations by site or rotation block.
1. Final Course Grades:
1.1 At the course level, the mean Final Grades were consistent over time with a range of 0.7% (80.6
- 81.3). The percentage of honours grades increased by fifteen percent in 2005-06 and remained at this
level for 2006-07 (i.e., 52, 67 and 68).
1.2 Final Grades by Teaching Site and Rotation:
For 2006-07, the range was 2.7% in mean final grades for the five teaching sites (79.7 - 82.4). For
the five rotation block means, the range was 3.1% (80.1 to 83.2).
2. Short Answer Written Examination:
2.1 At the course level, the mean Short Answer Exam marks were consistent over time with a range of
1.5% (77.4 - 78.9).
2.2 Short Answer Exam Marks by Teaching Site and Rotation:
For 2006-07, there was a range of 6.5% in the mean exam mark by rotation (75.6 - 82.1 for
Rotations 2 and 1, respectively).
3. Objective Structured Clinical Examination:
3.1 At the course level, OSCE mark statistics were consistent over time with respect to the mean mark
(76.4 - 77.3), percentage honours (26 - 31) and standard deviation (5.2 - 6.0).
3.2 OSCE Marks by Teaching Site and Rotation:
For 2006-07, the range was 2.6% in the mean OSCE mark by teaching site (75.0 - 77.6).
By rotation, the range was 3.6% (74.6 to 78.2).
3.3 Reliability:
Evidence of reliability of the OSCE Stations was indicated by the following analyses:
a. Correlations between Station Checklist and Global Scores were .37 to .70 (i.e., for 7 stations
in 2004-05, 8 stations in 2005-06)
b. For Global Ratings on all exam stations by academic year, the mean Cronbach's alpha was
.75 (2005-06) and .76 (2004-05, 2006-07).
c. For Global Ratings on each station in 2006-07, Cronbach's alpha reliabilities ranged from .55
to .83, a measure of the internal consistency of the ratings comprising the Global Score for
each station.
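For reference, Cronbach's alpha can be computed directly from the item and total-score variances. A minimal sketch with an invented ratings matrix (not the course data):

```python
def cronbach_alpha(ratings):
    """Cronbach's alpha for a students x items matrix of ratings.

    alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores)),
    using population variances throughout.
    """
    def pvar(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    k = len(ratings[0])                     # number of rating items
    items = list(zip(*ratings))             # column-wise item scores
    totals = [sum(row) for row in ratings]  # each student's total score
    return k / (k - 1) * (1 - sum(pvar(i) for i in items) / pvar(totals))

# Hypothetical global ratings (1-5) for five students on three rating items.
ratings = [
    [4, 4, 5],
    [3, 3, 3],
    [5, 4, 5],
    [2, 3, 2],
    [4, 5, 4],
]
print(round(cronbach_alpha(ratings), 2))
```

Higher values indicate that the rating items move together, i.e. greater internal consistency of the global score.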
4. Ward Evaluation (CTU):
4.1 At the course level, the mean Ward Evaluation mark increased by about three percent from 2004-05
to 2005-06 and remained at this level for 2006-07 (i.e., 80.3, 83.0, 83.3). In 2005-06, the percentage
of honours grades increased by 20 percent (i.e., 61, 81, 81).
4.2 Variations by Teaching Site and Rotation:
For 2006-07, the range was 2.9% in mean ward marks by teaching site (81.6 - 84.5).
By rotation, the range was 4.1% (82.0 to 86.1).
5. Ambulatory Clinic Evaluations:
5.1 At the course level, Ambulatory Clinic Evaluation mark statistics were consistent over time with
respect to the mean (80 - 81), standard deviation (1.7 - 2.2) and overall mark distribution (9 - 11).
In 2005-06 the percentage of honours grades decreased by 17 percent and remained at this level for
2006-07 (i.e., 73, 56, 56).
5.2 Variations by Teaching Site and Rotation:
For 2006-07, the range was 2.0% in mean ambulatory mark by teaching site (78.8 - 80.8).
By rotation, the range was 1.7% (79.3 - 81.0).
6. Written Assignments:
6.1 Weighted a total of 10%, the written assignments do not represent a major component of the
assessment system. Statistics on the assignments were not required for the CRISES report.
By extrapolation from the grade data for the major components, the mean Assignment mark was
estimated to be about 96%.
D. Discussion:
The Phase II Medicine CRISES Report is a comprehensive overview of the system of student assessment.
The Appendix to the report included further statistics and breakdowns to inform the ESAC review.
Course level grade statistics for academic years 2004-05 to 2006-07 indicated results were consistent over
time with respect to mean final grades and grade components with the exception of the ward evaluation.
Detailed results presented for 2006-07 indicated some variability by teaching site and rotation block.
Validation of evaluation methods requires ongoing collection of meaningful data to facilitate between site
comparisons. In the future, analysis of student marks on the ward evaluation should be compared between
sites using an ANOVA.
Concluding comments on three assessment components are presented below.
1. Ward Evaluation:
1.1 In 2005-06, the Ward Evaluation mean mark increased by 3% and the proportion of honours by
20%. This trend continued in 2006-07. The calculation of the mark is based on the ratings on the
eighteen individual performance criteria and, thus, the distribution of the performance ratings
appears to have changed in some respects.
1.2 The completion of the ward evaluation was described as a consensus process. Further
information indicated that while the ward evaluation process is standardized across sites in many
aspects (e.g., formal feedback, input from staff and resident teachers), the consensus process may be
variable by site due to adaptation to site-specific conditions.
1.3 Ward Marks by Site and Rotation:
For 2006-07, the ward mark data presented by teaching site and rotation showed a difference of 3%
in means by site and 4% in means by rotation block. The site-specific consensus process could be a
factor in site differences in overall ward marks and rating distributions of the individual performance
criteria contributing to the overall ward mark calculation.
To investigate differences in ward marks by site, a review process might be established to examine
the marks and performance ratings by site and determine whether further action would be required,
e.g., to increase consistency across sites, review the process at a system level and/or continue to
monitor the ward assessment ratings and statistics.
2. Ambulatory Clinic Evaluations:
2.1 The Ambulatory Clinic Evaluation mark statistics were very consistent over time, indicating a stable
process. It should be noted that the marks for this assessment component have a narrow mark
distribution and little differentiation between students. It may be useful to review this component to
determine whether the assessment from the clinic setting is providing the intended information.
3. Written Assignments:
3.1 For the two written assignments, the mean mark was estimated at about 95%; thus, it appears
that this mark is a 'bonus' 10% for most students. If this mark were removed, the mean grades would be
lower; however, the value of these assignments for student learning may outweigh the problem of
contributing to grade inflation. Student and teacher feedback on the assignments may be useful in
reviewing the value of this component of the student assessment system.
Suggestions and Recommendations:
1. Ward Evaluation (CTU):
1.1 Review the Medicine Phase II Ward Evaluation Form with department clerkship representatives/site
coordinators for the different teaching sites.
1.2 Review each performance criterion and respective rating statistics to identify the level of information
being provided and the differences in rating patterns by teaching site.
1.3 Develop consensus on each performance criterion with respect to the purpose and function of each
criterion for the ward evaluation form.
1.4 Develop a standardized approach for completing the ward rating scale, communicate this approach
to teaching sites, and continue to monitor the implementation.
Recommendations for the ward evaluation to address grade inflation:
1.5 Promote reasonable standards for the ward rating scale.
1.6 Discourage or prevent the use of the highest rating of "Outstanding" as the default category.
1.7 Review the individual criterion weights for the calculation of the ward grade and the impact of each
criterion rating on the grade calculation to determine whether the weighting is optimal or requires
adjustment.
2. Ambulatory Clinic Evaluations:
2.1 Consider a review of the Ambulatory Clinic Evaluation component to ensure there is consistency
between the objectives of the evaluation, the process and the criteria.
3. Written Assignments:
3.1 Review the value/strengths of the written assignment component as implemented now.
3.2 If it is confirmed that most students receive full marks (10%), consider determining the effect on the
overall grade statistics and reassess whether changes are required e.g., to adjust the marking system,
replace the component, or reassign the 10% to the other components of the assessment system.
Conclusion of Review
Based on the CRISES report presented and the opinion of the ESAC committee members, it is our
conclusion that:
Continued improvement is encouraged . . . . . . Full review next cycle (approx. 3 years)
___________________________
Richard Pittini, ESAC Chair
cc: Vice-Dean, UME
______10/02/12_____________
Date
Faculty of Medicine University of Toronto
Undergraduate Medical Education
Examination and Student Assessment Committee (ESAC) Course Review: OBS/GYN (OBS 300)
Course Director: Dr. F Meffe
Reviewers: Dr. P. J. Morgan and Ms. K. Hershenfield
The course director, Dr. F. Meffe, presented the review of the OBS 300 course on Tuesday, May 3, 2005.
OBS 300 is a 6-week course in Phase I of clerkship. Students spend six weeks as a member of a clinical
team taking part in the care and study of women who present to one of the teaching hospitals. Students are
expected to build upon their obstetrics and gynecology knowledge from 'Foundations of Medical Practice' to
understand, appreciate, and apply the knowledge, skills and attitudes required for residency.
There are several methods of evaluation used in this course. The evaluations consist of a written
examination worth 33.3% of the mark, an oral examination worth 33.3% of the mark, and a ward evaluation
worth 33.3% of the student‟s mark.
The written examination consists of a multiple choice question component, worth 25% of the examination,
and a short answer component, worth 75% of the examination. The exam consists of 20 multiple-choice
questions and 30 short answer questions. The multiple-choice component is computer-scored and the short
answer component is marked by only one rater at each site. The written examination class average in the
past academic year was 81.81% with 65.8% of students receiving an honours grade. On the multiple-choice
portion, the class average in the past academic year was 81.30% (73.2% of students receiving an honours
grade). For the short answer component, the class average in the past academic year was 81.96% (65.3% of
students receiving an honours grade).
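The two-part written examination weighting described above can be sketched as a simple calculation. This is an illustrative sketch only: the helper name and the assumption that each short answer is marked out of 1 are invented; only the 25/75 component weights and question counts come from the report.

```python
# Sketch: combining the OBS 300 written examination components using the
# 25% MCQ / 75% short answer weighting described in the report. The raw
# score scales are hypothetical assumptions; only the weights and question
# counts (20 MCQ, 30 short answer) come from the report.

def written_exam_mark(mcq_correct: int, sa_score: float) -> float:
    """Return the written exam percentage from 20 MCQs and 30 short answers."""
    mcq_pct = mcq_correct / 20 * 100        # 20 multiple-choice questions
    sa_pct = sa_score / 30 * 100            # assume each short answer marked out of 1
    return 0.25 * mcq_pct + 0.75 * sa_pct   # component weights from the report

# Example: 17/20 on the MCQs and 24.6/30 on the short answers
mark = written_exam_mark(17, 24.6)
print(round(mark, 2))  # 82.75
```

Note how the 75% short answer weight dominates the result, which is consistent with the report's observation that the short answer component average tracks the overall written average closely.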
The oral examination consists of 4 history/physical stations. Students rotate through the stations in a 60-
minute period. There is one rater for each oral examination question per site. The rater uses provided
guidelines to mark candidates for a particular scenario. The class average in the past academic year was
82.63% with 71.6% of students receiving an honours grade.
The ward rating is based on participants' daily clerkship encounter forms. Students are expected to submit
approximately 10 daily encounter forms throughout the rotation. Only one rater for any one clinical
encounter completes the daily clerkship encounter form. Raters can include residents and/or staff/faculty.
The daily clerkship encounter form was modified for the 2004-2005 academic year, such that raters now do
not record an actual percentage mark, but rate a competency on a scale from unsatisfactory to outstanding.
The information provided on the encounter forms is used by the clerkship coordinator or the student's
mentor to complete the final clerkship evaluation. The final evaluation is a consensus of the individual
clerkship encounter forms. Therefore, the ward rating includes input from at least 7-10 raters. The class
average on the ward evaluation in the past academic year was 82.40%, with 80.0% of students receiving an
honours grade.
The overall class averages for the past 3 years were 83.36% in 2001-02, 82.9% in 2002-03, and 82.16% in
2003-04. The proportion of students receiving an honours grade fell below 80% only in the 2003-04
academic year, at 73.2%.
Areas of Strength
This course has multiple methods of evaluation which are weighted towards the final mark.
1. Written Examination
The written examination has good content validity, as both MCQ and short answer questions are
chosen to reflect seminar objectives and syllabi and are representative of the wide spectrum of course
content. The 20-item MCQ portion has high internal consistency, and since 2003-04 the exam has been
centrally computer-scored. There is also a concerted effort to reduce the percentage of overlap from one
examination to the next.
2. Oral Examination
It was felt that the oral examination was a good assessment tool since the questions focused on
management issues which the students felt to be valuable.
3. Daily Clerkship Encounter Form
The daily encounter forms allow multiple staff/residents to contribute to the final evaluation. The
form was revised in 2004-2005 to better reflect the final clerkship evaluation form and the competencies
expected of students.
Areas for Improvement and Recommendations:
1. Written examination
a. Dr. Meffe identified a shortage of adequate questions in the database. Some suggestions
for solving this issue included holding a workshop for question
generation or contacting the Society of Obstetricians & Gynecologists of Canada (SOGC) for
the possible development of a large scale database for questions. Another option would be to
eliminate MCQs altogether and focus only on short answer questions.
The course director might consider limiting the number of raters marking the short answer questions to
improve the reliability of the marking system. Alternatively, divide the examination by question rather than site
for the purpose of marking. Having all students take the written examination at one site might facilitate
this approach. Predictive validity of the MCQ exam could be determined by comparing the results with
the MCCQE data.
2. Oral Examination
a. There are multiple raters at each site and there is no evidence about the reliability of the
marking system. The idea of developing an OSCE for this course was discussed but the cost
was felt to be prohibitive.
A formal assessment of the inter-rater reliability of the oral examination markers might be useful.
Consideration of including both a checklist and global rating score for the oral exam was suggested.
Central marking of this format of examination would be feasible.
3. Daily Encounter Forms
a. While it was generally felt that these were useful evaluation tools, a few concerns arose. It
appears that there can be a wide range of what skills are actually assessed on these daily
encounter forms and that the students generally have free rein to select the encounters that
they wish to have assessed. There also may be a variable number of encounter forms
submitted (i.e., not all students will necessarily submit 10 forms). It is also possible that no
technical skills would actually be assessed. It was also mentioned that some of the encounter
forms were "weighted"; it was unclear how or why this would occur.
A new clinical encounter form with guidelines as to the minimum number of expected clinical skills as
well as the development of a marking template would enhance this evaluation tool. The development of
a marking template may facilitate easier transfer of the evaluations onto the final clerkship evaluation
form. This should be kept in mind during the development of this template. Final grades obtained from
the template could be adjusted using narrative comments.
4. Feedback
a. The students feel that they are receiving feedback to a large extent from residents, especially
at the midway point.
Achieve the goal of formal midpoint feedback by faculty by ensuring that all site coordinators
comply with this requirement. Explicitly highlighting for students that feedback is being given
will ensure that they are aware it has occurred.
Conclusion of Review
Based on the CRICES report presented and the opinion of the ESAC committee members, it is our conclusion that:
Continued improvement be encouraged .......................Full review next cycle (approx. 3 years)
___________________________ Richard Pittini, ESAC Chair
___________________________
Date
Faculty of Medicine University of Toronto
Undergraduate Medical Education
Examination and Student Assessment Committee (ESAC) Ophthalmology Review: Catherine Birt MD, Course Director
Review by: Dr. P. J. Morgan
Ophthalmology is a 1-week course that shares a 6-week block with ENT (1 week) and Family Medicine (4
weeks) in Phase I of the clerkship.
The course director, Dr. C. Birt presented the review of the Ophthalmology course on Tuesday, November
4, 2003. Dr. Birt presented an overview of the course. The didactic component of the course is composed of
8 lectures, which are given at each hospital (Mount Sinai, Toronto Western, St. Michael's and Sunnybrook),
as well as a ½ day at the Hospital for Sick Children. The audiovisual component of the lectures is available
on the Ophthalmology website. The lectures cover different topics which are used as material for the
examination. On the 2nd Friday of the 2-week ENT/Ophthalmology rotation, students take a practical
examination where pairs demonstrate certain skills, a list of which is found in the course manual. On
Friday afternoon, all students take a written examination held at the Medical Sciences Building. The written
examination comprises 65% of the final mark, the demonstration of clinical skills is worth 25% of the final
mark, and the ward assessment comprises 10% of the final mark.
The written examination is composed of 8 short answer questions worth 5 marks each for a total of 40
marks. Of the 8 short answer questions, 3 are based on slides which are projected for about 60 secs and
cannot be retrieved for repeat viewing. There are 10 multiple choice questions each worth 1 mark. The
composition of the written examination is therefore weighted 80% for short answer (40 marks) and 20%
multiple choice questions (10 marks).
Six undergraduate ophthalmology committee members create the examination database and update
questions. One committee member creates each examination and all are reviewed by the course director who
assesses the face validity of the questions. There is a standardized layout of the examination and each
examination has 10-20% new questions. The examination is marked by one member of the committee. An
overall mark of 60% is required to pass the examination and a combined overall mark of 60% is required to
pass the course.
OSCE
The OSCE is comprised of five 5-minute stations, which may differ from site to site, with each hospital
examiner determining which skills will be evaluated. Students are given a list of what skills/maneuvers they
may be asked to perform. The OSCE score is based on faculty opinion of how the skill was performed and
does not have a formal scoring template. Two or three faculty per site oversee the OSCEs, which contribute
25% of the student's final mark.
Ward assessments form 10% of the final mark. Students are evaluated by 1-3 housestaff and/or faculty who
assign marks based on informal assessments of the students' performances while in the clinic. Essentially,
students usually get 80% if they have attended the clinic component of the course. There is a variable
marking system with no associated algorithm or template for marking, since it is difficult for housestaff and
faculty to get to know students in such a brief period of time.
The ophthalmology and ENT exams are done on the same day in a 1-hour period.
There is no determination of item statistics for any evaluation component. No student has ever failed the
OSCE but failure of this component would not necessarily mean failure of the course. Students who have
received <70% on 2 components of the course are brought to the attention of the Clerkship Director and
ultimately the Clerkship Committee.
Marks are posted on the ListServ with each student receiving a designation of HPF for each component and
an overall grade. Course and faculty evaluations are completed by the students at the time of the written
examination. One to two students request remarking of their written examination per year.
The overall class average for the course has remained relatively stable over a 5 year period with an
increasing percentage of students receiving an honours mark. Overall, 76% of students received honours in
the course in the 2002-2003 academic year. There were no failures over the 5-year period, and the average
mark ranged from 81 to 84. The clinical skills marks mirror the overall mark results, with one failure in the
2000-2001 year.
No data on inter-rater reliability were presented. There did not appear to be a difference in average marks
between the components for each rotation. There are no analyses of internal consistency. Again, the
comparison of marks between academies lists the average mark for the clinical skills component. There are
no data presented with respect to construct or predictive validity for the clinical skills component. Content
validity is assessed by the course director. The class average for the OSCE component is between 83 and 85
for the past three academic years with nearly 90% of students achieving honours in this component in 2002-
2003. There was a wide range of marks with the low mark ranging from 60-68. Comparison of marks
between academies is presented but again, the analysis has not been identified.
With respect to feedback, it is given by sending out marks on the ListServ. Students may go to the Course
Director if they wish to discuss their evaluation. Weak students are informed of their performance and are
encouraged to do an elective in the subject. There is no midpoint feedback since the course is only 1 week
in duration.
The appendix outlines correlation coefficients for the varying components. There is little to no correlation
between the clinical skills components and other evaluation methods used. There was significant correlation
between both the OSCE and written examination and the final mark. Histograms of the various components
demonstrate a fairly normal distribution pattern.
Areas of Strength:
1. Written Examination
The course has developed a secure database that presents 10-20% new questions per examination.
2. Component Evaluation
All component evaluations have a range of marks with a normal distribution on histograms.
3. Course Evaluation
There is a good method of attaining both faculty and course evaluation feedback.
Areas for Improvement and Recommendations:
1. Written Examination
There appears to be some concern from the students that the slides that are presented remain projected for a
short period of time only.
Recommendations:
The course director could pursue alternate methods of projection that may allow students longer access
to the slide.
The committee had some comments on the need for both MCQ and short answer questions.
Recommendations:
There is a need to determine the item statistics for this evaluation component. The Course committee
should also consider including more short answer questions.
2. Ward Assessment
Since the students generally receive 80% for “just showing up”, the ward assessment is of relatively limited
discriminatory value.
Recommendations
Make the clinical time a mandatory part of the course but not a component that receives a grade.
Increase the weighting of the clinical skills component by 10%
3. OSCE Marks
There is a very high percentage of students obtaining honours in this course. This may reflect the relatively
subjective ward assessment and the limited number of skills that can be presented in the OSCE component.
Since ophthalmology is a short course, it is difficult to develop a large selection of skills/maneuvers that can
be tested.
Recommendations
Develop standardized checklists or a marking template to ensure consistent marking between sites for
this component.
4. Feedback
Due to the brevity of the course, it is difficult to give feedback to students.
Recommendations
Some suggestions to consider would be: to include comments along with the ListServ-distributed mark
indicating whether the student's performance was below expectations, meets expectations or above
expectations; focused faculty development to emphasize the importance of daily informal oral feedback;
and provision of written comments on a tear-off sheet at the end of the exam.
Conclusion of Review
Based on the CRICES report presented and the opinion of the ESAC committee members, it is our
conclusion that:
Written submission regarding additional data analysis (item-total correlations for a sampling of written exams,
correlation coefficients between component marks between academies) and response to above
recommendations required ..........................................................................Interim Review in 6 months' time
___________________________
Richard Pittini, ESAC Chair
___________________________ ____________________________
Catherine Birt, Course Director Date
Distribution: Course Director
ESAC File
Clerkship Coordinator
UME-CC
Faculty of Medicine
University of Toronto
Undergraduate Medical Education
Examination and Student Assessment Committee (ESAC) Addendum to the Report on Ophthalmology
September 5, 2006

This addendum completes the ESAC review of the student assessment system in Ophthalmology, a one-week rotation taken as part of the six-week block with Family and Community Medicine (four weeks) and ENT (one week) during the Phase 1 Clerkship. The Course Director for Ophthalmology, Dr. C. Birt, completed the CRICES form (i.e., Criteria for Review of Individual Course Evaluation Systems) and presented the course report at the ESAC meeting of November 4, 2003. A review for Ophthalmology was prepared by the lead reviewer, Dr. P. Morgan.

The written examination question database is maintained and updated by members of the undergraduate course committee. A new examination is developed for each of the six rotation exams per year, and the content validity of the exams is assessed by the course director. For each rotation exam, all papers are marked by a single marker. In the CRICES report, grade statistics were reported for the overall final grade and the three grade components. In the appendices were tables of correlation coefficients and breakdowns of mean component grades by rotation and teaching site.

This addendum was prepared in response to the review recommendation to examine item statistics for the written exam. The Course Director and UMEO professional educator arranged for an analysis to be conducted on all exam papers for one rotation block selected at random. Normally the written exam results are recorded as aggregate scores without the individual item data. In ophthalmology, individual item data were not stored electronically in the written exam results database. To conduct an item analysis, the item data had to be retrieved from the original exam papers. Due to time and cost constraints, the analysis was limited to the exam for one rotation block that was randomly selected. A data file was created with the mark assigned per question for each of the 33 exam papers.
The written examination includes three question formats: (i) three short answer questions based on projected slides (15 marks), (ii) five short answer questions without slides (25 marks), and (iii) ten multiple choice questions (10 marks). Each format was reviewed separately. In general, the analysis found that for each format the items ranged in difficulty and resulted in a large distribution of marks. For the short answer formats, item means ranged from 3.1 to 4.1 out of 5 (slide questions) and from 3.7 to 4.6 (no slides), and total scores correlated highly with the overall format score. For the multiple choice questions, item difficulty ranged from 27 to 100% and correlation coefficients for 7 items ranged from .17 to .61. Item statistics for each format are presented in Tables 1 to 3. Overall, each item format resulted in mean scores from 73 to 84 percent. Summary statistics at the level of each format are presented in Table 4.
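The item analysis described above (per-item difficulty and item-total correlations computed from a matrix of per-question marks) can be sketched as follows. The function names and the sample data are invented for illustration; they do not reproduce the actual exam papers or the tabled results.

```python
# Sketch of a classical item analysis: per-item difficulty (mean score as a
# fraction of the item maximum) and corrected item-total correlation (item
# vs. total-minus-item), computed from per-question marks. Data are invented.
from statistics import mean, pstdev

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    mx, my = mean(xs), mean(ys)
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx, sy = pstdev(xs), pstdev(ys)
    return cov / (sx * sy) if sx and sy else 0.0

def item_stats(scores, max_mark):
    """scores: one list per student, one mark per item; max_mark per item."""
    n_items = len(scores[0])
    totals = [sum(row) for row in scores]
    stats = []
    for i in range(n_items):
        item = [row[i] for row in scores]
        rest = [t - it for t, it in zip(totals, item)]  # total minus this item
        stats.append({"difficulty": mean(item) / max_mark,
                      "item_total_r": pearson(item, rest)})
    return stats

# Five hypothetical students, three short answer items each marked out of 5
marks = [[5, 4, 3], [4, 4, 2], [5, 5, 4], [3, 2, 1], [4, 3, 3]]
for s in item_stats(marks, 5):
    print(round(s["difficulty"], 2), round(s["item_total_r"], 2))
```

An item answered perfectly by every student would show a difficulty of 1.0 and an undefined (here, zero) item-total correlation, which is why the addendum flags items with 100% correct responses on a short exam.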
Our interpretation of these data is in the context of a small sample size, and our intent is to examine for trends or general issues rather than to make specific conclusions. It appears that the item-total correlations are acceptable. A relatively high proportion of the questions were answered correctly by 100% of students. This is not a problem when there are a large number of questions, but on a relatively short examination it can affect final grades. We would not propose that you change the examination based on this limited review, but rather we suggest you develop and maintain a database which will allow you to do this type of analysis on an ongoing basis. Resources are available to assist you with the development of such a database.

Conclusion of Analysis

Based on the CRICES report presented, the information contained in this addendum and the opinion of the ESAC committee members, the ESAC review of Ophthalmology was concluded with:

Continued improvement be encouraged .............................. Full review next cycle (approx. 3 years)

___________________________
Richard Pittini, ESAC Chair

___________________________
Date
Faculty of Medicine University of Toronto
Undergraduate Medical Education
Examination and Student Assessment Committee (ESAC) Otolaryngology Review: Paolo Campisi MD, Course Director
Review by: Dr. R. Pittini & Dr. S. Bernstein
Otolaryngology is a 1-week course that shares a 6-week block with Ophthalmology (1 week) and Family
Medicine (4 weeks) in Phase I of the clerkship.
The course director, Dr. P. Campisi, presented the review of the Otolaryngology course on September 4,
2007. On the 2nd Friday of the 2-week Otolaryngology/Ophthalmology rotation, students take an
examination. The ophthalmology and otolaryngology exams occur on the same day.
Students are evaluated with a short answer written examination, a two station OSCE examination and an
ambulatory assessment based on direct observation by faculty during six half-day clinics.
Components
The written examination is composed of 20 short answer questions worth 60% of the final grade. The
Otolaryngology committee members create the secure examination database, which was last updated two to
five years ago. The course director reviews all questions and assesses their face validity. One member of the
committee marks all of the examinations for one iteration of the exam per year, while the course director
marks two iterations.
The OSCE is comprised of 2 stations, each ten minutes in duration worth a total of 20% of the final grade.
A single examiner examines students for both stations. The students are examined at the same site that they
receive their instruction. Examiners use a detailed checklist to evaluate student performance which the
students may demonstrate or describe. Some examinations include models while others include a
description of what would be done if there were a patient. The OSCE examiner is the same individual who
completes the ambulatory assessment. The OSCE marks are consistently higher than the written exam
marks.
Ambulatory assessments form 20% of the final mark. Site coordinators assign marks based on informal
assessments of the students' performances while in the clinic. Students attend six half-day clinics.
Evaluation information for each clinic is collected on paper-based 'green cards'. The site coordinators do not
always have the opportunity to observe students directly and rely on comments from other teaching faculty.
According to the course director, students essentially get 80% if they have attended the clinic component of
the course. The OSCE is completed prior to the ambulatory assessment.
The overall class average for the course has remained relatively stable over a three-year period with an
increasing percentage of students receiving an honours mark. Overall, approximately 50% of students
received honours in the course. There were no failures in the three-year period. No data on inter-rater
reliability were presented. There did appear to be a difference in average marks between the individual
rotations for 2006-2007. There are no analyses of internal consistency. Content validity is assessed by
the course director.
With respect to feedback, students may petition the course director through the Undergraduate Medical
Education office. Weak students are informed of their performance and are encouraged to do an additional
project in the subject. There is no midpoint feedback since the course is only 1 week in duration. A midpoint
quiz with provision of correct responses is being considered as a feasible method of providing formative
feedback.
Observations:
The number of questions included in the written examination is small relative to the weighting given to the
overall exam. The examination pool needs to be expanded with new questions added on a more regular
basis. Each iteration of the examination should include the same proportion of new questions. Some of the
examination questions are complex and could be simplified into several separate questions. This is an easy
method of increasing the exam pool size.
The examination scores appear to vary according to who marks them. The inconsistency in written
examination scores across different rotations reached statistical significance according to the one-way
ANOVA for the sample year 2006-2007. This may be the result of differences between evaluators and could
be avoided by dividing the examination into sections and having one marker evaluate all questions in one
section. Avoid having the same examiner for all three components, and identify students by student number
only for the purpose of marking.
While additional OSCE stations may be beneficial from a psychometric perspective, feasibility constraints
make the choice of two stations reasonable. Given the small number of stations, the quality of each station
and its evaluation is more critical. The ideal station is standardized with respect to exam content (e.g. use
of mannequins) and examiner. The OSCE as a performance-based evaluation requires that skills be
demonstrated and not simply described. Ensure equal access to the mannequin to be used for evaluation
such that either all or no students have access to it.
The standardized forms define what criteria are to be used to assess students. Site coordinators can
compile the green cards to generate a global evaluation that reflects the opinions of the faculty who directly
observe the students. Faculty need to be discouraged from assigning marks based on attendance alone.
Faculty development should emphasize that the criteria outlined on the 'green cards' be used consistently
for all students at all sites.
The completion of the OSCE evaluation prior to the assignment of an ambulatory mark by the same
individual can potentially lead to skewing of ambulatory marks based on OSCE performance. Using faculty
from different sites to mark OSCEs would eliminate this problem (e.g. have students from St. Michael's go
to HSC for their OSCE).
The brief duration of this course creates a challenge in providing students with meaningful feedback. In
addition to the proposed innovative feedback method, e-Log may provide another form of objective
feedback. E-Log may also be useful in determining which experiences are common to all students; these
and only these should be included in the evaluation. Encounter cards should be reviewed in a systematic
fashion in order to provide students with formative feedback from their tutors.
Areas of Strength:
1. The written examination is well aligned with the course objectives and the topics are evaluated
proportionately to their coverage in the course. Face validity appears to be good.
2. The OSCE provides an opportunity to objectively evaluate student performance in a procedure-
based rotation.
3. The innovative method of feedback proposed (midpoint mini-quiz) is supported by this committee.
Recommendations:
1. The written examination
a. The written examination pool needs to be expanded
i. Add 10-15% new questions per year
ii. Split existing complex questions into simpler components
b. Each examination should be divided into sections that are marked by one faculty
2. OSCE
a. Standardize OSCE across all sites so that mannequins are used by students to demonstrate
skills rather than describe procedures
b. Consider inclusion of three categories on the OSCE checklist (not done, done incorrectly,
done correctly).
c. Consider using one of several available techniques for standard setting among your
evaluators
3. Ward assessments
a. Adjust timing of the completion of this form to ensure that it occurs independent from the
OSCE evaluation
b. Encourage the use of specific criteria when evaluating students
i. Continue to use 'green' encounter cards
ii. Provide faculty development aimed at this
Conclusion of Review
Based on the CRICES report presented and the opinion of the ESAC committee members, it is our
conclusion that:
No major issues, ongoing improvements encouraged ................................Full review in three years time
___________________________
Richard Pittini, ESAC Chair
___________________________ ____________________________
Paolo Campisi, Course Director Date
Distribution: Course Director
ESAC File
Clerkship Coordinator
UME-CC
Examination and Student Assessment Committee
Review of
Paediatrics Clerkship (April 29, 2004)
Background / Context
The paediatrics clerkship is a 6-week rotation in the third year of the undergraduate medical curriculum. As with other
clerkships in the third year, approximately 30-35 students rotate through the clerkship in each of 6 rotations over the
year.
For the last two years, two streams have existed within the rotation, with approximately half of the students in each
rotation participating in each stream. In one stream, the entire 6 week rotation is spent in the context of a single
community setting. In the other stream, the students spend 3 weeks in the Hospital for Sick Children and 3 weeks in a
community setting. For this second stream, half the students are at HSC for the first three weeks and half of the
students are in the community setting for the first three weeks. Other than this systematic variety in clinical setting,
the program for all students is similar and involves one day a week devoted to an academic teaching program at The
Hospital for Sick Children, which includes a half day for a seminar series, and a half day for case-based rounds.
Three forms of evaluation are used to generate the final rotation mark: a written test (worth 40% of the final mark), a
short written assignment related to Project CREATE (worth 10%) and a clinical mark based on performance in the
clinical settings (worth 50% of the final mark). The score generated from this weighted sum is translated into
honours/pass/fail for the purposes of transcription. In addition, it is necessary to pass both aspects of the evaluation
system in order to pass the rotation. There is also a pass/fail requirement to perform a physical examination.
The written examination is administered to all clerks at the end of each 6-week block. This examination is composed
of xx short answer questions. Each question per rotation is marked by a single examiner from a pre-existing marking
template to reduce examiner error in interpretation. No inter-rater reliability or internal consistency statistics are
currently available to assess the reliability of the marking of questions. Inter-rater reliability assessment would require
that two examiners score the same questions, which may be difficult given the constraints on faculty resources.
However, internal consistency measures should be relatively easy to calculate and are strongly recommended. Student
feedback regarding this aspect of the evaluation system was generally quite positive. Reports from the students
suggest that the written exam is perceived to be of appropriate difficulty and representative of the course objectives
and course content. Wording of questions was overall very fair and of good quality, and while some questions were
seen to be repetitive from previous years, students felt that this did not impair their evaluation.
The short written assignment is generally marked by one of the student's clinical supervisors. No inter-rater reliability
analyses are currently available to assess the consistency with which these assignments are marked. However, scores
are generally quite high with little variance, and are worth relatively little in the total score of the students, so
reliability assessment of these scores should be considered a relatively low priority. Student feedback suggests that,
although a sample was provided, the evaluation guidelines for this assignment were somewhat unclear. Some effort to
clarify these guidelines to the students (perhaps by providing the students with the guidelines given to the faculty)
might be helpful.
The clinical mark is generated by the team of supervising clinical faculty at the end of each clinical experience and is
based on daily interactions between students and faculty in the clinical context. For students who are in the single
community setting stream, this results in a single evaluation form being completed that constitutes the student's
clinical mark for the rotation. For students in the two-setting stream, a form is completed at the end of each 3-week
rotation and the clinical mark for the rotation is calculated as an unweighted average of the two forms. The correlation
between the two marks generated for the students with two placements is generally quite low (ranging from .08 to
.34 in the three years under evaluation). Students have reported a general feeling among the student body that the
stream including a rotation at HSC may be disadvantageous because the HSC clinical experience is "more difficult
than the community setting" and one's "clinical mark will suffer." There is certainly no evidence of this phenomenon
in the data from the last two years. In fact, no substantial difference appears to arise in the scores of students who
participate in one stream or the other on either the clinical marks (where the marks were 81.67 in the community
stream vs. 81.76 for the mixed stream in 01/02 and were 81.60 vs. 82.45 respectively in 02/03) or written marks
(79.14 vs. 78.16 and 77.21 vs. 77.94 for the two years). However, the perception exists and mechanisms might be
enacted to counter these "rumours". (Note: there does appear to have been a difference in the 00/01 year; however, it
is in the opposite direction to that suggested by the perceptions of the student body, and there were very few students
in the community stream in that year, limiting the capacity to make reasonable generalizations.) Students also reported
some concern regarding the lack of explicit feedback from the faculty regarding their progress in the clinical setting
over the course of the rotation. This concern was exacerbated in the community setting where no formal mid-rotation
evaluation is provided, which “made the ward evaluation seem unfair (and not as good a learning tool, because can't
improve when only evaluated at the end).” Increasing the level of informal daily feedback from preceptors should
certainly be encouraged; however, it might also be possible to institute a formal evaluation in the 6-week stream to
mirror the mid-rotation evaluation that is an institutionalized aspect of the 3/3-week stream.
Students generally reported being quite happy with the requirement of an observed history and physical, and were
happy that this was evaluated as credit/no-credit, since some supervisors were felt to be far stricter than others in
the implementation of this requirement (for example, some students were observed from start to finish of a patient
history/physical, whereas others were not supervised but simply questioned about their history/physical when they
were finished). Given the perceived lack of standardization around this component of the evaluation, it was felt that
credit/no-credit was an appropriate evaluation mechanism.
Strengths of the Evaluation System
The use of multiple sources of evaluation to generate a mark for the student is appropriate and consistent with the
intent of the University and the recommendations of ESAC. The weighting of the ward mark (at 50%) is somewhat
higher than the value generally recommended by ESAC (40%), especially given that the relatively
low correlation across placements confirms the common finding that these marks are quite unstable, but the
weighting is not unreasonably large.
The students appear generally to be quite happy with the mechanisms of the evaluation system, viewing it as
largely reasonable and fair from the perspective of the scoring and weighting of various aspects of the evaluation
process.
Areas for Potential Concern or Improvement in the Evaluation System
There is relatively little psychometric analysis of the marks generated for the students. While there are clearly
efforts at quality assurance in the production and marking of the written examination, statistical analysis is strongly
recommended in the form of inter-rater reliability and internal consistency measures in order to ensure the quality of
the examination.
There appear to be relatively few opportunities for explicit feedback on the written examination. This is, in part, a
natural consequence of the “closed” nature of the examination, since particular answers to particular questions
cannot be discussed extensively without compromising the validity of future examinations. However, this concern
has been further exacerbated by the current implementation of the honours/pass/fail transcription system, such that
students now also get only a very global sense of their level of performance on the examination. Efforts to increase
the substance of the feedback to students regarding their knowledge level and areas of potential concern are strongly
encouraged. For example, if questions can be sorted into three or four content areas, then areas of strength and
weakness in the individual student's knowledge base might feasibly be discussed. In addition, one course provides a
tear-off sheet at the end of the examination that allows the marker to provide some short written feedback to the
students about areas of strength and weakness and particular misconceptions. These types of comments do not need
to be directly related to the questions (thereby protecting the closed nature of the examination), but could
nonetheless provide useful personal information for the students. These are just a few suggestions, and ESAC would
encourage the course committee to consider developing additional methods as well.
Similarly, there are some concerns raised by the students regarding opportunities for formal feedback in the clinical
rotations, especially in the 6-week community stream where there is no 3-week evaluation. The course committee
might consider a mechanism for a formal mid-rotation evaluation to match that generated in the 3/3 stream, and to
encourage daily feedback to students by the preceptors.
Recommendations
Further analyses examining the inter-rater reliability of the clinical evaluation forms across the two clinical settings
for those who are in the split stream would provide interesting information about the reliability of the clinical mark
that is generated as 50% of the students' marks. It is recommended that these analyses be performed. Members of
ESAC are available for consultation on this recommendation if the course committee feels that such consultation
would be helpful.
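The inter-rater analysis recommended above can be sketched as a simple correlation between the clinical marks assigned at the two settings for split-stream students. The marks below are invented placeholders, not course data, and the function name is illustrative.

```python
# Sketch of the recommended inter-rater analysis: correlate the clinical
# marks assigned to the same students at their two placements.
from statistics import mean

def pearson_r(x, y):
    """Pearson product-moment correlation between two paired mark lists."""
    assert len(x) == len(y) and len(x) > 1
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical marks for the same students at their two placements
site_a = [78, 85, 81, 90, 76, 83]
site_b = [80, 79, 84, 88, 77, 81]
print(round(pearson_r(site_a, site_b), 2))
```

A coefficient near the .08 to .34 range reported above would confirm the instability of the clinical mark; a dedicated statistics package would also supply a significance test.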
Analyses regarding the internal consistency (e.g., Cronbach's alpha) of the written examination are strongly
recommended. The identification of questions that are clearly not psychometrically sound could be given special
attention as part of the examination review process. Also, some effort at inter-rater reliability for the marking of the
written examinations would be recommended.
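For reference, Cronbach's alpha requires only the items-by-students score matrix. The following is a minimal sketch with invented scores; the function name and data layout are assumptions for illustration.

```python
# Illustrative computation of Cronbach's alpha from per-student item scores.
def cronbach_alpha(scores):
    """scores: one list per student, containing that student's item scores."""
    k = len(scores[0])          # number of items
    n = len(scores)             # number of students

    def var(values):            # population variance
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values) / len(values)

    item_vars = [var([s[i] for s in scores]) for i in range(k)]
    total_var = var([sum(s) for s in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Four students x three short-answer items (hypothetical marks)
matrix = [[4, 5, 3], [2, 3, 2], [5, 5, 4], [3, 4, 3]]
print(round(cronbach_alpha(matrix), 2))
```

Values above roughly 0.7 to 0.8 are conventionally taken as acceptable internal consistency for this kind of examination.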
Special attention should be paid to increasing the amount of feedback to students regarding their performance on the
written examination and clinical performance. Recognizing the closed nature of the written exam and the current
interpretation of the policy regarding the release of numeric grades, it is nonetheless important to give students
feedback regarding their relative strengths and weaknesses in the knowledge domain as well as in the clinical
domain.
Mechanisms might profitably be implemented to provide information to students regarding the relative equivalence
of the two streams of clinical training in order to ameliorate concerns regarding disparate marking practices.
Rating
No serious problems are evident, and no serious problems are anticipated to develop in the near future. Next review in
3-4 years.
Note
This evaluation is forwarded to:
Chair of the Undergraduate Medical Education Curriculum Committee
(and Associate Dean Undergraduate Education)
Chair of the Clerkship or Preclerkship Committee
Course Director
and kept on file by ESAC.
____________________________________________________________
Signature of Course Director
____________________________________________________________
Signature of Chair, Examination and Student Assessment Committee
Thank you for participating in the ESAC Review Process!
Faculty of Medicine University of Toronto
Undergraduate Medical Education
Examination and Student Assessment Committee (ESAC)
Pathobiology of Disease (PBD 211F)
Course Director: William Chapman
Report: January 6, 2004 (original date), with revisions November 7, 2006
Lead Reviewers: R. Gupta & R. Pittini

Preamble
The Pathobiology of Disease (PBD) report was presented to ESAC by the Course Director, Dr. William Chapman, on January 6, 2004. The reviewers wish to acknowledge the long delay in finalizing this report. Recommendations to the Course Director do not take into consideration any changes to the course that may have been made in the interim.

Background
This course is a 14-week course that starts at the beginning of second year, and it carries the heaviest caseload in second year at that time. The course consists of 14 problem-based learning sessions, one per week, and 9 seminars in microbiology/immunology/genetics. The objective of the course is to bridge the basic and clinical sciences.

The three examinations during this course are evenly spaced and equally weighted. The first examination is a 50-item multiple choice question examination. The second examination is a combination of short answer and multiple choice questions: there are 34 MCQ items worth one mark each and 9 short answer questions worth 26 marks. Eight of the short answer questions are genetics-based and one concerns an ethical issue. Examination 3 is a 60-item multiple choice question examination. The material tested by each examination is not cumulative. Students regularly make presentations during the course, but the tutor does not evaluate performance on these presentations.

The examinations are set by the Course Director and questions are obtained from the lecturers. New questions are requested each year and there are minor instances of repeat questions. Students have access to previous years' questions, as examination question booklets are not collected at the end of each examination. The exam items are informally reviewed both prior to
and following the examination. Approximately two questions are deleted from each exam based on student feedback.

Overall, students perform very well in this course, with a class average of 84%, 83% and 86% in the years 2003, 2002 and 2001, respectively. The proportion obtaining honours was 81%, 84% and 89% in those years, respectively. There have been no failures in the last three years. The Course Director believes that the grades reflect a high proportion of students attaining mastery of the content, i.e., better than most practicing physicians, and that the evaluations are very representative of what students need to know.

Data on the internal consistency of the multiple choice question examinations are available but were not presented. Short answer examination questions are marked by one individual and therefore assessments of reliability are not pertinent. There is a presumed link between the objectives and the course evaluations, although this is not formally mapped. There is an opportunity to assess the predictive validity of these examinations by correlating marks in this course with those in Foundations of Medical Practice.

Feedback is limited to providing students with examination answers within a few days of writing an examination and a final mark shortly after the course has been completed. There is no formal interim review of progress. Remediation for students scoring between 60% and 70% is not mandatory, but they are invited to meet with the Course Director. Students who score less than 60% are required to meet with the Course Director. Students who are invited to meet with the director do show minimal improvement during the course, but many continue to experience difficulties. According to the Course Director, informally, these students have often had difficulty in other pre-clerkship courses. The nature of the remediation is advice regarding studying and learning; sometimes students are re-evaluated with a take-home assignment or oral examination.

Areas of Strength
The students were very happy with this course. The evaluations are felt to be fair, with a moderate level of difficulty. The students feel the exams were reflective of the material taught.

Areas for Improvement
Multiple Choice Question Examinations: Internal consistency should be calculated for the MCQ items in the examinations. Further data are needed.

High Proportion of Honours: The high proportion of honours may reflect a high proportion of students mastering content, but it may also mean that the questions are too homogeneous and do not discriminate among weak, average and outstanding students. Looking at the distribution of marks would be very helpful. Clearly, it is difficult to discriminate weak from minimally competent students when all the marks are high. There is difficulty in distinguishing between assessment of knowledge and applied knowledge. There are many presentations done during the PBL sessions which have gone unevaluated.
Recommendations
1. A supplemental report is requested and should provide the committee with more data and an accompanying analysis. A psychometrician (K. MacRury) should be consulted. The supplemental report should consist of a histogram of the mark distributions. Descriptive statistics and measures of internal consistency should be provided for each of the written examinations. With your agreement, the necessary data can be collected and analyzed on your behalf. It is recommended that the data included in your original report be supplemented with data from the subsequent two years.
2. Consideration should be given to increasing the range of difficulty of the examination questions to better discriminate the weak from the minimally competent student.
3. All methods of evaluation should be considered for the PBL portion of the course. The Course Director should consider formally evaluating some of the presentations, which could be evaluated in written or oral format using a template such as the one used in DOCH. Furthermore, part of the presentations could be linked to the CanMEDS 2005 roles.

Conclusion of Review
Based on the CRICES report presented and the opinion of ESAC committee members, it is our conclusion that:
The final review of this course will be issued pending the supplementary report.

Respectfully submitted,

____________________________
R. Gupta, MD, FRCPC, MEd

____________________________
R. Pittini, MD, MEd, FRCSC (Chair)
Faculty of Medicine University of Toronto
Undergraduate Medical Education
Examination and Student Assessment Committee (ESAC) Review of Psychiatry Clerkship - PSS 330
Reviewers: Dr. Richard Pittini & Ms. Michelle Porepa
Preamble
Dr. Lofchy with the assistance of Tina Martimianakis presented her report to the ESAC committee on April 6, 2004.
A thirty-two page completed CRICES form was distributed and reviewed by committee members prior to the
presentation. Two student representatives from the course as well as student members on the ESAC committee were
in attendance for the presentation and were interviewed separately following the departure of Dr. Lofchy and Ms.
Martimianakis. Dr. Lofchy has been the course director since 2000-2001. Data for the last four academic years was
reviewed.
Course Background
The psychiatry clerkship rotation consists of a six-week block in the third year of the medical curriculum. Five
university-affiliated hospitals are involved in the teaching and evaluation of students. Some students are assigned to
two three-week blocks divided between two sites. Students are given a choice of which teaching site they would
prefer to attend.
Students receive a large amount of direct supervision during this clerkship rotation. There are numerous observed
interviews. Given the nature of the rotation, physical examination and technical skills are not evaluated. Summative
evaluation occurs in the final week of the rotation with the exception of the first case write-up, which is marked at the
midpoint.
The evaluation system consists of four components: a ward assessment (40%), a written examination (20%), an
OSCE (25%), and two case write-ups (15%). Students are required to pass each component of the evaluation system;
however, final decisions regarding whether a student passes the course are at the discretion of the course director.
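The stated aggregation rule, a weighted total with a pass requirement on every component, can be sketched as follows. The component weights come from the report; the 60% component pass threshold and all names are assumptions for illustration only.

```python
# Sketch of the course's mark aggregation: weighted total plus a
# pass requirement on every component. Weights are from the report;
# the 60% component pass threshold is an assumed value.
WEIGHTS = {"ward": 0.40, "written": 0.20, "osce": 0.25, "write_ups": 0.15}
PASS_MARK = 60  # assumed per-component pass threshold

def final_mark(component_marks):
    """Return (weighted total, whether every component was passed)."""
    total = sum(WEIGHTS[c] * m for c, m in component_marks.items())
    all_passed = all(m >= PASS_MARK for m in component_marks.values())
    return total, all_passed

# Hypothetical student
marks = {"ward": 85, "written": 78, "osce": 72, "write_ups": 90}
total, passed = final_mark(marks)
print(round(total, 1), passed)
```

Note that, as described above, a failed component does not automatically fail the student; the flag would simply trigger the course director's discretionary review.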
The class average has been stable at approximately 81% since the current director was appointed. The proportion of
honours rose from 60% to 75% over the last four years reviewed. These trends are consistent for all components of
the student evaluations utilized in this course.
Ward Assessment
The ward assessment is completed at the end of the clinical rotation. Ward evaluations are completed either by
individual preceptors or by consensus. Students who are assigned to two sites are evaluated at the end of the six-week
rotation by their primary supervisor. There is some variability in how these evaluations are completed, as there is no
clear protocol outlining how they are to be completed. The ward evaluations account for 40% of the total mark and
average approximately 85% with almost ninety percent of students receiving honours. There is some variation in the
ward marks across academic sites but this appears to be diminishing. There has been no formal evaluation of the
variance due to teaching site.
While the ward evaluations correlate best with the case write-ups as might be expected, this correlation remains
relatively weak at 0.40. Both the case write-up and the ward evaluation should reflect the students' capabilities on the
ward without the influence of the time constraints that are imposed in an OSCE evaluation.
The ward evaluations that have been presented here were completed using course specific checklists. These forms
have recently been changed to reflect the new institutional objectives. The impact of this change on the ward
evaluations assigned by supervisors will need to be monitored.
Written Examinations
The written examination component of the course is worth 20% of the grade and it is very similar to an OSCE but
with an emphasis on assessment of knowledge. Students tend to do less well on this component with a class average
of just under 80% and only 50% of students receiving honours. There has been a relatively large amount of variation
in scores within a given year depending on the rotation (> 2SD in the 2000-2001 academic year). While this could be
due to the random assignment of students with different capabilities, it might also reflect variations in the difficulty of
the examination. The alpha coefficients have also been sub-optimal, even negative for some years. This trend seems
to be diminishing. The explanation for this offered by the course director seems to be very reasonable in that it was
associated with several changes to the curriculum including the integration of new objectives. There was an increase
in the number of new examination questions above the usual 20%, which may have contributed to some of the
variability seen.
The data presented regarding examination reliability were of interest despite their limited quantity. The exam appeals
process appears to be an appropriate means of dealing with contested examination results, and it also introduces a
mechanism for helping ensure all examinations are reliable.
OSCE
The OSCE accounts for 25% of the student grade and consists of five stations, each of fifteen minutes duration. Each
station's evaluation consists of a content score, a process score and a global assessment. Raw scores are converted
using an elaborate translation scheme using a "borderline groups method" for determining the appropriate cut-point for
a pass. This mechanism allows for adjustment of scores according to the difficulty of the examination stations used.
Both students and the course director note some stations to be quite difficult but the effect that this has on test scores
is adjusted for during the mark translation. Stations are developed by a committee and involve standardized patients.
Care is taken to field test new OSCE stations before they are implemented. Students must obtain an overall score of
60% to pass and must score borderline or better on at least three of the five stations.
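In its usual textbook form, the borderline groups method sets a station's cut score at the mean checklist score of the candidates whose global rating was "borderline". The sketch below illustrates that standard form only; the course's actual translation scheme is more elaborate, and the data are invented.

```python
# Minimal sketch of the standard borderline groups method: the pass mark
# for a station is the mean checklist score of candidates rated
# "borderline" on the global rating. Data are hypothetical.
from statistics import mean

def borderline_cut_score(station_results):
    """station_results: list of (checklist_score, global_rating) pairs."""
    borderline = [score for score, rating in station_results
                  if rating == "borderline"]
    return mean(borderline)

results = [(72, "pass"), (55, "borderline"), (61, "borderline"),
           (80, "pass"), (48, "fail"), (58, "borderline")]
print(borderline_cut_score(results))  # cut score = mean of 55, 61, 58 = 58
```

Because the cut point is derived from each administration's own borderline group, the method adjusts automatically for station difficulty, which matches the behaviour described above.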
The OSCE marks are the lowest marks consistently among the various components. The marks tend to improve
throughout the academic year suggesting that there is benefit to having the rotation later in the year but this has not
been formally evaluated. The alpha coefficients for these examinations were similar to the written examinations. The
explanation for which is that there were many curriculum changes and that many new questions were introduced to
address the recently adopted objectives. There was variation in scores across academies despite the centralization of
the examination. The magnitude of this variation may or may not be significant but it does raise concerns over the
uniformity of experience students are receiving.
OSCE scores tend to correlate poorly with the case write-up (0.17). This is unexpected, as both evaluations are aimed
at assessing the students' ability to conduct a focused interview. Two distinguishing features of the OSCE are that
it consists of timed interactions and that the "patients" are trained role players rather than actual patients. The OSCEs are
standardized and as such are given more weight in the student's overall mark.
Case Write-ups (CPP)
Students are required to submit two case write-ups. One report is a preliminary report and the second is a final or
progress report. The case write-ups are worth 15% of the final grade and are evaluated according to a template.
Students have access to the template in order to guide them. Student scores tend to be high with very little variation
(5%), raising concerns that students are receiving a grade for having completed the task rather than having the quality
of their submission scrutinized. While the evaluation forms have been assembled with great care, assessing nine to
ten competencies using behaviour anchored rating scales, it is not certain whether those evaluating the students are
properly utilizing these guidelines. Some sites have assigned honours to all but 8 of 140 students over the last four
academic years.
The case write-ups can be marked either in a blinded fashion or by the primary supervisor who is also responsible for
the ward evaluation. This may partially account for the relatively high correlation between CPP and ward evaluations
(.40).
Feedback Mechanisms
Three of the four evaluation components occur at the conclusion of the rotation and as such afford only summative
feedback. This is provided to students in written format and is valued by students as it is based on direct observation.
While there is an opportunity to receive formative feedback at the midpoint this is not a formal component of the
course. Students who are assigned to two sites for their six week rotation do not receive an evaluation from the first
site prior to their move to the second site. Students do receive a formal midpoint evaluation as the CPP Part 1 Case
Report Summary Evaluation.
Students have consistently rated their own evaluation as appropriate over the last six academic years, giving an average
rating of 3.78/5. They feel even more strongly that they receive timely and helpful feedback, giving the course an average
rating of 3.97/5.
The course CRICES report outlines several examples of student comments and the appropriate actions taken to
address these concerns over the last several years.
Areas of Strength
1. The variety of evaluation modalities utilized and the appropriate matching of each evaluation method to that which
was being assessed
2. The amount of direct observation of students
3. The sophisticated post-hoc adjustment of examination scores to compensate for variations in station difficulty
4. The use of a committee to develop, test and review examinations
5. Instruction for evaluators and provision of a template regarding the CPP
6. The provision of timely written feedback following written/OSCE stations
7. Appropriate examination quality monitoring (e.g. alpha coefficients) with thoughtful interpretation of results
and reasonable explanations for deficits
Areas for improvement
1. There is a lack of formal evaluation of the effect of two-site versus single site and the impact of academic site
on student evaluations.
Consider analysis of variance to determine how much of the variance in student scores is attributable to academic
site. Compare scores between two-site students and single site students to determine if there is a significant effect
of site assignment. Should there be significant differences it will remain to be determined whether this is a result
of variations in teaching or evaluation.
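The suggested analysis of variance can be sketched as a one-way partition of marks by site, reporting eta-squared (the share of score variance attributable to site). The marks per site below are invented for illustration, and the function name is an assumption.

```python
# Sketch of the suggested analysis: one-way partition of student marks by
# academic site, reporting eta-squared (share of variance due to site).
from statistics import mean

def eta_squared(groups):
    """groups: dict mapping site name -> list of student marks."""
    all_marks = [m for marks in groups.values() for m in marks]
    grand = mean(all_marks)
    ss_between = sum(len(marks) * (mean(marks) - grand) ** 2
                     for marks in groups.values())
    ss_total = sum((m - grand) ** 2 for m in all_marks)
    return ss_between / ss_total

# Hypothetical ward marks at three sites
sites = {"site_A": [82, 85, 79, 88], "site_B": [75, 78, 80, 77],
         "site_C": [84, 81, 86, 83]}
print(round(eta_squared(sites), 2))
```

A full ANOVA with an F test (e.g., via a statistics package) would additionally indicate whether any site effect is statistically significant rather than merely descriptive.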
2. There is perception of inconsistency in how ward evaluations are completed, especially for students assigned
to two sites
It is important to clarify the way in which final ward evaluations are derived. That is, if a student has two
supervisors, should both the first and second three week evaluations be weighted equally, or should the final mark
reflect the student's ultimate performance at the end of the rotation (thus weighting the second evaluation more
strongly)? Furthermore, it would be important to have uniformity across sites in terms of who has input into final
ward evaluations (nurses, residents, staff supervisors, consensus, weighting of input).
3. While the post-hoc translation of the OSCE marks adjusts for station difficulty this may not address the
impact of difficult questions on subsequent student performance. Students perceive some stations as too
difficult and some of the SPs as unrealistic.
Consider evaluating the impact of examination order to determine whether difficult stations impact students'
subsequent performance. This could be achieved by comparing students who do their written exams prior to the
OSCE with those who do it after. A more elaborate review of the data could examine the impact of OSCE station
order on the overall performance: do students do better on an "easy" station if it occurs before or after a
"difficult" station? Is there a link between SP realism and station difficulty? Collect data from students on SP
realism and perceived difficulty.
4. Lack of formative feedback at midpoint
Midpoint feedback to discuss progress to date is often overlooked or not completed. Perhaps more formative
feedback re: global performance could be linked to CPP evaluation. (CPP feedback inevitably occurs at this time
due to specific midcourse deadlines.). Ongoing evaluation of ward performance could be achieved through
completion of encounter forms. These forms could be the basis for formative written and verbal feedback at the
midpoint and could be forwarded to the final evaluator especially for students assigned to two-sites.
5. Lack of correlation between CPP and ward evaluations, inflation of CPP marks
Consider implementing a very specific guideline for how both of these components are completed so that they are
completed consistently across all sites regardless of whether it is a two-site or single site assignment.
Re-marking a sample of the CPP assignments, in a fashion similar to the student appeals process utilized for
the written examination, would allow determination of whether there is a bias effect (e.g., second markers from
different sites, or primary supervisor versus blinded marker).
Ongoing faculty training to encourage adherence to marking templates for both ward evaluations and CPP
6. Lack of a standardized approach for the application of course director discretion regarding final pass/fail
decisions. This may pose difficulties at the time of the next director changeover and may account for some of
the variation seen at the time of the last changeover (1999-2000).
Consider setting a policy/guideline for how these decisions will be made, conditions where this might apply, the
range of this discretion, a means for quantifying how often it is required and a method for assessing how effective
a mechanism it is.
Recommendations
1. Seek psychometric expertise to complete the above-suggested analyses prior to the next course review.
2. Develop a formal formative feedback session at the midpoint of the rotation.
3. Correlate course written exam scores with MCCQE Part 1 psychiatry scores in order to determine the
predictive validity of the written examination.
Conclusion of Review
Based on the CRICES report presented and the opinion of the ESAC committee members, it is our conclusion that:
Continued improvement be encouraged .......................Full review next cycle (approx. 3 years)
___________________________
Richard Pittini, ESAC Chair
___________________________ ____________________________
Jodi Lofchy, Course Director Date
Faculty of Medicine University of Toronto
Undergraduate Medical Education
Examination and Student Assessment Committee (ESAC) Course Review: Structure and Function, STF111F
Course Director: Ian Taylor
Reviewers: Raj Gupta, Dominik Podbielski, Nicolae Petrescu
The CRICES report was presented to the committee on December 7, 2006 by Dr. Ian Taylor
Background
Structure and Function is a 20-week course occurring in the first five months of the first year. There are a
total of 636 hours in the course: 32% of course hours are spent in lecture, 20% in the laboratory, 9% in
seminars and tutorials, and 37% are allotted for study time. The objective of the course is to
provide a clinically relevant foundation in basic sciences.
Examinations Overview
There are a total of seven examinations in the course. Examinations 1 and 2 occur on the same day.
Examination 1 is a short answer, performance-based examination in histology. The images are computer-
based. Examination 2 is a performance-based examination in anatomy, radiology and embryology. There
are 180 markable items including specimens, x-rays and clinical scenarios. Examination 1 is worth 4% of
the final mark and Examination 2 is worth 20% of the final mark. Examination 3 is worth 10% and occurs
about two weeks after Examinations 1 and 2. This is a written MCQ examination in embryology. There is a
take home component of this examination, worth 1.2% of the final grade, which is an essay on an ethical
issue. Examination 4 is a performance-based examination in anatomy, radiology and embryology, and takes
place approximately two weeks after the last examination. It includes 180 markable items including
specimens, x-rays and clinical scenarios. In mid-December, students undergo Examination 5, which is
"Integrated Exam 1", based on weeks 11-16 (examination material is not cumulative). It is worth 23% of
the final mark and is a 70 item MCQ examination. Examination 6 occurs in the New Year, and it is a
performance-based examination in histology with computer-based images, and is worth 5%. One day later,
Integrated Examination 2 (Examination 7) takes place. It comprises short answer questions worth 30
marks and 40 MCQs.
Examination Development
Examinations 1 and 7 are devised by the principal lecturer in histology. Exams 2 and 4 (gross anatomy,
radiology and embryology) are devised by a group of 9 people and are reviewed by the Course Director.
For Examination 3, two MCQ questions are developed from each of the 22 lectures. The questions
are created equally by the two lecturers in embryology. The ethics essay scenario is created by the Ethics
Coordinator. Examinations 5 and 6 are devised by a variety of lecturers in the various disciplines that make
up Section B of Structure and Function.
Three years' worth of Exams 2, 3 and 4 are given to the students at the start of the course. For Section B,
the students received only the most recent versions of the exams due to a recent change in teaching
personnel. Questions may be repeated over the years. Each examination was reviewed by at least one
individual, except for Examinations 5 and 7.
Performance Standards
Students require a minimum overall average of 60% to pass the course. Students must pass both Section A
and Section B to pass the course. Students were also assessed on professional behaviour but it is unclear if
and how lapses in professionalism are documented or managed. Students receiving a mark of less than 60%
on any examination are invited for an interview with the Course Director. Some students who receive a
mark between 60% and 70% are also invited for an interview. Students often self-refer when their
performance is weak. Weak students are also discussed at monthly pre-clerkship meetings. Students are
given remediation that is individually tailored. Remediation success is determined objectively via written or oral examinations given by one or more examiners.
Examination Statistics
Overall
Over the last three years, the class average has remained between 80% and 82%. The range is 62% to 94%.
Last year, 59% received honours, a decrease from 2003/2004 when 70% of the class received honours. No
students have officially failed the course, but some have taken leave for personal reasons and others have
remediated prior to Board of Examiners meetings.
Examination 1:
Over the last two years the class average has increased from 73% to 79%. The proportion failing has dropped from 10% to 5%, and the proportion receiving honours has increased from 31% to 56%.
Examination 2:
The class average has decreased over the last three years from 81% to 75%. The proportion failing has
increased from 2% to 6%, and the proportion receiving honours has dropped from 58% to 30%. The Course Director attributes this to weaker student performance rather than to more difficult examinations. Students in the Biomedical Communications degree program form a control group against which the medical students can be compared, and the Course Director reports that those students have achieved more stable grades over the last three years.
Examination 3:
The class average has remained stable over the last three years, and it ranges from 82% to 85%. The
proportion failing is between 0% and 1.5%. The proportion receiving honours is down somewhat this last
year, at 63%, from 76% in 2003/2004.
Examination 4:
The class average in 2005/2006 is 83%, a marked increase from the 75% received in Examination 2.
The proportion failing has dropped dramatically to 0% in the last academic year. There is marked
improvement in performance in gross anatomy compared with Examination 2. This improvement is thought
to be due to improved performance amongst students who do poorly in Examination 2.
Examination 5:
The class average has dropped from 79% in 2004/2005 to 74% in 2005/2006. The proportion of students receiving honours has also dropped, from 50% to 32%. The proportion failing is now 8% compared with 2.5% in
the year prior. The Course Director feels that this is due to weak performance particularly in biochemistry.
The Course Director notes that subsequently increasing the proportion of class time in biochemistry has led
to a better performance in the integrated examination in January.
Examination 6:
In the second integrated examination, the class average is 83%, up from 74% in the first integrated
examination six weeks prior. The proportion failing is only 0.5% this past academic year compared with
4.1% in 2003/2004.
Examination 7:
The examination statistics are relatively stable. The class average ranged from 81% to 84% in the last three
years with the proportion failing ranging from 2.5% to 5%.
Reliability Of The Examinations
Unfortunately, there is no information regarding inter-rater reliability; however, a single person marks any given portion of an examination. Cronbach's alpha for Examinations 3, 5 and 7 (the MCQ examinations) ranges from 0.64 to 0.79.
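Although the report gives only the resulting alpha values, the computation itself is straightforward. Below is a minimal illustrative sketch of Cronbach's alpha computed from a student-by-item score matrix; the function name and the 0/1 item marks are assumptions for illustration, not data from the report.

```python
# Illustrative sketch: Cronbach's alpha for an MCQ examination.
# Input: one row per student, one 0/1 mark per item (assumed data shape).

def cronbach_alpha(scores):
    k = len(scores[0])  # number of items

    def var(xs):        # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_vars = [var([row[i] for row in scores]) for i in range(k)]
    total_var = var([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)
```

By common rules of thumb, values around 0.7 and above are considered acceptable for in-course examinations, so the reported range of 0.64 to 0.79 is reasonable.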
Validity Of The Examinations
Gross anatomy examination testing time is directly proportional to the time spent in class in each area.
Further evidence, presented informally, is that students who do well overall have done well in all sections of the course, and vice versa. The Course Director believes that students with no background in biochemistry or histology do have difficulty with these parts of the course. Students who do poorly in Structure and Function also do poorly in other courses. The majority of failures occur in a small cohort of students.
Feedback
Examinations are returned to the students within one working week of the examination. The final
examination in January is not returned, but students have the opportunity to review their paper with the
principal lecturer. The ethics essay is returned to students. It includes comments and the template for
marking. The Ethics Coordinator interviews every student whose essay is below the accepted standard. The
marking scheme for the MCQ examinations is posted so that students can compare their performance against the template. There is no formal mid-term feedback, but students are aware that they may go to their tutors, principal lecturers or course directors for further discussion of their evaluations.
In the past, the Course Director has made many changes to the course based on student feedback. In
particular, the Course Director has remedied the deficits identified in the May 2000 ESAC review.
Areas Of Strength
Students are generally very satisfied with the evaluations used in the course. The evaluations are felt to be
fair with a moderate level of difficulty. Examination results are generally stable over the years, and where
they are not, the course director has explained the phenomenon and made alterations to the course to
improve areas of weak performance. MCQ examination statistics are good. Feedback on MCQ
performance by posting the marking template is a strength.
Areas For Improvement & Recommendations
Consideration should be given to increasing the weight of the ethics examination to reflect the time spent on
the assignment. Consideration should also be given to altering the schedule of the evaluations to more
evenly distribute the amount of material per examination.
Conclusion
Based on the CRICES report presented and the opinion of the ESAC committee members, it is our
conclusion that:
Continued improvement be encouraged ............... Full review next cycle (approx. 3 years)
___________________________
Richard Pittini, ESAC Chair
___________________________
Date
October 28, 2002
Examination and Student Assessment Committee
Phase II Surgery Clerkship
Reviewer: Anita Rachlis
The Phase ll Surgery Clerkship CRICES Report was presented to ESAC on June 11, 2002 by the Course
Director, Dr. Ted Ross and Education Consultant, Dr. Stan Hamstra. The course is a 6-week block taken as
three 2-week blocks chosen by the student from the specialties of general surgery, orthopedic surgery,
neurosurgery, urology, plastic surgery, cardiovascular surgery, vascular surgery, thoracic surgery, pediatric
surgery and transplantation. The educational experience includes both central and hospital based seminars
(two hours per week). The central seminar program consists of 4 key topics: trauma, cardiovascular surgery,
neurosurgery and pediatric surgery. The clinical clerk admits and follows patients and attends appropriate
operating room and ambulatory clinics of his/her assigned staff surgeon for each two-week rotation. At the
time of this CRICES report the evaluation consisted of 3 components: ward assessment, written MCQ
examination and OSCE at the end of the combined Medicine/Surgery 12-week rotation (currently at the end
of each 6-week rotation). A passing grade in the ward assessment is required before the student is permitted
to sit the written examination. The grade is determined as follows: a combination of the surgical components of the written (MCQ) examination and the OSCE constitutes the 'factual average'. Provided the 'factual average' is a passing grade, the final surgery grade is the average of the ward (1/3), written (1/3) and OSCE (1/3) grades. If the 'factual average' is not a passing grade, it stands as the final grade.
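The grading rule just described can be expressed compactly. The sketch below assumes a passing threshold of 60% and an equally weighted factual average; neither figure is stated explicitly in the report.

```python
# Sketch of the Phase II Surgery grading rule described above.
# PASS_MARK and the 50/50 weighting of the factual average are assumptions.

PASS_MARK = 60.0

def final_surgery_grade(ward, written, osce):
    """All inputs are percentages; written/osce are the surgical components."""
    factual_average = (written + osce) / 2
    if factual_average >= PASS_MARK:
        # ward, written and OSCE each contribute one third
        return (ward + written + osce) / 3
    # a failing factual average stands as the final grade
    return factual_average
```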
The evaluation of the students is based on several different methods: a descriptive clinical performance
(ward assessment) using the forms provided for each of the clerkships but with descriptors specific for
surgery, and two objective examinations, an MCQ examination and OSCE. Students receive feedback at the
end of each 2-week block and informal mid-rotation feedback.
Data are provided for the academic years 1998-2001. The overall class average and proportion of honors have not changed significantly over that time period. It appears that over the past three years only one student has failed.
Methods of evaluation and Observations from the CRICES report:
1. Ward evaluation
The ward grade is based on an average of the three separate ward assessments provided by the three
different 2-week subspecialty rotations; at least two assessments are used to calculate the grade. In 2000-2001, a template was used to derive the mark from the criteria checked by the supervisors, so supervisors did not assign a specific grade themselves.
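The ward-grade calculation can be sketched as follows; representing a missing assessment as `None` is an assumption made for illustration.

```python
# Sketch: ward grade as the mean of up to three 2-week assessments,
# tolerating a missing one (the report's two-assessment minimum).

def ward_grade(assessments):
    """assessments: three marks (percent), any of which may be None if missing."""
    present = [a for a in assessments if a is not None]
    if len(present) < 2:
        raise ValueError("at least two of the three assessments are required")
    return sum(present) / len(present)
```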
The class average has been in the honors range, with a high percentage of students achieving honors in 1998-99 and 1999-2000. This may have been due to supervisors assigning specific grades in those years. In contrast, an algorithmic approach was used in 2000-2001, perhaps accounting for the lower class average and lower percentage achieving honors that year.
An analysis of internal consistency of the ward assessments over the three rotations in 2000-2001 suggested high correlations, but it was not clear what was actually being measured in this analysis. A better measure of internal consistency would be the correlation of individual students' scores across the three 2-week blocks. Another analysis should examine the consistency of faculty grading, both between faculty members and within the same subspecialty. There were no statistical differences in student grades
among academies for 1999-2000 and 2000-2001. Hospital specific data were not provided nor were there
analyses by subspecialty.
With respect to the validity of this evaluation: content validity is supported by the fact that the evaluation is
based on the course objectives. Correlations of the ward evaluation with the written examination and the OSCE are low, indicating either a genuine lack of agreement between the assessments or that the assessments are measuring different constructs. One analysis of construct validity that could be done would be
to assess performance at different times during the year, such that students doing the rotation early in the
year might perform less well than those doing the rotation in subsequent time periods. Predictive validity is
not included but an analysis of performance related to Phase 1 results and on the Medical Council
examinations could be attempted.
Feedback is given at the end of each block but it is not clear how mid-block feedback is provided and
documented given the short rotations of only 2 weeks.
2. Written examination
The written examination is given as an MCQ examination of 60 questions. A multidisciplinary surgical
committee created the question bank. 15% of the questions are new each year. Item statistics are used to
decide upon modification and deletion of questions.
There has been an increase in overall class average during 1999/2000 and 2000/2001, with a concomitant
increase in the proportion with honors. Reliability of the examination is variable with alpha coefficients
ranging from .11 to .60 in the academic year 2000-2001. There was no statistical difference among
academies in the last two academic years. Content validity is supported by the fact that the evaluation is based on the course objectives in the seminar syllabus; although the examination emphasizes the Phase II curriculum, Phase 1 content is incorporated as well. As with the ward assessment, concurrent validity is low
between the written examination and the ward evaluation or OSCE. This again may be a function of the fact
that each evaluation is assessing different parameters. A measure of construct validity was not provided but
again could be an analysis of performance over the academic year. A second approach would be to examine
the scores of students who did specific subspecialty rotations and the performance on subspecialty-specific
examination questions, though this may be limited by the small number of such specific questions included
in the examination. Predictive validity was not provided, but correlations of performance in Phase 1 surgery with Phase II surgery could be used to assess this, as could performance on the Medical Council examinations.
Students do not receive specific feedback from the written examination other than the score as this is a
secure examination and students do not review the answers to the specific questions.
3. OSCE
The OSCE has been a combined Medicine/Surgery assessment at the end of the 12 weeks of the Medicine
and Surgery rotations. The Surgery component includes eight stations, four 10-minute and four 5/5-minute
stations. The stations are drawn from a bank that is currently being updated. Consultation with the
Medicine clerkship OSCE coordinator occurs so that station content does not overlap. Examiner feedback
on the stations is used for future station development.
OSCE grades have been consistent over the three academic years reported upon with a low percentage of
students obtaining an honors score. Reliability of the examination is acceptable with an alpha coefficient of
.51 for the February 14, 2001 examination. There was no statistical difference in examination marks across
academies in the last two academic years. Content validity is again supported by the fact that the examination is based on course objectives and incorporates both Phase 1 and Phase II content. Correlations between the OSCE and the other evaluations are again low, suggesting that the evaluations may be assessing
different aspects of student learning. Analysis of construct validity was not provided but could include
performance based on time of rotation during the academic year. A second approach would be to examine whether performance was better when the OSCE was taken immediately after the Surgery rotation than when it was taken six weeks later, with the Medicine clerkship intervening. Predictive validity was not provided but could include comparison of performance in the Phase
2 Surgery OSCE with either their scores in Phase 1 Surgery or the Medical Council examination.
Feedback on the OSCE is provided via a form that lists the station content, performance on a scale from 1 to
5 on history taking, physical examination, organization, knowledge and communication for each station and
on content (checklist score) as being below the passing standard or at or above the passing standard for each
station.
Areas of Strength:
1. The evaluation of student performance in the Phase II Surgery Clerkship is based on several different
assessments measuring factual knowledge (written MCQ examination), clinical performance (ward
assessment) and clinical skills and knowledge (OSCE).
2. The OSCE examination has provided consistent grades over the past three academic years and a formal
feedback process has been instituted more recently.
3. The ward assessment has suffered from grade inflation over time, but with the recent institution of an algorithm to calculate the score this has become less of a concern.
4. The course director has indicated that the test banks both of the MCQ examination and the OSCE are
undergoing revision and renewal.
Areas for Improvement:
1. Although the ward assessment utilizes an algorithm to calculate the score this may be based on only 2
out of 3 assessments. The number of students for whom the final ward assessment is based on only two-
thirds of the evaluations was not provided. This would be of interest. This practice should clearly be
minimized. A correlation of the scores for individual students obtained on each of the rotations would also
be of interest. The current form does not provide descriptors for each of the cells: these should be
considered in future revisions.
2. Internal consistency of the written (MCQ) examination appears to be less than ideal. It has been
suggested that the examination is less secure than expected. A departmental examination committee to
review, revise and renew the current bank of questions could help to improve the reliability of the
examination and monitor the statistical properties of the examination.
3. Currently there is no formal feedback to the students on their performance on the written (MCQ)
examination other than the final score.
4. An analysis of construct and predictive validity of each of the component assessments may be helpful to
ensure that the assessments are valid and reflect student performance.
Recommendations:
1. Ward assessment
a) The ward assessment should ensure that all three evaluations (each 2-week block) are included in the
calculation of the final score.
b) The ward assessment form should be reviewed to include descriptors in each of the categories, and these should be sufficiently distinct to discriminate across the scale.
c) An analysis should be carried out to determine the correlations across each of the evaluations to
determine consistency, including consistency across the sub-specialties and preceptors.
2. Written examination
a) A departmental examination committee should be established to generate new examination questions, review and
blueprint each examination and monitor statistical properties of each examination and over the academic
year. The department should consider paying question authors a stipend if current difficulties in obtaining
sufficient new questions persist.
b) A process should be developed to provide students with feedback on their performance on the examination, such as sub-specialty performance reported as H/P/F and relative to other students.
3. OSCE
a) A departmental examination committee should be established to generate new stations, review and
blueprint each examination and monitor the statistical properties of each examination and during the
academic year.
4. Analysis of validity
a) An analysis should be performed to provide evidence of validity particularly construct and predictive
validity of each of the components of the assessment. Correlations with Phase 1 grades and with
performance on the Medical Council examination can be calculated to complete this component of the
CRICES report.
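Several of the recommendations above call for correlation analyses (across 2-week blocks, against Phase 1 grades, against Medical Council results). A minimal Pearson-correlation sketch, using invented marks rather than any data from this report, might look like:

```python
# Pearson correlation of paired student marks (illustrative only).

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)
```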
Conclusion of Review
Based on the CRICES report presented and the opinion of the ESAC committee members, it is our
conclusion that:
Multiple revisions as per committee suggestions ............... review in 1-2 years
Richard Pittini, ESAC Chair
Ted Ross, Course Director