




Mariana Campos
Reflective Essay #2
March 28, 2012

Fine Tuning the Assessment Policy at the Lab

While the five principles of “practicality, reliability, validity, authenticity and washback” endorsed by Brown and Abeywickrama (2010, p. 25) have been taken into account in developing the assessment policy of the Laboratorio de Idiomas (the continuing education university program where I work), there may still be room to fine-tune some details and achieve more consistent results.

Authenticity and “real-world tasks” (Brown & Abeywickrama, 2010, p. 37) rank high among assessment priorities. Oral performance is assessed through activities such as role plays and presentations, which resemble “the features of a target language task” (Brown & Abeywickrama, 2010, p. 36). In the latter task, students may choose a topic they enjoy or research another subject of interest to which they can apply what has been studied. The role play, generally suggested by the teacher, stems from the curriculum, thus contributing to content validity. “Construct-related validity” (Brown & Abeywickrama, 2010, p. 33) is also taken into account, since feedback on students’ oral performance in the communicative situation is planned by recording grammar and vocabulary usage, fluency and pronunciation. Authenticity resurfaces in the reading comprehension text chosen for the end-of-term written test. From the earliest levels, students are expected to draw on paratextual features and transparent cognates to help them read, although in later stages they are also made aware of false cognates. Hence, the chosen article usually includes the source, title, a picture and its caption. Finally, written production is tested directly through an open-response piece of writing related to the reading passage, from which students may take ideas but not quote full phrases.

The role of washback is paramount. The final test is considerably more challenging and integrative in nature than most of the activities presented in the book and workbook. Therefore, there is a special booklet with suggested activities, and as a teacher I usually adapt extra material into cloze tests and provide supplementary practice of grammar points, such as question formation, in the form of “story line” rather than merely “discrete-point” (Brown & Abeywickrama, 2010, p. 46) activities. Most activities in the written production section of the final exam assess grammar and vocabulary by asking students to complete an interview or a dialogue, or to fill in a cloze test. There is a more guided writing section where students show their command of the vocabulary learnt throughout the term, and finally there is some vocabulary recognition activity, such as “odd one out.” Because the test lacks a listening component in two out of three levels, one washback effect I have to watch out for is that students may lose interest in practicing listening comprehension skills. Once, during a review session, a student told me he would not do the listening activity since it did not prepare him for the final exam the following week. Needless to say, I had included a listening comprehension task in the review because students are supposed to build up their listening skills even though these are only tested every three levels.

Validity is also taken into account. The final tests are prepared by the coordinators according to the material used in class. Once, though, students were given a test corresponding to the earlier textbook used for the level, which was considerably more difficult than what they had achieved, so the results were somewhat discouraging and I had to overlook certain mistakes. A new version in line with the adopted textbook was designed the following semester. In terms of validity, the test provides me with “appropriate, meaningful inferences” (Brown & Abeywickrama, 2010, p. 29), since the results

usually coincide with my expectations according to the assessment I have made of students’ performance in class. Nonetheless, marks in general tend to be rather low for most students.

Reliability has been considered but could be improved. The test is divided into sections, and the instructions and scoring are clearly stated. There is a key for the written test, writings are corrected throughout the course according to the same rubric (task, organization, vocabulary, grammar), and oral assessments are given an overall mark based on the components mentioned earlier. In spite of all these unifying criteria, I believe inter-rater reliability (Brown & Abeywickrama, 2010, p. 41) is not guaranteed. For one, I am always tempted to give half marks for a good answer with spelling mistakes, and I am not sure my colleagues behave the same way in this respect. Moreover, the marking of oral tasks is significantly subjective. In the realm of “test administration reliability” (Brown & Abeywickrama, 2010, p. 28), the poor condition of the walls and ceiling in some classrooms and the use of tablet-arm chairs would not qualify for standardized testing. However, these facility deficiencies are in plain view all year long, and students do not deem them a hurdle.

Practicality does not seem to be a top priority, yet it is not entirely absent. Most of the test is easily gradable, except for the writing piece, which takes more time and effort to mark. Completing the test, however, is quite taxing for students: it usually takes two hours, and only the quickest finish within the suggested time of an hour and a half. There are usually complaints about how long it takes to complete, or about how the vocabulary in the reading section has to be guessed rather than inferred; in fact, for weaker students, the test is rather overwhelming.

In order to improve the assessment policy, I would add listening tasks to each instance of the Ongoing Oral Assessment. This would allow students to get their bearings regarding the skill, yet it would not be as intimidating as a single instance in the final test. It would also be more stimulating for students, since they would not face two terms going by without any assessment of their listening skills. Another area for improvement is inter-rater reliability: refresher courses on how teachers are expected to score the oral assessments and the tests would be highly beneficial. Finally, instead of relying only on the coordinators to write the tests, teachers’ cooperation could be sought to ensure criterion-related validity.

Last but not least, reading about “intra-rater reliability” (Brown & Abeywickrama, 2010, p. 28) made me think of the times when, on account of time constraints or plain laziness, I did not review every corrected test after changing a criterion. As regards the writing rubric, I could explain more carefully to students what each item entails so that the labels make more sense to them. They would also profit from more detailed feedback at the end of the term.

Altogether, the assessment policy developed for the Laboratorio de Idiomas proves to be highly authentic and valid, with positive washback; a few tweaks, however, would make it more reliable and enhance its validity in the eyes of students.

References

Brown, H. D., & Abeywickrama, P. (2010). Language assessment: Principles and classroom practices. White Plains, NY: Pearson Education.