A classroom-based assessment method to test speaking skills in English for Specific Purposes


CercleS 2014; 4(1): 9–26

María Pilar Alberola Colomar

A classroom-based assessment method to test speaking skills in English for Specific Purposes

Abstract: This article presents and analyses a classroom-based assessment method to test students’ speaking skills in a variety of professional settings in tourism. The assessment system has been implemented in the Communication in English for Tourism course, as part of the Tourism Management degree programme, at Florida Universitaria (affiliated to the University of Valencia). Based on our ESP teaching experience, we have noticed a need to design an assessment procedure that would enable us to gather several samples of students’ speaking competencies throughout the course. An extended process that involved research and meetings with colleagues and students led to the development of the assessment method in question, which can be described as a multimodal communicative approach to testing, organized as continuous assessment, and fully related to the course syllabus. To enhance validity and reliability, we have opted to create an assessment procedure based on a combination of testing formats, rating criteria and rating scales. The procedure involves two testers, and the overall grade is based on the results obtained in seven tests. The article reviews contributions to the assessment of speaking abilities, presents our testing procedure, describes its implementation, discusses its advantages and disadvantages, and concludes by analysing its washback effect and pedagogical implications.

Keywords: classroom-based assessment, communicative testing, speaking skills, ESP, tourism studies

DOI 10.1515/cercles-2014-0002

1 Introduction

The present situation of international companies means that a good command of professional English is essential to succeed in the tourism industry. Members of staff are not only required to communicate in English but also to perform in this language according to the appropriate standards set by companies.

María Pilar Alberola Colomar: Florida Universitaria (affiliated to the University of Valencia), Faculty of Tourism, Languages Department. E-mail: palberol@florida-uni.es


As ESP lecturers on the Tourism Management degree programme, we are aware of the relevance our courses will have in students’ careers. Tourism graduates obviously need to be proficient in both writing and speaking skills. Two main reasons, however, lead to the major role that speaking activities are given in our programme. On the one hand, oral communication in English is usually the weakest point students have when they start at university. On the other hand, speaking encounters are likely to be the most frequent type of situation students face during their careers in tourism.

It has long been recognized (for language education, see e.g. Madsen 1983) that testing is an important part of the teaching and learning experience. Since teaching and testing are so closely interrelated, we have noticed a need to design an assessment procedure that would enable us to gather several samples of students’ speaking competencies throughout the course and to use the procedure’s ‘washback’ effect as a source of information for making decisions in our teaching-learning process. The specific goals to be reached with our assessment proposal are the following:
1. To minimize the influence of factors such as the method effect or students’ anxiety about results.
2. To score by systematizing impressions, thus addressing the question of unreliability and inaccuracy of judgment.
3. To check the development of students’ speaking skills in professional settings over the academic year.
4. To increase students’ motivation through feedback.

This article presents and analyses our classroom-based assessment, which can be described as a multimodal communicative approach to testing, organized as continuous assessment and closely related to the course syllabus. What makes this approach different from the testing methods we used before is that the overall grade does not depend on a final test but is obtained cumulatively. Furthermore, this system is based on a combination of tasks, rating criteria and scales, which provide teachers with an overall view of students’ speaking ability. Finally, the test tasks chosen enhance students’ professional skills.

In what follows we present the different approaches to testing speaking abilities that have influenced our work, describe our assessment procedure, detail its advantages and drawbacks, and analyse the washback effect on teaching and learning.


2 Testing speaking skills

In order to create the assessment method described, we wished to apply a systematic procedure. Harris and McCann (1994) provide a six-step procedure which we found useful: thinking about the syllabus objectives, working out a programme, thinking about test content, deciding when testing is going to take place, how to test, and which test format to use according to the purpose. Additionally, the number of tests that can guarantee an adequate range of speaking samples needs to be decided. Bachman (1990) suggests that the amount and type of testing that is done depends on the decisions that need to be made.

A further aspect to consider when testing speaking skills in tourism is whether the tests should be based on general English or English for Specific Purposes (ESP). There is no consensus on this, as these two points of view exemplify: Douglas (2001) defends ESP tests and states that the content, test method and some rating criteria should derive from the target situations, whereas Davies (2001) argues that ESP testing has not proved to be more valid than a general proficiency test, although for pragmatic reasons it is still worth working on specific testing.

Teachers devote a lot of time and effort to creating good tests, but there is no such thing as the perfect test. Hughes (1989: 6) describes the ideal test as one “which will consistently provide accurate measures of precisely the abilities in which we are interested; have a beneficial effect on teaching (in those cases where the tests are likely to influence teaching); be economical in terms of time and money”. In order to create useful assessments, Bachman (2002) suggests the integration of both task-based and construct-based approaches to test design. In attempting to list the essential features of a good test, researchers agree on the need for a test to be valid and reliable. Apart from these two qualities, scholars add other features that a test must have. According to Weir (1990) and Bachman and Palmer (1996), a good test should also be efficient, practical, accountable, authentic and interactive, and have impact. Carroll (1991), however, proposes that we should judge whether a test is good or bad by checking it against four criteria: relevance, comparability, acceptability and economy.

If we focus now on assessing speaking skills, the issue has proved to be extremely difficult; in fact Heaton, one of the early experts in language testing, finds it too complex to achieve reliable objective testing (Heaton 1988). He lists further difficulties: the criteria for measuring speaking skills, the weighting given to some components, and the interdependence of listening and speaking, which makes it harder to analyse what is being tested at any one time. A further drawback was considered to be that the examiner of an oral production test is working under great pressure, making subjective judgements as quickly as possible.


Finally, Heaton states that test administration is an added difficulty if there are large numbers of students to test. More recently, Luoma (2004) locates the challenge of testing speaking in the number of factors that influence the tester’s impression of how well a person can speak, and in the fact that teachers expect test scores to be accurate, just and appropriate for their purposes. Although there have been important improvements in the testing of speaking skills over the last few decades (for up-to-date overviews, see Fulcher [2003] and Luoma [2004]), O’Sullivan (2008) lists some areas that are still of great concern to the test writer: construct definition, predictability of task response, interlocutor effect, the effect of test-taker characteristics on performance, rating-scale validity and reliability, and tester reliability, as well as a series of practical problems related to logistics, time and expense.

In order to address the difficulties outlined above, our assessment procedure involves four concepts: a communicative approach to testing, continuous assessment, multimodality, and the washback effect. As regards the characteristics of a communicative approach, Davies (1988), who typically makes very perceptive analyses of trends in language testing and assessment, claimed that communicative language tests had no clear-cut definition. However, the following features tend to be considered characteristic of communicative testing: communicative language tests should aim at measuring how real-life language tasks and activities can be performed. Consequently, as e.g. Porter (1991) and Paltridge (1992) suggest, testers should incorporate tasks which approximate as closely as possible to those faced by students in real life. Communicative tests should be contextualised, respond to learners’ needs, and be based on language use in contexts and for purposes relevant to the learner. Furthermore, communicative testing has introduced the concept of qualitative modes of assessment in preference to quantitative ones. Finally, communicative tests should have a high level of content, construct and predictive validity.

In relation to continuous assessment, from Heaton (1990) to Wiliam (2011) it has been widely argued that this type of assessment should be considered a component of the teaching programme, and therefore a means of improving teaching, since it provides opportunities for revision. It is also likely to encourage learning and motivate students.

Each test format offers the tester advantages and drawbacks; therefore, in order to choose test tasks, two aspects have to be considered according to Fulcher (2003): (i) whether they will elicit a performance that can be scored, and (ii) whether it will be possible to make inferences from the score to the construct that needs to be measured. If we review the formats we have used in our procedure, we can say that interviews and role plays are two useful ways of testing speaking skills because listening and speaking are fully integrated, as in real life, and can be assessed in a relatively natural professional situation.


In fact, role plays and simulations are more appropriate for specific purposes, as Fulcher (2003) explains. Harris and McCann (1994), who provide a detailed analysis of the pros and cons of different test formats, think that role plays are excellent for testing interaction, but as a drawback they highlight that, in addition to language, role plays can test the ability to act. Since in role plays students are assigned fictitious roles, O’Sullivan (2008) suggests that role familiarity may affect performance. For this reason Fulcher et al. (2011) propose that learners should be given the opportunity to perform both roles in a speaking test, since most encounters are not between two participants with completely equal rights and roles. They conclude, however, that whether or not this is done will depend on the purpose of the test and practical constraints. With regard to group discussions, Heaton (1988) argues that they can show how students are thinking and using the target language. He also thinks that group activities can provide an opportunity for meaningful and active involvement.

As for oral presentations, Harris and McCann (1994) point out as an advantage that they are realistic and give the tester time to assess performance, and as a disadvantage that there is no interaction and they can have a high stress factor. More recently, Busà (2010) and Chou (2011) highlight that oral presentations are influenced by non-linguistic factors, namely individual differences, affective factors, disagreements, difficulty in meeting deadlines, non-verbal messages, and specific socio-cultural conventions. Chou, however, considers oral presentations ideal for cooperative learning.

Finally, we consider the washback effect of classroom-based assessment on instructional and learning practices. Heaton (1988) supported oral tests as having an excellent washback effect on the teaching that takes place prior to the test. Hughes (1989) found that although the accurate measurement of oral ability is not easy, the effort is necessary where backwash is an important issue. Muñoz and Álvarez (2010) have investigated the effect of a classroom-based oral competence assessment on instruction, basing their study on aspects like grammar, communicative effectiveness, pronunciation, vocabulary, and task completion. Their conclusions indicate that, on the one hand, washback can be fostered by informing students of assessment procedures and scoring scales, specifying objectives, and structuring assessment tasks; on the other hand, washback can also be encouraged when teachers and students clearly establish the connection between educational goals and assessment.

Another aspect of testing that needs close attention is rating. It is generally accepted that rating scales are essential to ensure valid and reliable scoring.


There are different ways to classify rating scales: holistic or analytic, primary or multiple trait, etc. For testing speaking, however, Fulcher et al. (2011) analyse basically two approaches: the measurement-driven approach, exemplified by the scales designed for the Common European Framework of Reference for Languages (CEFR; Council of Europe 2001), and the performance data-driven approach. They argue that the measurement-driven approach generates impoverished descriptions of communication, whereas performance data-driven approaches result in richer descriptions that support better inferences from score meaning to performance in specific domains. Fulcher et al. (2011) have designed a scoring instrument called the Performance Decision Tree (PDT), based on a binary-choice definition scale, which improves on performance data-based scales and prioritizes the performance effect (a schematic sketch of the binary-choice idea is given at the end of this section). O’Sullivan (2008), however, compares the advantages and disadvantages of several rating scales, basically holistic and analytic, concluding that the final decision to use one type of scale often depends on practicality.

To create effective rating scales, adequate rating criteria are essential. In relation to testing speaking skills in ESP, Fulcher et al. (2011) consider the ability to produce the basic obligatory elements of the genre to be the first criterion for evaluation. Douglas (2001) suggests taking into account the criteria given by experienced professionals in the field, which can supplement linguistically oriented criteria and help examiners to interpret language performances in specific purpose tests. In situations of spoken interaction, many non-linguistic factors are involved that should also be part of the assessment criteria. Harris and McCann (1994) point out that achieving a balance between linguistic performance and non-linguistic factors is difficult, and that assessing non-linguistic factors such as attitude, group work, organisation of work, independence, creativity and presentation raises issues of reliability and fairness.
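To make the binary-choice idea behind a performance decision tree concrete, the sketch below walks a rater through an ordered sequence of yes/no decisions and assigns a band accordingly. It is a minimal illustration of the principle only: the questions are invented for the example and do not reproduce Fulcher et al.’s (2011) published instrument.

```python
# A minimal sketch of binary-choice scoring in the spirit of a Performance
# Decision Tree (PDT): the rater answers a sequence of yes/no questions
# about the observed performance, and each "yes" moves the candidate up one
# band. Illustration of the idea only, not Fulcher et al.'s instrument.

QUESTIONS = [  # ordered from basic to demanding; wording is hypothetical
    "Does the candidate produce the obligatory elements of the genre?",
    "Is the message intelligible without repair by the interlocutor?",
    "Does the candidate manage turn-taking appropriately?",
    "Does the candidate adapt when the conversation changes direction?",
]

def pdt_band(answers: list[bool]) -> int:
    """Return a band from 0 to len(QUESTIONS): stop at the first 'no'."""
    band = 0
    for ok in answers:
        if not ok:
            break
        band += 1
    return band

# Example: a performance that satisfies only the first two checks.
print(pdt_band([True, True, False, True]))  # -> 2
```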

3  The assessment method at Florida Universitaria

3.1 Scenario

Our testing procedure has been designed for the three modules of Communication in English for Tourism at Florida Universitaria (affiliated to the University of Valencia, Spain). Students have to complete modules I, II and III, with a total of 180 hours of tuition, during the first three years of the bachelor’s degree programme in Tourism Management.

We have an average of 20 students per class, and the course methodology includes oral communication encounters that represent the situations students are likely to face in their future careers. Those tasks are contextualised in professional case studies designed within the problem-solving methodology.


From a linguistic point of view, the class materials range over the three years from CEFR level B1 to C1. Although most students enter the programme with level B1 in English, we actually work with mixed-ability groups.

3.2 Description of the assessment method

We have created a communicative approach to testing in which a student will succeed not only because his/her message is formally accurate but also because it communicates appropriately in a particular professional setting. The method presented here has been organized as continuous assessment in which students have to take part in seven tests. They are awarded marks in each oral test, which then contribute cumulatively to their overall result. One of the main advantages is that the lecturer can follow students’ development over the academic year. As we planned to design a testing method clearly related to the course syllabus, each of the seven tests is related to one of the topics covered in the course.

Since no testing method is perfect, we have opted for a combination of possibilities within each of the components of the assessment procedure, i.e. testers, test formats, rating criteria, and rating scales and forms, as we explain in more detail in the following sections.

3.2.1 Testers

Two lecturers have been involved in the testing process we have put into practice: the course teacher, who is responsible for the subject and teaches the whole group, and an additional teacher who works for the Florida Language School and teaches extra speaking sessions to groups of five students outside the regular academic timetable. Non-Spanish teachers are chosen for these additional classes.

In view of the teaching–testing relationship, having a second teacher involved provides several advantages. Firstly, as speaking and listening are closely interrelated, having a teacher whose mother tongue is not Spanish helps students to understand other accents; secondly, a multicultural component is added to the conversation classes in this way; and thirdly, a second tester’s opinion increases assessment objectivity. In practical terms, however, this arrangement requires full collaboration and coordination between the two teachers.


3.2.2 Test formats

A varied combination of tasks gives students more opportunities to show their ability to communicate orally in different professional circumstances. We have used the term “multimodal” to describe a proposal that combines different test formats. On the whole, our students are assigned interactive tasks that involve face-to-face interaction with fellow students (role plays, meetings, group discussions) and production tasks that require them to address audiences (oral presentations, short monologues).

The seven tests students have to take are organized as follows: two are done with the course lecturer and the other five with the assistant teacher in the five extra sessions. Both teachers have previously agreed on tasks, scoring, topics, etc. As for the additional conversation classes, the second teacher deals each time with one of the topics that students have been working on in the regular classes. The class starts with a brief review of vocabulary and concepts to contextualise the tasks, and then students work on three types of activities involving group and individual work in which they will be assessed: a group discussion, a role play, and a monologue in which each student gives his/her opinion, summarizes the group position on the topic, etc. Students do not know the task specifications beforehand. Consequently, a certain amount of improvisation is needed, which gives a different perspective on the student’s speaking skills. Although giving feedback after testing is a major issue for us, owing to time constraints feedback is not possible immediately after the class. However, students have the opportunity to make an appointment with the teacher in the following days.

With the course teacher, students perform more complex tasks that make higher demands of their professional skills. Students know the test specifications beforehand and have time to do some research and prepare their part of the work. The teacher gives feedback immediately after the performance, which is an opportunity to deal with the mistakes made.

The following examples of test tasks performed in the regular classes illustrate the testing procedure we have implemented. They also show the close interrelation between teaching and testing, in terms of a number of professional competences that are enhanced through students’ preparation of the test activities assigned, such as autonomous learning, team work, information search, speaking in public, creativity, use of information technologies, etc.

The first-year (MI) students are tasked with acting out a service encounter, a role play based on hiring a car. It is a guided conversation in which students are given instructions, the structure of the dialogue and linguistic input in class. However, autonomous learning and information search are encouraged by asking them to find information about the typical documents and procedures of this type of transaction.


In the second year (MII), as an interactive task with fellow students, test-takers have to hold a meeting to discuss the basic guidelines of a market research survey. Students are required to prepare ideas and arguments and take some decisions. The agreements reached in the meeting determine the basis of the survey that students have to conduct for an interdisciplinary project in which they participate in the second semester. With this activity team work is fostered, and students see a practical application of the meeting outcome, which helps increase their motivation.

Third-year students (MIII) also role-play, but in this case it is a job interview. Students are given general guidelines, and they have analysed representative sample interviews in class and studied specific vocabulary and the conventions connected with the roles of interviewer and interviewee. They have to gather information about the company, the job, the qualifications needed, the skills required, etc. This conversation is left fairly open as regards length and information exchanged; spontaneous responses and creativity are encouraged in this way.

Students are required to deliver an oral presentation each year. In MI and MII presentations are prepared by groups of four, and in MIII students work individually. The product they present in the first year is a cultural tour, in the second year a special interest tour, and in the last year a new theme park. The special interest in tasking students with delivering a presentation each year is twofold. On the one hand, product presentations are frequent occurrences in tourism; on the other hand, apart from linguistic aspects, presentations also help to develop essential professional competences that will help students succeed in their careers: a good command of information technology, creativity, team work, and speaking in front of an audience are some examples. The aim at the end of the three ESP courses is that students can deliver a clear, well-structured presentation, showing fluency and accuracy as well as communicating a professional image. The content must be supported by good group coordination with appropriate body language and visual aids.

3.2.3 Rating criteria

As the previous section shows, test tasks are basically the same for the three modules of Communication in English for Tourism. Nevertheless, the assessment system is progressive in the demands it makes and in its use of particular rating criteria. Since we want to measure the students’ capacity to communicate effectively in professional situations, our rating criteria combine linguistic and professional aspects.


As Table 1 shows, some criteria are common to all the tests, regardless of the task to be performed, and to the three modules (I, II, III). Aspects such as fluency, accuracy, pronunciation, communication and coherence are essential according to the course goals, but the appropriateness of register to function and the use of field-specific terminology are also essential when assessing students’ performance in each tourism-related context. Other rating criteria are specific to the test in question. The communicative tasks performed in the additional conversation classes are subject to constraints related to time and task features. Consequently, apart from the common criteria mentioned above, only participation – essential to discussion – has been added. In the tests carried out with the course teacher, as presented in Table 1, there is a wide range of specific criteria to be applied to the interaction tasks and the oral presentation.

In order to simplify scoring, criteria are short-listed and distributed among the three courses. Specific rating criteria vary according to the type of task assigned and the module in question. As shown in Table 1, in the interaction tasks (role play in MI and MIII, meeting in MII), flexibility to adapt when the conversation changes direction, and interaction (including understanding), are criteria applied to all three modules. However, the use of professional conventions is not considered in the first-module role play, since students have not been trained in that aspect at the time of the test, but it is taken into account in MII and MIII.

Table 1: Rating criteria

Criteria common to all test formats (M I, II, III): fluency, accuracy, pronunciation, communication, coherence, appropriateness, field-specific terminology.

Criteria specific to test format:

In additional speaking sessions
– Group discussion, role play, monologue: participation (M I, II, III)

In regular classes
– Role play, meeting: flexibility (M I, II, III); interaction (M I, II, III); use of professional conventions (M II, III); structure (M II, III); arguments (M II); information (M III)
– Oral presentation: contents (M I, II, III); structure (M II, III); group coordination (M I, II); IT resources (M I, II, III); body language (M III)


The MI interaction task is a guided dialogue, so the structure is given; taking it as a criterion would not add any relevant information. But in the MII and MIII activities, where specifications are not so explicit, the creation of a logical structure is a criterion. Finally, giving relevant arguments is part of the rating criteria in the MII meeting, and information is considered only for the MIII job interview; the information requested in MI is too simple to be taken into account when rating.

In the oral presentations, students directly address an audience consisting of the rest of their classmates, who play the role of tour operators and travel agents attending the presentation of a new product and react by asking questions at the end of the talk. The content of the presentation is an essential criterion in all three modules. As in the oral interaction activities, having a good structure is a criterion that applies only to MII and MIII presentations. In MI and MII, where the presentations are prepared and delivered in groups, another key rating criterion considered for assessment is group coordination. Other aspects that are essential to adequate communication in professional presentations, such as the use of information technology for visual support and appropriate body language, are also used as criteria, although the latter acquires particular relevance in the third year because it is a topic that is analysed in depth as part of the programme.

3.2.4 Rating scales

Although objective marking would be the ideal, it is impossible not to be influenced by subjective impressions and former experience – especially when we have known the students for several years. In order to maximize objectivity, we complement impression with conscious assessment using different rating scales in relation to the specific criteria detailed in the previous section. These rating scales have been created using three approaches: holistic (making a global synthetic judgement), analytical (looking at different aspects separately), and performance decision trees (sets of binary decisions). In order to create our own rating scales, we have taken as models the ones described in the CEFR, by Carroll (1980), and by Fulcher et al. (2011), among others, and we have adapted their proposals to our own purposes.

We have chosen a holistic scale for the MI role play because it is the students’ first oral test and the test task assigned is rather short; trying to gather evidence for different items would therefore be difficult and unreliable. In the MII meeting and the MIII job interview, parts are longer and we have applied analytical scales. In the oral presentations we have found the use of sets of binary decisions more practical in order to combine verbal and non-verbal criteria. Several examples will illustrate the characteristics of the marking scales.


Table 2 shows a five-band holistic scale for the first-year (MI) role play; each band summarizes the rating criteria listed in Table 1. Table 3 is an example of the analytic scale used during the meetings held in the second year (MII); it lists the established rating criteria under separate headings and uses a checklist format for the band descriptors.

Table 2: The holistic rating scale (each band is accompanied by space for rating and comments)

a. Very good non-native speaker. Initiates, maintains and elaborates conversation, and maintains flexibility to adapt to changes in conversation. Confident use of specific vocabulary and formal register. Professional and coherent attitude.

b. Good speaker. Can develop the dialogue coherently enough. Flexibility to adapt to changes. Correct use of specific language and formal register.

c. Competent speaker. Able to maintain the theme of the dialogue, but limited use of specific vocabulary and formal register. Not very flexible; sticks to memorized expressions.

d. Modest speaker. Deficiencies in mastery of language patterns. Needs to ask for repetition. Lacks flexibility and initiative. Specific vocabulary learned by heart and frequently misused.

e. Extremely limited speaker. Hesitation and misunderstandings. Unable to produce continuous and accurate discourse. Little or no use of specific vocabulary.

Table 3: The analytic rating scale (each criterion is accompanied by space for rating and comments)

Accuracy
– High degree of grammatical accuracy; errors are rare and difficult to spot.
– Shows a relatively high degree of grammatical control.
– Some grammatical mistakes, but they do not lead to misunderstandings.
– Many grammatical mistakes; the message is very difficult to understand.

Fluency and pronunciation
– Speech is effortless and smooth, with excellent pronunciation.
– Speech is occasionally hesitant. Good pronunciation.
– Speech is frequently hesitant. Pronunciation mistakes.
– Speech is slow and uneven except for memorized expressions. Poor pronunciation that makes understanding difficult.

Flexibility
– Always able to adapt when the conversation changes direction.
– Sometimes able to adapt when the conversation changes direction.
– Occasionally able to adapt when the conversation changes direction.
– Not able to adapt when the conversation changes direction.

Interaction and communication
– Understands everything and facilitates communication.
– Understands most things and gets his/her message across.
– Understands basic sentences and needs repetition; communication is sometimes difficult.
– Understands too little for the simplest type of conversation.

Specific vocabulary
– Good command of a broad choice of specific vocabulary.
– Good command of a limited choice of specific vocabulary.
– Some mistakes in the use of a limited choice of specific vocabulary.
– No use of specific vocabulary.

Appropriateness and professional conventions
– Uses register adapted to the situation. Professional attitude. Follows the professional conventions of a meeting perfectly.
– Uses formal register with some mistakes. Acceptable attitude for a professional meeting. Follows the professional conventions of a meeting acceptably.
– Problems adapting register to the situation. Not the best attitude for attending a meeting. Partially follows the professional conventions of meetings.
– Does not use an adequate register. Does not follow the professional conventions.

Coherence and arguments
– Has prepared a wide variety of good arguments that give coherence to his/her role.
– Has prepared a limited number of arguments, but his/her role is still coherent.
– Has prepared basic arguments, not always coherent.
– Has not prepared arguments; limited participation.


In Module III, as Table 4 shows, we have chosen for the individual oral presentations a set of binary decisions for aspects such as visual aids, structure, information and body language; for criteria like pronunciation, fluency, accuracy, specific vocabulary and appropriateness, analytic scales are used. Table 4 also shows the weight of each part in the overall result; a worked sketch of this weighting follows the table.

Table 4: A set of binary decisions

Visual aids (10%)
– Attractive: Yes □ / No □
– Easy to read: Yes □ / No □
– Spelling mistakes: Yes □ / No □
– Use of talk notes: Yes □ / No □

Body language (10%)
– Eye contact: Yes □ / No □
– Gestures: Yes □ / No □

Structure (10%)
1. Introduction: Included □ / Not included □
2. Signposting: Included □ / Not included □
3. Ending (summary + question + end): Included □ / Not included □
4. Were the ideas clearly organized?: Yes □ / No □
5. Was there a good balance between the different parts of the talk?: Yes □ / No □

Information (20%)
1. Location (market proximity, access, complementary and competitive facilities): Included □ / Not included □
2. Theme: Included □ / Not included □
3. Rides and attractions: Included □ / Not included □
4. Selling points: Included □ / Not included □

Fluency and pronunciation (15%): see scale
Accuracy (15%): see scale
Specific vocabulary (10%): see scale
Appropriate register (10%): see scale

Overall score and comments:
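To illustrate how the parts of this form could combine into a single mark, the sketch below scores each binary item and applies the Table 4 percentages on the course’s 0–10 scale. Only the weights come from the form; the helper names, the sample answers and the rule of one point per favourable tick are our own assumptions for illustration.

```python
# A sketch of combining the Table 4 parts into an overall presentation mark
# on the course's 0-10 scale. Each binary item scores 1 when the favourable
# answer is ticked (e.g. "Spelling mistakes: No" is favourable), and each
# part contributes its listed percentage.

WEIGHTS = {                              # percentages from Table 4
    "visual_aids": 0.10, "body_language": 0.10, "structure": 0.10,
    "information": 0.20, "fluency_pronunciation": 0.15, "accuracy": 0.15,
    "specific_vocabulary": 0.10, "appropriate_register": 0.10,
}

def part_score(favourable_ticks: list[bool]) -> float:
    """Fraction of binary items in one part with the favourable answer ticked."""
    return sum(favourable_ticks) / len(favourable_ticks)

def overall(parts: dict[str, float]) -> float:
    """Weighted sum of per-part scores (each 0.0-1.0), mapped to 0-10."""
    assert parts.keys() == WEIGHTS.keys()
    return 10 * sum(WEIGHTS[name] * score for name, score in parts.items())

marks = {
    "visual_aids": part_score([True, True, False, True]),     # 3 of 4 items
    "body_language": part_score([True, True]),
    "structure": part_score([True, True, True, True, False]),
    "information": part_score([True, True, True, False]),
    # The four "see scale" criteria are rated on the analytic scales and
    # normalised here to 0.0-1.0 by the rater.
    "fluency_pronunciation": 0.75, "accuracy": 0.75,
    "specific_vocabulary": 0.5, "appropriate_register": 0.75,
}
print(round(overall(marks), 2))  # -> 7.55
```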


In all the marking forms, as these examples show, there is space for the tester’s comments. Some of the suggestions made and examples used by students are noted there in order to illustrate the reasons for the score assigned. This part is the most important from the students’ perspective: although rating scales are useful for teachers, what students really value are the examples from their own speaking samples, with comments, suggestions, etc. Videoing the oral tests would be more explicit, but it is time-consuming as regards practicalities and revision for feedback. Therefore we have opted to make a note of our essential comments and give feedback immediately after the students’ performance.

4  Advantages and drawbacks of the proposed method

A major advantage of applying a multimodal communicative approach to testing is that the final score obtained by each student summarizes his/her performance in a wide variety of aspects that are essential for oral communication in professional scenarios. We have evidence, for example, of how students work in groups and individually, how they perform orally in interaction tasks and in production activities, how they play their role with an audience and without one, when their performance has been previously prepared or is spontaneous, how they play their part in different settings, and finally how they deal with non-verbal issues such as IT, body language and so on. Another advantage of a multimodal assessment procedure is that students can score lower marks in activities which are more difficult for them and higher marks in those they manage better, which minimizes the consequences of the method effect and the perceived difficulty of the tests.

From a quantitative perspective, for the three courses considered, the marks obtained by students over the seven tests were consistent, with a maximum variation of 1.5 (on a 0–10 scale) between the highest and the lowest mark obtained by each student, no matter who the teacher was. We consider a difference of 1.5 satisfactory given the variety of test formats and the number of activities performed. These figures indicate a high degree of reliability in our assessment procedure, achieved thanks to the usefulness of the rating scales and the coordination between the two participating teachers. Satisfactory coordination between these two teachers has not come easily, however: it involved multiple meetings, and agreement on particular topics was not always easy to reach. But having two testers assess the same group of students reduced the impact of subjective impressions on the overall result.
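As a concrete reading of that consistency figure, the short sketch below computes each student’s spread between best and worst marks over the seven tests and checks it against the 1.5 threshold. The per-student marks are invented for illustration; the article reports only the aggregate result.

```python
# Consistency check sketched from the claim above: for each student, the
# spread between the highest and lowest of the seven test marks (0-10
# scale) should stay within 1.5. Sample marks are invented.

def spread(marks: list[float]) -> float:
    """Range between a student's best and worst marks."""
    return max(marks) - min(marks)

students = {
    "student_a": [6.5, 7.0, 6.0, 7.0, 6.5, 7.5, 7.0],
    "student_b": [8.0, 8.5, 9.0, 8.0, 8.5, 9.0, 8.5],
}
for name, marks in students.items():
    print(f"{name}: spread {spread(marks):.1f}, within 1.5: {spread(marks) <= 1.5}")
```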


Students know that continuous assessment as a cumulative process means that less is at stake in each test they take, and this reduces the influence of anxiety: they are aware that they have seven opportunities to contribute to their final mark. This positive aspect, however, turns into a drawback in terms of the high percentage of students who decide not to sit an exam, particularly in the additional conversation classes; the problem hardly occurs in the tests administered by the course teacher. There are three main reasons for students not attending the additional conversation classes: first, timetabling; second, the small percentage of the overall mark accorded to each session, which leads them to assume that they can still pass even if they do not sit some of the tests; and third, the course lecturer is responsible for assessing the other skills apart from speaking, so students consider the tests administered by this teacher more important than the additional ones.

Absenteeism is a difficult problem to tackle. It is not only a matter of the individual student’s decision not to attend: it also causes problems for other students, because the teacher prepares the activities according to the number of people who have signed up for that particular class. Student absenteeism also affects the way the classes evolve, which has consequences for the students who do participate.

Marks range from 0 (the student did not sit the exam) to 10 (excellent performance), 5 being a pass. When students do not take a test, the mark obtained is 0, so when they do not attend an additional speaking session their final result does not reflect their actual speaking ability. To illustrate the situation we will take as an example a second-year student who obtained marks of 7 in the two additional sessions he attended. He missed three sessions, but scored 7 in the meeting and 7.5 in the oral presentation, which shows that he has satisfactory language ability. However, if we apply the percentages according to the test weighting, his final score is 4.3, which means that he fails the oral part of the subject. Consequently, 4.3 does not reflect the student’s speaking skills but rather his attitude to his studies and a lack of responsibility. As educators, we strongly believe that if we are training students to succeed professionally, these non-linguistic issues must be included in their test results.
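The arithmetic behind this example can be sketched as follows. The exact weight of each of the seven tests is not given in the article, so the sketch assumes equal weights; under the course’s own weighting the same profile produces the 4.3 reported above.

```python
# A sketch of the cumulative final mark: a missed test contributes a 0, and
# the final result is the weighted sum of the seven test marks on the 0-10
# scale. Equal per-test weights are an assumption made here for
# illustration; the article does not publish the actual weighting.

WEIGHTS = [1 / 7] * 7   # five additional sessions + meeting + oral presentation

def final_mark(marks: list[float]) -> float:
    """Weighted cumulative mark; a missed test simply counts as 0."""
    assert len(marks) == len(WEIGHTS)
    return sum(w * m for w, m in zip(WEIGHTS, marks))

# The second-year student above: two additional sessions attended (7 each),
# three missed (0 each), meeting 7, oral presentation 7.5.
print(round(final_mark([7, 7, 0, 0, 0, 7, 7.5]), 1))
# -> 4.1: a fail, despite satisfactory marks on every test actually taken
```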

Another important advantage of using continuous assessment is that we can monitor the development of learning. Furthermore, between one test and the next students have an average of three weeks, which enables the teacher to give them feedback and help them remedy their weaknesses so that they can improve their performance in the following tests. Continuous assessment is especially motivating for low-level students, who really appreciate feedback as a basis for improvement.

5 Conclusions

Based on the close interrelation between teaching and testing, our assessment procedure has exerted a positive washback effect on the course syllabus, with positive pedagogical implications. Qualitative and quantitative results have helped us to redesign the syllabus and class materials; test results have enabled teachers to ascertain which parts of the language programme should be modified in order to increase their effectiveness; and class activities have also been redefined, emphasizing the need to work on general weaknesses.


Although our assessment procedure has proved useful in these ways and we have attained the objectives we set ourselves, we still have to work to overcome the drawbacks that remain to be addressed. As regards speaking skills, we have created a testing system that is internally coherent and also coherent with the course syllabus. However, further research is needed in order to integrate this system with the testing procedure applied to the other language skills and so achieve a fully coherent approach to assessment.

References

Alderson, J. Charles & Brian North (eds.). 1991. Language testing in the 1990s: The communicative legacy. London: Modern English Publications & The British Council.

Bachman, Lyle F. 1990. Fundamental considerations in language testing. Oxford: Oxford University Press.

Bachman, Lyle F. 2002. Some reflections on task-based language performance assessment. Language Testing 19(4). 453–476.

Bachman, Lyle F. & Adrian S. Palmer. 1996. Language testing in practice. Oxford: Oxford University Press.

Busà, Maria Grazia. 2010. Sounding natural: Improving oral presentation skills. Language Value 2(1). 51–67. http://www.e-revistes.uji.es/languagevalue (accessed 18 January 2013).

Carroll, Brendan J. 1980. Testing communicative performance: An interim study. Oxford: Pergamon.

Carroll, Brendan J. 1991. Resistance to change. In J. Charles Alderson & Brian North (eds.), Language testing in the 1990s: The communicative legacy, 22–27. London: Modern English Publications & The British Council.

Chou, Mu-hsuan. 2011. The influence of learner strategies on oral presentations: A comparison between group and individual performance. English for Specific Purposes 30(4). 272–285.

Council of Europe. 2001. Common European Framework of Reference for Languages: Learning, teaching, assessment. Cambridge: Cambridge University Press.

Davies, Alan. 1988. Communicative language testing. In Arthur Hughes (ed.), Testing English for university study, 5–15. Oxford: Modern English Publications.

Davies, Alan. 2001. The logic of testing languages for specific purposes. Language Testing 18(2). 133–147.

Douglas, Dan. 2001. Language for Specific Purposes assessment criteria: Where do they come from? Language Testing 18(2). 171–185.

Fulcher, Glenn. 2003. Testing second language speaking. London: Pearson/Longman.

Fulcher, Glenn, Fred Davidson & Jenny Kemp. 2011. Effective rating scale development for speaking tests: Performance decision trees. Language Testing 28(1). 5–29.

Harris, Michael & Paul McCann. 1994. Assessment. Oxford: Heinemann.

Heaton, John Brian. 1988. Writing English language tests, 2nd edn. London: Longman.

Heaton, John Brian. 1990. Classroom testing. London: Longman.

Hughes, Arthur. 1989. Testing for language teachers. Cambridge: Cambridge University Press.

Luoma, Sari. 2004. Assessing speaking. Cambridge: Cambridge University Press.

Madsen, Harold. 1983. Techniques in testing. Hong Kong: Oxford University Press.

Muñoz, Ana P. & Marta E. Álvarez. 2010. Washback of an oral assessment system in the EFL classroom. Language Testing 27(1). 33–49.

O’Sullivan, Barry. 2008. Notes on assessing speaking. http://lrc.cornell.edu/events/past/2008-2009/papers08/osull1.pdf (accessed 22 February 2013).

Paltridge, Brian. 1992. EAP placement testing: An integrated approach. English for Specific Purposes 11(3). 243–268.

Porter, Don. 1991. Affective factors in language testing. In J. Charles Alderson & Brian North (eds.), Language testing in the 1990s: The communicative legacy, 32–40. London: Modern English Publications & The British Council.

Weir, Cyril J. 1990. Communicative language testing. Englewood Cliffs, NJ: Prentice Hall.

Wiliam, Dylan. 2011. Embedded formative assessment. Bloomington, IN: Solution Tree Press.

Bionote

María Pilar Alberola Colomar is a full-time lecturer at Florida Universitaria (affiliated to the University of Valencia). She has been teaching English for Specific Purposes since 1993 in the degree programmes in Tourism, Business and Education. Her PhD thesis was on genre analysis, but more recently her research has focused on different aspects of second language teaching and learning: assessment, motivation, collaborative learning, learner autonomy, and the use of information technologies as pedagogical tools.
