Corpora in language education Corpus Linguistics Richard Xiao [email protected]

Page 1: Corpora in language education

Corpora in language education

Corpus Linguistics
Richard Xiao

[email protected]

Page 2: Corpora in language education

Aims of this session

• Lecture
  – The state of the art of using corpora in language education
  – Issues of using corpora in language teaching
  – Case study: Using contrastive corpus linguistics to inform SLA research
• Lab session (homework)
  – Semantic prosody and DDL

Page 3: Corpora in language education

Corpus revolution

• An increasing interest since the early 1990s in applying the findings of corpus-based research to language pedagogy
  – 10 well-received biennial international conferences on Teaching and Language Corpora (TaLC, 1994-2012)
  – At least 30 authored or edited books, covering a wide range of issues concerning the use of corpora in language pedagogy, e.g. corpus-based language descriptions, corpus analysis in the classroom, and learner corpus research
  – Wichmann et al (1997), Partington (1998), Bernardini (2000), Burnard and McEnery (2000), Kettemann and Marko (2002, 2006), Aston (2001), Ghadessy, Henry, and Roseberry (2001), Hunston (2002), Granger et al (2002), Connor and Upton (2002), Tan (2002), Sinclair (2003, 2004), Aston et al (2004), Mishan (2005), Nesselhauf (2005), Römer (2005), Braun, Kohn and Mukherjee (2006), Gavioli (2006), Scott and Tribble (2006), Hidalgo, Quereda and Santana (2007), O’Keeffe, McCarthy and Carter (2007), Aijmer (2009), Bennett (2010), Campoy, Gea-Valor and Belles-Fortuno (2010), Cunningham (2010), Harris and Jaén (2010), Jaén, Valverde and Pérez (2010), Reppen (2010), Volodina (2010)

Page 4: Corpora in language education

Corpus revolution

• Books published in China
  – 杨达复 (2000), 濮建忠 (2003), 何安平 (2004a, 2004b), 华南师范大学外国语学院 (2005), 卫乃兴, 李文中, 濮建忠 (2005), 杨惠中 (2005), 王立非, 梁茂成等 (2007)

Page 5: Corpora in language education

Teaching and corpora: A convergence

• Leech’s (1997) three focuses of the convergence
  – Indirect use of corpora in teaching (e.g. reference publishing, materials development, language testing, and teacher training)
  – Direct use of corpora in teaching (e.g. teaching about, teaching to exploit, and exploiting to teach)
  – Development of teaching-oriented corpora (e.g. LSP and learner corpora)
• Corpus analysis can be illuminating ‘in virtually all branches of linguistics or language learning’ (Leech 1997: 9)

Page 6: Corpora in language education

Direct vs. indirect uses

• Indirect uses
  – Largely relating to what to teach
• Direct uses
  – Primarily concerning how to teach
• Development of teaching-oriented corpora
  – Can relate to both

Page 7: Corpora in language education

Reference publishing

• Corpus revolution in reference books (at least for English)
  – It is nearly unheard of for dictionaries and reference grammars published since the 1990s not to claim to be based on corpus data; even people who have never heard of a corpus are using the products of corpus research
• Corpus-based dictionaries
  – Learner dictionaries
  – Frequency dictionaries
• Corpus-based reference grammars
  – Longman Grammar of Spoken and Written English
  – Collins COBUILD series
  – Hunston’s Pattern Grammar

Page 8: Corpora in language education

Syllabus design and materials development

• Previous research has demonstrated that the use of grammatical structures in TEFL textbooks differs considerably from the use of these structures in native English
  – ‘a kind of school English which does not seem to exist outside the foreign language classroom’ (Mindt 1996: 232)

• The order in which those items are taught in non-corpus-based syllabi ‘very often does not correspond to what one might reasonably expect from corpus data of spoken and written English’ (ibid: 245-6)

Page 9: Corpora in language education

Syllabus design and materials development

• Corpora can be useful in this area: a simple yet important role of corpora in language teaching is to provide more realistic examples of language usage reflecting the nuances and complexities of natural language
• Corpora can also provide data, especially frequency data, which may further affect what is taught, and in what order
• Touchstone book series (McCarthy et al 2005-2006)
  – Based on the Cambridge International Corpus
  – Aiming at presenting the vocabulary, grammar, and language functions that students encounter most often in real life

Page 10: Corpora in language education

Syllabus design and materials development

• Hunston (2002: 189): ‘The experience of using corpora should lead to rather different views of syllabus design.’
• The Lexical Syllabus (Willis 1990), as implemented in the Collins COBUILD English Course (Willis, Willis and Davids 1988-1989)
  – Three focuses of a lexical syllabus: ‘(a) the commonest word forms in a language; (b) the central patterns of usage; (c) the combinations which they usually form’ (Sinclair and Renouf 1988)
  – Not a syllabus for vocabulary items only, but rather covering ‘all aspects of language, differing from a conventional syllabus only in that the central concept of organization is lexis’ (Hunston 2002: 189)

Page 11: Corpora in language education

Language testing

• An emerging area of language teaching which has started to use the corpus-based approach
• Alderson (1996) envisaged the following possible uses of corpora in language testing
  – test construction, compilation and selection, test presentation, response capture, test scoring, and calculation and delivery of results
  – ‘The potential advantages of basing our tests on real language data, of making data-based judgments about candidates’ abilities, knowledge and performance are clear enough. A crucial question is whether the possible advantages are born out in practice’ (Alderson 1996: 258-259)

Page 12: Corpora in language education

Language testing

• The concern raised in Alderson’s conclusion appears to have been addressed satisfactorily 10 years later
  – Nowadays, computer-based tests are considered to be comparable to paper-based tests (cf. Choi, Kim and Boo 2003), as exemplified by computer-based versions of TOEFL tests
• Major test service providers like UCLES have recently used corpora in testing (cf. Ball 2001; Hunston 2002: 205)
  – As an archive of examination scripts
  – To develop test materials
  – To optimize test procedures
  – To improve the quality of test marking
  – To validate tests
  – To standardize tests

Page 13: Corpora in language education

Teacher development

• Corpora have been used recently in language teacher training to enhance teachers’ language awareness and research skills
  – Rationale: For students to benefit from the use of corpora, teachers must first of all be equipped with a sound knowledge of the corpus-based approach
• The integration of corpus studies in language teacher training is a quite recent phenomenon (cf. Chambers 2007)
  – It may take more time, and ‘perhaps a new generation of teachers, for corpora to find their way into the language classroom’ in secondary education (Braun 2007: 308)

Page 14: Corpora in language education

Direct uses of corpora

• Leech’s (1997) three direct uses of corpora in teaching
  – 1) Teaching about
    • Teaching corpus linguistics as an academic subject
    • Part of the curricula for linguistics and language-related degree programs at both postgraduate and undergraduate level
  – 2) Teaching to exploit
    • Providing students with ‘hands-on’ know-how so that they can exploit corpora as student-centred learning activities
  – 3) Exploiting to teach
    • Using the corpus-based approach to teaching language and linguistics courses, which would otherwise be taught using non-corpus-based methods
• (1) and (3) are mainly associated with language / linguistics programmes

Page 15: Corpora in language education

From three P’s to three I’s

• The traditional three-P approach
  – Presentation – Practice – Production
• The exploratory three-I approach (cf. Carter and McCarthy 1995)
  – Illustration: looking at real data
  – Interaction: discussing and sharing opinions and observations
  – Induction: making one’s own rule for a particular feature, which ‘will be refined and honed as more and more data is encountered’ (ibid: 155)

Page 16: Corpora in language education

Data-driven learning (DDL)

• Direct use of corpora in pedagogy is essentially DDL
• Johns (1991): ‘research is too serious to be left to the researchers’
  – The language learner should be encouraged to become ‘a research worker whose learning needs to be driven by access to linguistic data’ (Johns 1991)
• Johns (1997: 101) compares the learner to a language detective: ‘Every student a Sherlock Holmes!’
• His DDL website gives some very good examples of data-driven learning
  – www.lancs.ac.uk/fass/projects/corpus/Kibbitzers/Kibbitzers.chw

Page 17: Corpora in language education

Data-driven learning (DDL)

• The DDL approach involves three stages of inductive reasoning with corpora (Johns 1991)
  – Observation (of concordanced evidence)
  – Classification (of salient features)
  – Generalization (of rules)
• Roughly corresponding to Carter and McCarthy’s (1995) three I’s in the exploratory corpus-based approach, but fundamentally different from the traditional three-P approach
  – Three-P approach: top-down deduction
  – Three-I / DDL approach: bottom-up induction
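
The observation stage relies on concordanced evidence, i.e. keyword-in-context (KWIC) lines. The following is a minimal sketch of how such lines could be produced, not the tool used in the lecture. The function name `kwic`, the `width` parameter and the sample sentences are all invented for illustration.

```python
import re

def kwic(text, node, width=30):
    """Return keyword-in-context lines for every match of `node` in `text`."""
    lines = []
    # Match the node word plus simple suffixed forms, e.g. cause, causes, caused.
    pattern = re.compile(r"\b" + re.escape(node) + r"\w*", re.IGNORECASE)
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so that the node words line up vertically.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right:<{width}}")
    return lines

# Invented sample text (an assumption, not corpus data).
sample = (
    "Smoking can cause serious illness. The storm caused severe damage "
    "to the harbour. Nobody knows what causes the problem."
)

for line in kwic(sample, "cause"):
    print(line)
```

Learners would then classify what they see to the right of the node word and generalize a rule from it, which corresponds to the remaining two stages.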

Page 18: Corpora in language education

Data-driven learning (DDL)

• Can be either teacher-directed or learner-led (i.e. ‘discovery learning’) to suit the needs of learners at different levels, but basically learner-centred
• Leech (1997: 10): The autonomous learning process ‘gives the student the realistic expectation of breaking new ground as a “researcher”, doing something which is a unique and individual contribution’
• This is true of advanced learners only!
• The key to successful data-driven learning is the appropriate level of pedagogical mediation depending on the learners’ age, experience, proficiency level, etc.
  – ‘A corpus is not a simple object, and it is just as easy to derive nonsensical conclusions from the evidence as insightful ones’ (Sinclair 2004: 2)

Page 19: Corpora in language education

Direct uses: Current situation

• So far confined largely to learning at more advanced levels, especially in tertiary education
• Almost absent in the general ELT classroom, e.g. secondary education (and in the teaching of other foreign languages at all levels)
  – Learners’ age, level and experience
  – Time constraints and curricular requirements
  – Knowledge and skills required of teachers for corpus analysis and pedagogical mediation
  – Access to appropriate resources such as corpora and tools
  – …or indeed probably a combination of all of these factors

Page 20: Corpora in language education

LSP corpora vs. professional communication

• Third focus of convergence: Development of teaching-oriented corpora: LSP, parallel, and learner corpora

• Teaching of language for specific purposes and professional communication can benefit greatly from domain- or genre-specific specialized corpora both directly and indirectly, e.g.
  – Coxhead’s (2000) Academic Word List (AWL)
  – Paul Nation’s Range and GSL/AWL
    • http://www.victoria.ac.nz/lals/about/staff/paul-nation
  – Biber’s (2006) comprehensive analysis of university language based on the TOEFL 2000 Spoken and Written Academic Language Corpus
  – McCarthy and Handford’s (2004) exploration of pedagogical implications regarding spoken business English on the basis of the Cambridge and Nottingham Spoken Business English Corpus (CANBEC)

Page 21: Corpora in language education

LSP corpora vs. professional communication

• Specialized corpora in translation teaching
  – ‘Large corpora concordancing’ (LCC) can help students to develop ‘awareness’, ‘reflectiveness’ and ‘resourcefulness’, the skills that distinguish a translator from unskilled amateurs (Bernardini 1997)
  – Corpora help trainee translators become aware of general patterns and preferred ways of expressing things in the target language, comprehend source language texts better, and improve production skills (Zanettin 1998)
  – Comparable and parallel corpora in translation studies

Page 22: Corpora in language education

Parallel concordancing

• Parallel corpora and parallel concordancing are particularly useful in translation teaching
• They can also aid the so-called ‘reciprocal learning’ (Johns 1997)
  – i.e. two language learners with different L1 backgrounds are paired to help each other learn the other’s language

Page 23: Corpora in language education

Learner corpora

• Welcomed as one of the most exciting recent developments in corpus-based language studies
• For indirect use, they have been explored to inform curriculum design, materials development and teaching methodology (cf. Keck 2004)

• For direct use, they provide a bottom-up approach to language teaching and learning - as opposed to the top-down approach with native corpora of the target language (Osborne 2002)

Page 24: Corpora in language education

Learner corpora

• Can also provide indirect, observable, and empirical evidence for the invisible mental process of language acquisition and serve as a test bed for hypotheses generated using the psycholinguistic approach in SLA research

• Provide an empirical basis enabling the findings previously made on the basis of limited data of a handful of informants to be generalized

• Have widened the scope of SLA research so that interlanguage research nowadays treats learner performance data in its own right rather than as decontextualised errors in traditional error analysis (cf. Granger 1998: 6)

Page 25: Corpora in language education

Ongoing debate: Frequency & authenticity

• Often considered as two of the most important advantages of using corpora

• Also the targets of criticism from language pedagogy researchers
  – Corpus data impoverishes language learning by giving undue prominence to what is simply frequent at the expense of rarer but more effective or salient expressions (Cook 1998)
  – Corpus data is authentic only in a very limited sense in that it is de-contextualized: genuine but not authentic (Widdowson 1990, 2000, 2003)
  – …flawed arguments

Page 26: Corpora in language education

Frequency

• ‘Using corpus data not only increases the chances of learners being confronted with relatively infrequent instances of language use, but also of their being able to see in what way such uses are atypical, in what contexts they do appear, and how they fit in with the pattern of more prototypical uses’ (Osborne 2001: 486)

• ‘Frequency ranking will be a parameter for sequencing and grading learning materials’ because ‘frequency is a measure of probability of usefulness’ and ‘high-frequency words constitute a core vocabulary that is useful above the incidental choice of text of one teacher or textbook author’ (Goethals 2003: 424)
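
Frequency ranking as a sequencing parameter presupposes a ranked word list. The sketch below shows one simple way such a list might be derived from raw text. The tokenizer (lower-cased alphabetic strings), the function name `frequency_list` and the sample sentence are assumptions made for illustration only.

```python
import re
from collections import Counter

def frequency_list(text):
    """Return (word, count) pairs ranked from most to least frequent."""
    # Crude tokenization: lower-case the text and keep alphabetic strings.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens).most_common()

# Invented sample text (an assumption, not corpus data).
sample = "the cat sat on the mat and the dog sat by the door"
ranked = frequency_list(sample)

for rank, (word, count) in enumerate(ranked[:3], start=1):
    print(rank, word, count)
```

On a real corpus the top ranks would be dominated by function words, which is one reason frequency is treated as only one criterion among several.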

Page 27: Corpora in language education

Frequency

• Do you agree?
  – ‘What is frequent in language will be picked up by learners automatically, precisely because it is frequent, and therefore does not have to be consciously learned’ (Kaltenböck and Mehlmauer-Larcher 2005: 78)
• This is not true, however, because of cross-linguistic differences
  – Determiners such as a and the are certainly very frequent in English, yet they are difficult for Chinese learners of English because their mother tongue does not have such grammatical morphemes and does not maintain a count-mass noun distinction

Page 28: Corpora in language education

Frequency

• Frequency ‘should be only one of the criteria used to influence instruction’; ‘the facts about language and language use which emerge from corpus analyses should never be allowed to become a burden for pedagogy’ (Kennedy 1998: 290)
  – overall teaching objectives
  – learners’ concrete situations
  – cognitive salience
  – learnability
  – generative value
  – teachers’ intuitions

Page 29: Corpora in language education

Frequency

• It would be inappropriate for language teachers, syllabus designers, and materials writers to ignore ‘compelling frequency evidence already available’ (Leech 1997: 16)
  – ‘Whatever the imperfections of the simple equation “most frequent” = “most important to learn”, it is difficult to deny that frequency information becoming available from corpora has an important empirical input to language learning materials.’
  – Leech, G. (2011) ‘Why frequency can no longer be ignored in ELT’. 外语教学与研究 2011(1).

• Frequency can at least help syllabus designers, materials writers and teachers alike to make better-informed and more carefully motivated decisions (cf. Gavioli and Aston 2001: 239)

Page 30: Corpora in language education

Authenticity

• Corpus data are authentic by definition
• Widdowson (1990, 2000) questions the use of authentic texts in language teaching
  – Authenticity of language in the classroom is ‘an illusion’ (1990: 44) because even though corpus data may be authentic in one sense, its authenticity of purpose is destroyed by its use with an unintended audience of language learners

Page 31: Corpora in language education

Authenticity

• Widdowson’s (2003) distinction between genuineness (features of text as a product) vs. authenticity (features of discourse as a process)
  – Corpora are genuine in that they comprise attested language use, but they are not authentic for language teaching because they have been deprived of their contexts (as opposed to co-texts)
  – Implication?
    • Only language produced for imaginary situations in the classroom is ‘authentic’

Page 32: Corpora in language education

Authenticity

• Product (text) vs. process (discourse)
  – Interesting but not always useful
  – Using product as evidence for process may not be less reliable; sometimes this is the only practical way of finding out about process (cf. Stubbs 2001)
• Stubbs (2001) draws a parallel between corpora in corpus linguistics and rocks in geology
  – ‘both assume a relation between process and product. By and large, the processes are invisible, and must be inferred from the products.’

Page 33: Corpora in language education

Authenticity

• Like geologists who study rocks (products) because they are interested in geological processes (e.g. earthquakes, volcanoes) to which they do not have direct access, SLA researchers can analyze learner performance data (products) to infer the inaccessible mental process of SLA

Page 34: Corpora in language education

Authenticity

• If we do follow Widdowson’s distinction…
  – Genuine: attested
  – Authentic: occurring in real communicative context
• …are the imaginary situations conjured up for classroom teaching authentic?
  – Do they occur in real communicative context?
  – When students are learning and practising a shopping ‘discourse’, are they actually doing shopping?

Page 35: Corpora in language education

Authenticity

• Furthermore, invented examples often do not reflect nuances and complexities of real usage (Fox 1987)
  – Students who have been taught ‘school English’ cannot readily cope with English used by native speakers in real life (Mindt 1996: 232)
• ‘The preference for “authentic” texts requires both learners and teachers to cope with language which the textbooks do not predict’ (Wichmann 1997: xvi)
  – Corpora are useful for this purpose

Page 36: Corpora in language education

Corpus-based pedagogy: Today

• Currently, corpora appear to have played a more important role in helping to decide what to teach (i.e. indirect uses) than how to teach (i.e. direct uses)
  – Indirect uses of corpora seem to be well established
  – Direct uses of corpora in teaching are largely confined to tertiary education and are nearly absent in the general language classroom

Page 37: Corpora in language education

From today to tomorrow

• If corpora are to be further popularised in more general language teaching contexts, there are two priorities for the near future
  – Corpus linguists must create and facilitate access to corpora that are pedagogically motivated, in both design and content, to meet pedagogical needs and curricular requirements so that corpus-based learning activities become an integral part, rather than an additional option, of the overall language curriculum
  – Language teachers should be provided, through pre-service training or continued professional development, with the required knowledge and skills for corpus analysis and pedagogical mediation of corpus-based learning activities

Page 38: Corpora in language education

Corpus-based pedagogy: Tomorrow

• If these two tasks are accomplished, it is my view that corpora will not only ‘revolutionize the teaching of grammar’ in the 21st century as Conrad (2000: 549) has predicted, they will also fundamentally change, with the aid of a new generation of teachers, the ways we approach language teaching, including both what is taught and how it is taught

Page 39: Corpora in language education

Using CCL to inform SLA

• Introducing Contrastive Corpus Linguistics (CCL)

• Presenting a brief summary of the relevant findings in a corpus-based contrastive study of passives in English and Chinese (Xiao, McEnery and Qian 2006)

• Exploring passives in the Chinese Learner English Corpus (CLEC) in comparison with a comparable native English corpus

Page 40: Corpora in language education

Contrastive corpus linguistics

• Contrastive analysis (CA)
  – Recognised as an important part of foreign language teaching methodology following WWII
  – Dominant throughout the 1960s
  – But soon lost ground to more learner-oriented approaches such as error analysis, performance analysis and interlanguage analysis
  – Revived in the 1990s
    • …largely thanks to the advances of the corpus methodology, which is inherently comparative in nature (Salki 2002, Xiao 2011)

• Contrastive Corpus Linguistics brings together the strengths of contrastive analysis and corpus analysis

Page 41: Corpora in language education

Contrastive corpus linguistics

• Parallel vs. comparable corpora
  – Parallel corpus: source texts plus translations
  – Comparable corpus: different native languages sampled with comparable sampling criteria and similar balance
• Can parallel corpora be used in contrastive studies?
  – ‘translation equivalence is the best available basis of comparison’ (James 1980: 178)
  – ‘studies based on real translations are the only sound method for contrastive analysis’ (Santos 1996: i)

Page 42: Corpora in language education

Contrastive corpus linguistics

• Translated language is merely an unrepresentative special variant of the target native language which is perceptibly influenced by the source language, and is therefore unreliable for contrastive analysis if relied upon alone
  – Baker 1993; Gellerstam 1996; Teubert 1996; Laviosa 1997; McEnery and Wilson 2001; McEnery and Xiao 2002; McEnery and Xiao 2007; Xiao and Yue 2009; Xiao 2010, 2011, 2012

• In contrast, comparable corpora are well suited for contrastive study as they are unaffected by translationese

Page 43: Corpora in language education

Contrastive corpus linguistics

Page 44: Corpora in language education

Comparable corpora in this study

• Two English corpora
  – Freiburg-LOB (FLOB)
  – BNCdemo (4 M words of conversations)
• Two Chinese corpora
  – Lancaster Corpus of Mandarin Chinese (LCMC)
  – LDC CallHome Mandarin Transcripts: 300K words
• English and Chinese data are comparable in composition and sampling period
  – Providing a reliable basis for the cross-linguistic contrast of passives in the two languages

Page 45: Corpora in language education

English vs. Chinese passives (1)

• Ten times as frequent in English as in Chinese
  – Dynamicity
  – Pragmatic meaning
  – Different habitual tendency
  – Unmarked notional passives
• Chinese learners of English are very likely to underuse passives in their interlanguage

[Figure: bar chart comparing the frequency of passives in English and Chinese; y-axis from 0 to 1200]

Page 46: Corpora in language education

English vs. Chinese passives (2)

• Passive formation
  – English passives
    • Auxiliary be/get followed by a past participial verb
  – Chinese passives
    • Passivised verbs do not inflect morphologically
    • Also the notion of auxiliary verbs is less salient in Chinese
    • Syntactic passives (e.g. 被, 叫, 让)
    • Lexical passives (e.g. 挨, 受(到), 遭(到))
    • Unmarked notional passive and topic sentences (topic + comment)
    • Special structures (e.g. disposal 把 and predicative 是…的)

• Choice of correct auxiliaries and proper inflectional forms of passivised verbs can constitute a difficult area for Chinese learners to acquire English passives

Page 47: Corpora in language education

English vs. Chinese passives (3)

• Long vs. short passives
• Short passives are predominant in English (over 90% in speech and writing)
  – Often used as a strategy that allows one to avoid mentioning the agent when it cannot or must not be mentioned
• 3 out of 5 syntactic passive markers in Chinese (为…所, 叫, 让) only occur in long passives
• For 被 and 给 passives, proportions of short forms (60.7% and 57.5% respectively) are significantly lower than in English
  – The agent normally had to be spelt out at early stages of Chinese, though the constraints have become more relaxed
• Chinese learners of English are expected to overuse long passives and underuse short passives

Page 48: Corpora in language education

English vs. Chinese passives (4)

• Chinese passives are more frequently used with an inflictive meaning
  – Chinese passives were used at early stages primarily for unpleasant or undesirable events (bei, “suffer”)
• Marking negative pragmatic meanings is not a basic feature of the English passive norm (be passives)
  – Get-passives sometimes (37.7% of the time) refer to undesirable events
• Chinese learners are more likely to use English passives for undesirable situations

[Figure: stacked bar chart of pragmatic meanings by language; y-axis: percent]
English be passives: Negative 15.0%, Neutral 80.3%, Positive 4.7%
Chinese bei passives: Negative 51.5%, Neutral 37.8%, Positive 10.7%

Page 49: Corpora in language education

Interlanguage of Chinese learners

• CLEC (learner data): the Chinese Learner English Corpus
  – One million words
  – Essays
  – Five proficiency levels (high school students and university students)
  – Fully annotated with learner errors using a tagset of 61 error types clustered in 11 categories
• LOCNESS (control data): the Louvain Corpus of Native English Essays
  – ca. 300,000 words
  – Essays
  – British A-Level children and British and American university students
• Roughly comparable in terms of task type, learner age and sampling period

Page 50: Corpora in language education

Underuse of passives

Corpus     Words       Passives   Frequency per 100K words
CLEC       1,070,602   9,711      907
LOCNESS    324,304     5,465      1,685

LL=1235.6, 1 d.f., p<0.001
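
The normalized frequencies in the table can be recomputed from the raw counts, and a log-likelihood (LL) score can be derived from the same figures. The sketch below uses the two-term LL formula often applied to corpus frequency comparison; the slides do not say which variant or tool produced LL=1235.6, so this choice is an assumption and the printed score need not match the slide exactly. The function names `per_100k` and `log_likelihood` are invented for illustration.

```python
import math

def per_100k(hits, corpus_size):
    """Normalize a raw count to a frequency per 100,000 words."""
    return hits / corpus_size * 100_000

def log_likelihood(hits_a, size_a, hits_b, size_b):
    """Two-term log-likelihood score for one item in two corpora.

    Assumption: expected counts are derived from the pooled relative
    frequency; only the two observed hit counts enter the sum.
    """
    pooled = (hits_a + hits_b) / (size_a + size_b)
    score = 0.0
    for observed, size in ((hits_a, size_a), (hits_b, size_b)):
        expected = size * pooled
        if observed > 0:
            score += observed * math.log(observed / expected)
    return 2 * score

# Counts taken from the table on this slide.
clec_freq = per_100k(9711, 1070602)
locness_freq = per_100k(5465, 324304)
ll = log_likelihood(9711, 1070602, 5465, 324304)

print(round(clec_freq), round(locness_freq))
print(round(ll, 1))
```

With 1 d.f., any score above 10.83 corresponds to p<0.001, so the conclusion of significant underuse does not hinge on the exact variant chosen.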

Page 51: Corpora in language education

Long vs. short passives

• As can be expected from the contrastive analysis, long passives are more frequent in Chinese learner English than in native English writing
  – Long passives in CLEC
    • 9.14%: 888 out of 9,711
  – Long passives in LOCNESS
    • 8.44%: 461 out of 5,465
• ...but the difference is marginal and not statistically significant
  – LL=2.184, 1 d.f., p=0.139
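
The proportions above, and a log-likelihood ratio for the long vs. short contrast, can be checked from the four counts. This is a sketch under the assumption that the test is a likelihood-ratio (G2) statistic over the full 2×2 table of long and short passives in the two corpora; the slides do not name the procedure. The function name `g2` is invented.

```python
import math

def g2(table):
    """Likelihood-ratio statistic (G2) for a contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            if observed > 0:
                stat += observed * math.log(observed / expected)
    return 2 * stat

# Counts from the slide: long passives and all passives per corpus.
long_clec, all_clec = 888, 9711
long_locness, all_locness = 461, 5465

table = [
    [long_clec, all_clec - long_clec],            # CLEC: long, short
    [long_locness, all_locness - long_locness],   # LOCNESS: long, short
]

print(f"{long_clec / all_clec:.2%}", f"{long_locness / all_locness:.2%}")
print(round(g2(table), 3))
```

Under this assumption the statistic should come out close to the LL=2.184 reported on the slide, well below the 3.84 threshold for p<0.05 at 1 d.f.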

Page 52: Corpora in language education

Pragmatic meanings

• Passives are more frequently negative in Chinese learner English
  – CLEC
    • Negative: 25.7%
    • Positive: 5.9%
    • Neutral: 68.4%
  – LOCNESS
    • Negative: 16.8%
    • Positive: 4.4%
    • Neutral: 78.8%
  – LL=7.4, 2 d.f., p=0.025
• Consistent with earlier finding (50.5% vs. 15%)

[Figure: stacked bar chart of positive, neutral and negative passives in CLEC and LOCNESS; y-axis: percent]

Page 53: Corpora in language education

Passive errors vs. learner levels

[Figure: passive error frequency per 200,000 words by learner level (ST2 to ST6); series: auxiliary errors, misformation, misuse, underuse, all error types]

Page 54: Corpora in language education

Error types vs. learner levels

• Error types are associated with learner levels when the dataset is taken as a whole
  – LL=51.774, 12 d.f., p<0.001
• But similar learner groups also show similar error types
  – ST2 >> ST3: statistically significant (LL=27.303, 3 d.f., p<0.001)
  – ST3 >> ST4: not significant (LL=6.955, 3 d.f., p=0.073)
  – ST4 >> ST5: statistically significant (LL=18.563, 3 d.f., p<0.001)
  – ST5 >> ST6: not significant (LL=6.987, 3 d.f., p=0.072)

ST2: high school students
ST3/ST4: junior/senior non-English major students
ST5/ST6: junior/senior English major students

Page 55: Corpora in language education

Underuse errors

• Likely to be a result of L1 transfer, as can be predicted from results of cross-linguistic contrast and confirmed by the learner-native corpus comparison
• Typically occur with verbs whose Chinese equivalents are not normally used in passives, e.g.
  – A birthday party will hold in Lily’s house. (ST2)
  – The woman in white called Anne Catherick. (ST5)
• Also occur under the influence of the Chinese topic sentence
  – The supper had done. (ST2)
    晚饭 <*bei> 做好 了
    supper <*PASS> cook-ready ASP
    (topic: 晚饭; comment: 做好 了)

Page 56: Corpora in language education

Misuse errors

• 1) Intransitive verbs used in passives, e.g.
  – A very unhappy thing was happened in this week. (ST2)
  – I was graduated from Zhongshan University (ST5)
• 2) Misuse of ergative verbs, e.g.
  – …the secince <sic science> is developed quickly (ST4)
• 3) Training transfer (overdone passive training in classroom instruction), e.g.
  – …many machine <sic machines> and appliance <sic appliances> are used electricity as power (ST5)
  – Because they have been mastered everything of this job… (ST4)

Page 57: Corpora in language education

Misformation errors

• Possibly a result of L1 interference
• Related to morphological inflections
  – Passivised verbs do not inflect in Chinese
• Chinese learners tend to use uninflected verbs or misspelt past participles in passives, e.g.
  – His relatives can not stop him, because his choice is protect by the laws. (ST6)
  – Since the People’s Republic of china <sic China> was found on October 1, 1949… (ST2)

Page 58: Corpora in language education

Auxiliary errors

• Related to omission and misuse of auxiliaries
• A result of L1 interference
  – Auxiliaries are not a salient linguistic feature in Chinese
    • Chinese is not a morphologically inflectional language
• Chinese learners tend to omit or misuse auxiliaries in passives, e.g.
  – In China, since the new China <sic was> established, people’s life has goten <sic gotten> better and better. (ST3)
  – I am not a smoker, but why do <sic are> we forced to be a second-hand smoker? (ST5)

Page 59: Corpora in language education

Case study summary

• The learner’s performance in interlanguage can be predicted, diagnosed, and accounted for from the perspective of Contrastive Corpus Linguistics
• The integrated approach that combines contrastive analysis (CA) and contrastive interlanguage analysis (CIA) is an indispensable tool in SLA research
  – Granger (1998: 14): ‘if we want to be able to make firm pronouncements about transfer-related phenomena, it is essential to combine CA and CIA approaches.’
• 语料库与语言教育 [Corpora and language education]. 中国外语教育 2008(5)
• 语料库在语言教学中的运用 [The use of corpora in language teaching]. 浙江大学学报 (人文社科版) 2010(6)

Page 60: Corpora in language education

Lab: semantic prosody and DDL

• Sentence (a) was produced by a Chinese-speaking postgraduate student of tourism; Tim Johns suggested revising it as (b). Why? Can you provide evidence from available corpora to support your answer and revise (c-e) from CLEC?
  – (a) Although economic improvement may be caused by tourism, the investment and operational costs of tourism must also be considered.
  – (b) Although tourism may lead to economic improvement, the investment and operational costs of tourism must also be considered.
  – (c) The city caused him great interest, caused all citizens to grasp time and chances, to work for a better life.
  – (d) <...> there are a lot of advantages are caused by them.
  – (e) During the past fifty years, the political, economic, and social changes in China have caused dramatic changes in people’s lives.

• BNCweb (collocation) or FLOB
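
For those working offline rather than in BNCweb, the kind of evidence the lab asks for can be approximated by counting the words that follow forms of cause. This is only a rough sketch: the function name `right_collocates`, the four-word span and the three sample sentences are invented for illustration, and real evidence should come from BNCweb or FLOB as suggested above.

```python
import re
from collections import Counter

def right_collocates(sentences, node_forms, span=4):
    """Count words occurring within `span` tokens to the right of a node form."""
    counts = Counter()
    for sentence in sentences:
        tokens = re.findall(r"[a-z]+", sentence.lower())
        for i, token in enumerate(tokens):
            if token in node_forms:
                counts.update(tokens[i + 1:i + 1 + span])
    return counts

# Invented sentences (an assumption, not corpus data).
sentences = [
    "Heavy rain caused serious damage to the crops.",
    "The delay caused problems for passengers.",
    "Stress can cause damage to your health.",
]

collocates = right_collocates(sentences, {"cause", "causes", "caused", "causing"})
print(collocates.most_common(3))
```

Inspecting whether the frequent right-hand collocates are mostly desirable or undesirable things is one way to argue about the semantic prosody of cause, and hence about the revision in (b).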