Tracing idiomaticity in learner language: the case of BE Przemysław Kaszubski School of English Adam Mickiewicz University Poznań, Poland CL2001 Lancaster University 30 March - 02 April 2001


Tracing idiomaticity in learner language:

the case of BE

Przemysław Kaszubski

School of English

Adam Mickiewicz University

Poznań, Poland

CL2001 Lancaster University, 30 March - 02 April 2001


CL2001, Lancaster University

Premises (1)

EFL learners’ overuse of high-frequency words: what does it mean?
  Intensive collocability of core lexical items
  Multi-word extensions (compounds, coinages, idioms, expressions, phrasals)

Confrontation
  Available corpus-driven extraction methods vs. pedagogical usefulness: L1-perspective (the role of transfer)


Premises (2)

Methodological assumptions
  multi-corpus scheme with Polish advanced EFL learner data as hub
  data variables: a) genre / text-type; b) L1; c) proficiency level; d) age / maturity level
  Lemma-based approach (as opposed to wordform- or family-oriented approaches)

Lexical BE: non-idiomatic or ignored because troublesome?
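A minimal sketch of what the lemma-based approach amounts to for BE: all wordforms are pooled under one lemma and the pooled count is standardised per 100,000 tokens, the unit used later in the talk. The tokeniser, the wordform list and the function names are illustrative assumptions, not the procedure actually used in the study.

```python
import re
from collections import Counter

# Wordforms pooled under the lemma BE (an illustrative list). Clitic forms
# ('s, 'm, 're) are left out because 's is ambiguous between "is", "has"
# and the possessive; only straight apostrophes are handled.
BE_WORDFORMS = frozenset({
    "be", "am", "is", "are", "was", "were", "been", "being",
    "isn't", "aren't", "wasn't", "weren't",
})

def tokenize(text):
    """Lowercase the text and split it into word tokens, keeping internal apostrophes."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def lemma_rate(tokens, wordforms=BE_WORDFORMS, per=100_000):
    """Return (raw count, frequency per `per` tokens) for one lemma."""
    if not tokens:
        raise ValueError("empty token list")
    counts = Counter(tokens)
    raw = sum(counts[form] for form in wordforms)
    return raw, raw * per / len(tokens)

# Hypothetical one-line "corpus", for demonstration only.
sample = "It is high time. They were here and will be there."
raw, rate = lemma_rate(tokenize(sample))
print(raw, round(rate, 1))
```

A wordform-oriented count would report "is", "were" and "be" separately; pooling them is what makes corpora of slightly different sizes directly comparable on BE as a whole.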


The hypotheses

negative correlation between proficiency level and frequencies of non-idiomatic uses

positive correlation between proficiency level and frequencies of idiomatic BE except EFL learners’ ‘favourite expressions’

traceability of (at least) some ‘favourite expressions’ to L1
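As a rough illustration of the first hypothesis, the sketch below computes a Spearman rank correlation between proficiency level and the frequency of free combinations. The intervals are the free-combination row of the summary table given later in the talk. Taking interval midpoints as point estimates, and Spearman's rho as the measure, are assumptions of this sketch; the talk does not name a correlation statistic.

```python
import math

def average_ranks(values):
    """Rank values from 1 upwards, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Corpus order: LOB&BR, MCONC, LOCN, IFA-PICLE, FREN, SPAN, PLLC.
levels = [5, 5, 4, 3, 3, 2, 1]
free_combination_intervals = [  # per 100,000 words, lower and upper bounds
    (1717, 1990), (1778, 2086), (2005, 2290), (2346, 2686),
    (2209, 2528), (2574, 2870), (3317, 3611),
]
midpoints = [(low + high) / 2 for low, high in free_combination_intervals]
print(round(spearman(levels, midpoints), 2))
```

With only seven corpora the coefficient is descriptive at best; its sign is what the hypothesis makes a claim about.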


The challenge of idiomatic BE (1): extraction of ‘verbal’ BE

lexical (semantic BE) = readily translatable lexically: existential BE; copular BE
  main-verb function
  non-finite forms: infinitives and participles (the latter if not adjectival)
  non-finite forms: non-count gerunds (NOT ‘a being’)

passive auxiliary: central passives vs semi- and pseudo-passives (cf. Quirk et al. 1985: 167-171)


The challenge of idiomatic BE (2): semi-lexical MWUs

modal idiom ‘BE to <do sth>’

semi-auxiliaries (BE = linking BE)
  BE able to <do sth>; BE about to <do sth>; BE apt to <do sth>; BE bound to <do sth>

‘Polish-style’ semi-auxiliary: BE allowed to <do sth>


The ‘extended’ tripartite idiomaticity model: the criteria

lexical fixedness
syntactic fixedness and / or anomaly
semantic opacity
lexicalisation / institutionalisation / specialisation / conventionality = frequency + distribution
  implementation of fourth criterion via external sources: BBI2 & LDOCE3


The ‘extended’ tripartite idiomaticity model: the levels

frozen expressions
restricted uses
  restricted collocations
  discourse formulae
free combinations


The challenge of idiomatic BE (3): implementation of the frozen level

frozen uses: BE frozen lexically and formally in a particular wordform
  ‘that is (to say)’; ‘to be sure’ (= certainly); ‘for the time being’ (= currently)
  phrasal idioms: ‘to have been around’


Lexical BE: restricted uses (1)

phrasal-prepositional uses of BE (e.g. ‘BE into <sth>’, ‘BE on’)

super-pattern BE + idiom:

A survey of the complementation patterns of lexical BE (based on the evidence of Quirk et al. 1985) has shown that the verb tends to be followed by complements that:

a) either constitute idiomatic phrases

b) or restrict BE’s realm of reference by influencing its subject collocates

c) or else form simple, ad-hoc, fully compositional phrases (BE <noun>; BE <adj>; BE <adjunct>)


Lexical BE: restricted uses (2): BE + idiom (1): prototypical types

a) BE <adj>:
  BE + adjectival idioms or collocations - predicatively unified and often substitutable by a single verb (‘BE conditional upon <sth>’ - cf. depend on <sth>; ‘BE alive’ - cf. live)
  predicative pseudo-passives and semi-passives (‘BE composed of <sth>’, ‘BE connected with <sth>’, ‘BE situated <somewhere>’)
  BE + adjectival / participial predicate + to-clause (‘BE liable to <do sth>’, ‘BE reluctant to <do sth>’)

b) BE <noun>:
  BE + nominal idiom (‘BE a bitter pill for <sb> (to swallow)’; ‘it BE high time’)


Lexical BE: restricted uses (3): BE + idiom (2): non-prototypical

c) BE <adjunct>:
  means adjuncts: conventionalised / lexically fixed (‘Transport is by ferry’) or replacing a longer predicate or a central passive (‘such contracts are with people who...’ = ‘are signed with’)
  stimulus adjuncts: rare & restricted by BE’s subject (‘his main interest was in sport’)
  agent adjuncts: usually restricted to authorship (‘The book is by an unknown writer’)
  measure adjuncts: non-prototypical though probably salient (‘The jacket was 10 pounds’ - cf. ‘The prize was 10 pounds’)


BE: restricted uses (4): discourse-conditioned phrases

| Pattern/Subtype | Example/Sub-pattern |
|---|---|
| conventional discourse formulas and linking phrases | ‘that/this BE why / the reason why...’ etc. (often sentence initially); ‘there is every/no reason (for <sb>) to <do sth>’ |
| ‘<sth> BE that...*’ | ‘the idea/problem/thing is that...’ |
| ‘<sth> BE to <do sth>**’ | ‘his purpose/task/approach is to <do sth>’ |
| idiomatic referential uses | ‘BE so/otherwise’; ‘<sb/sth> BE one/those that/who ...’ |
| BE + clause: other formulae*** | ‘<sth: the question etc.> BE whether ...’; ‘<sth> BE how <sth> <happened>’; ‘it/this BE because ...’ |
| Other formulae | ‘<sth> BE for <sb> to <do sth>’; ‘<sth> BE as follows/the following’ etc. |


Lexical BE: free combinations

non-idiomatic complementation of the prototypical types (‘the young are reckless’; ‘he was a man in his late forties’)

non-idiomatic cases of obligatory but fully semantically compositional adverbial complementation:
  BE <adjunct: time / space / metaphorical space> (‘It was 10 years ago’; ‘Pure fire (= the stars) are in the heavens’.)
  BE <adjunct: purpose / accompaniment / measure etc.> (BE with <sb>, BE for <sth>, BE about <sth>)

‘there BE’ and BE after anticipatory ‘it’: unless lexicalised or specialised, as in ‘it BE high time’, ‘there BE every reason that’ etc.


Lexical BE: Other cases

in cleft and pseudo-cleft sentences: (‘It is marriage that constitutes the basic part of every nation’; ‘All his people ask for is no more war’)

subject-to-subject raising after copulas (SEEM (to be), TURN OUT <to be>) or when complementing mental verbs (BE found/thought etc. (to be))


Idiomatic BE: automatic extraction? (1)

Problem 1: collocation vs co-occurrence word clusters

Many genuine collocations and MWUs are not contiguous (Kennedy 1998: 114) and may spill outside the typical 4:4 window

co-occurrence statistics (WordSmith; TACT, CUE)
  MI - identifies ‘idiosyncratic collocations’ (Oakes 1998: 90) & fails to associate many lemmas
  z-score & t-score - better suited to frequent words but also mutual and leaving much ‘noise’

stop-listing not quite possible
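The window-based statistics named above can be sketched as follows. This is one common textbook formulation: expected co-occurrence = f(node) × f(collocate) × window size / corpus size, MI = log2(O/E), and t = (O − E)/√O. It does not reproduce what WordSmith, TACT or CUE compute internally; the function names and the toy sentence are assumptions made for illustration.

```python
import math
from collections import Counter

BE_FORMS = frozenset({"be", "am", "is", "are", "was", "were", "been", "being"})

def window_collocates(tokens, node_forms, left=4, right=4):
    """Count every token found within the left:right window of a node form."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token in node_forms:
            start = max(0, i - left)
            counts.update(tokens[start:i] + tokens[i + 1:i + 1 + right])
    return counts

def association_scores(tokens, node_forms, left=4, right=4):
    """Return {collocate: (observed, MI, t-score)} for a lemma given by its forms."""
    n = len(tokens)
    freq = Counter(tokens)
    node_freq = sum(freq[form] for form in node_forms)
    scores = {}
    for word, observed in window_collocates(tokens, node_forms, left, right).items():
        # Expected count if node and collocate were distributed independently.
        expected = node_freq * freq[word] * (left + right) / n
        mi = math.log2(observed / expected)
        t = (observed - expected) / math.sqrt(observed)
        scores[word] = (observed, mi, t)
    return scores

# Hypothetical toy text; real use would pass a tokenised corpus.
toy = "it is high time that we were told why this is so".split()
ranked = sorted(association_scores(toy, BE_FORMS).items(),
                key=lambda item: item[1][2], reverse=True)
for word, (observed, mi, t) in ranked[:5]:
    print(word, observed, round(mi, 2), round(t, 2))
```

Because the window is fixed, the complement of a discontinuous unit such as ‘as far as <sth> BE concerned’ is only counted when it happens to fall within four tokens of BE, which is the limitation the slide points to.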


Idiomatic BE: automatic extraction? (2)

Problem 2: part-of-speech tagging
  the passive bottleneck - the need for sampling

Problem 3: semantic disambiguation and associations
  sometimes only grouping data uncovers a meaningful type of association (Stubbs 1998: 4)

Problem 4: learner data


The corpus base: full specification

ENGLISH CORPORA

| Corpus | Content | Variety | Type | Level | Tokens |
|---|---|---|---|---|---|
| PLLC | Polish intermediate EFL | non-native English | ‘apprentice’ | 1. Intermediate | 92,712 |
| SPAN | Spanish (upper-)intermediate EFL | non-native English | ‘apprentice’ | 2. Upper-intermediate | 94,965 |
| FREN | Belgian-French advanced EFL | non-native English | ‘apprentice’ | 3. Advanced | 101,442 |
| IFA-P(ICLE) | Polish advanced EFL | non-native English | ‘apprentice’ | 3. Advanced | 107,990 |
| LOCN(ARG) | British and American college learner English | native English | ‘apprentice’ | 4. College | 106,255 |
| MCONC | British academic writing | native English | ‘expert’ | 5. Professional | 97,914 |
| LOB&BROWN | British and American quality press | native English | ‘expert’ | 5. Professional | 94,421 |

POLISH CORPORA

| Corpus | Content | Type | Level | Tokens |
|---|---|---|---|---|
| POL-STUD | Polish college compositions | ‘apprentice’ | 4. College | 103,382 |
| POL-EXP | Polish academic papers + quality-press articles | ‘expert’ | 5. Professional | 101,348 |


Do Polish EFL writers overuse BE? (1)

Non-lexical BE:
  underuse: central passives (especially at lower proficiency levels)
  overuse: ‘BE going to <do sth>’ (diminishing with proficiency)
  overuse: ‘BE able to <do sth>’ (especially advanced-level learners)

Lexical BE:
  frozen BE: scarce
  lower-proficiency: fewer collocational idioms & many more free combinations


Lexical BE: summary of 500-line concordance samples

95% confidence intervals (lower-upper bounds)

Estimated standardised frequency per 100,000 words

| Category | LOB&BR (5. Professional) | MCONC (5. Professional) | LOCN (4. College) | IFA-PICLE (3. Advanced) | FREN (3. Advanced) | SPAN (2. Upp-Int) | PLLC (1. Interm) |
|---|---|---|---|---|---|---|---|
| Frozen uses | 5-76 | 24-127 | 2-75 | 27-138 | 10-98 | 25-134 | 6-93 |
| Restricted: BE + idiom | 314-525 | 282-504 | 325-552 | 324-568 | 299-529 | 228-443 | 188-391 |
| Restricted: formulae | 213-397 | 290-514 | 138-306 | 280-512 | 317-551 | 173-367 | 191-396 |
| Cleft sentences | 47-159 | 49-173 | 54-177 | 82-234 | 36-152 | 8-96 | 0-75 |
| Free combinations | 1,717-1,990 | 1,778-2,086 | 2,005-2,290 | 2,346-2,686 | 2,209-2,528 | 2,574-2,870 | 3,317-3,611 |
| Total | 2,552-2,892 | 2,713-3,115 | 2,775-3,148 | 3,406-3,793 | 3,176-3,553 | 3,260-3,657 | 3,968-4,300 |

Estimated % of lexical BE in a corpus

| Category | LOB&BR (5. Professional) | MCONC (5. Professional) | LOCN (4. College) | IFA-PICLE (3. Advanced) | FREN (3. Advanced) | SPAN (2. Upp-Int) | PLLC (1. Interm) |
|---|---|---|---|---|---|---|---|
| Frozen uses | 0.2%-2.8% | 0.8%-4.4% | 0.1%-2.5% | 0.8%-3.8% | 0.3%-2.9% | 0.7%-3.9% | 0.1%-2.3% |
| Restricted: BE + idiom | 11.5%-19.3% | 9.7%-17.3% | 11.0%-18.6% | 9.0%-15.8% | 8.9%-15.7% | 6.6%-12.8% | 4.5%-9.5% |
| Restricted: formulae | 7.8%-14.6% | 10.0%-17.6% | 4.7%-10.3% | 7.8%-14.2% | 9.4%-16.4% | 5.0%-10.6% | 4.9%-9.6% |
| Cleft sentences | 1.7%-5.9% | 1.7%-5.9% | 1.8%-6.0% | 2.3%-6.5% | 1.1%-4.5% | 0.2%-2.8% | 0.0%-1.8% |
| Free combinations | 63.1%-73.1% | 61.0%-71.6% | 67.7%-77.3% | 65.2%-74.6% | 65.7%-75.1% | 74.4%-83.0% | 80.2%-87.4% |
| Total | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
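The talk does not state how the 95% intervals were obtained. One plausible reading, sketched here purely as an assumption, is a normal-approximation interval for the share of a category among the lexical BE lines of a concordance sample, rescaled by the corpus's lexical BE rate per 100,000 words. The counts and the rate in the example are hypothetical, not figures from the study.

```python
import math

def wald_interval(hits, n, z=1.96):
    """Normal-approximation interval for a sample proportion, clipped to [0, 1]."""
    if n <= 0 or not 0 <= hits <= n:
        raise ValueError("need 0 <= hits <= n and n > 0")
    p = hits / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)

def scale_interval(interval, rate_per_100k):
    """Turn a proportion interval into a frequency interval per 100,000 words."""
    low, high = interval
    return low * rate_per_100k, high * rate_per_100k

# Hypothetical sample: 340 lexical BE lines, 230 of them free combinations,
# in a corpus with an assumed 2,700 lexical BE tokens per 100,000 words.
share = wald_interval(230, 340)
per_100k = scale_interval(share, 2700)
print([round(100 * bound, 1) for bound in share])
print([round(bound) for bound in per_100k])
```

The approximation is weakest for rare categories such as frozen uses, where the lower bound sits near zero; an exact or Wilson interval would be a reasonable alternative there.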


Do Polish EFL writers overuse BE? (2): some specific findings

frozen BE:
  overuse: ‘what is more’ (cf. Polish ‘co więcej’)

Restricted collocations (BE + idiom):
  intermediate overuse: ‘BE full of <sth>’ (cf. Polish ‘być pełnym czegoś’)
  overuse: ‘BE connected / associated {etc.} with <sth>’ (cf. Polish ‘być związanym z czymś’)
  underuse: ‘BE concerned with <sth>’ as opposed to the overused formula ‘as far as <sth> BE concerned’ (cf. Polish ‘jeśli chodzi o ...’)


Do Polish EFL writers overuse BE? (3): some specific findings

Restricted level: discourse formulae:
  heavy overuse: ‘that/this BE why ...’ (also ‘that’s why’) (cf. Polish ‘Dlatego (właśnie)’) (sentence initial)
  overuse: ‘what is (more) <adj: important etc.>’ (cf. Polish ‘co ważne’)

OVERALL IMPRESSION:
  Many more phrases are overused than underused
  Overused expressions are likely underpinned either by equivalent or associated L1-based options, or by the (spoken) familiarity of a phrase (‘BE able to’; ‘BE going to’), or by both.


Conclusions

EFL learners do apply BE more frequently, but it is not only lexical uses and free combinations that add to the impression

Sadly, analyses of this kind are hardly possible automatically or even semi-automatically, but they may serve as benchmarks for developing tools that will more successfully tackle BE in larger corpora

Such quantitative and contrastive accounts of EFL learner language are needed, especially at higher proficiency levels, despite caveats about idealised native corpus ‘norms’ (cf. Leech 1998)


This show will shortly be available from:

http://main.amu.edu.pl/~przemka/rsearch.html