The Cambridge Learner Corpus, English Profile, the Sketch Engine and the Kelly Project
Adam Kilgarriff
Lexical Computing Ltd
http://www.sketchengine.co.uk

The Cambridge Learner Corpus, English Profile, the Sketch Engine, “freely available”, HOO, DANTE and the Kelly Project

Adam Kilgarriff
Lexical Computing Ltd

http://www.sketchengine.co.uk

Cambridge Learner Corpus (CLC)

• Since 1993
– Nearly as old as CECL
• Leading resource (like ICLE)
• CUP and Cambridge ESOL
– For better dictionaries, ELT courses, tests
– Material: all from exams (levels A1-C2)
• 45m words; 22m error-tagged
• 200,000 scripts, 138 L1s, 203 nationalities

English Profile

• From 2006
• Cambridge Univ, Univ Press, ESOL (+ others)
• Goal
– for each CEFR level, find characteristic lexis and grammar
– Main resource: CLC
– Talk on Thursday
• Theodora Alexopoulou, Helen Yannakoudakis

Flyers

Sketch Engine

• Leading corpus tool
• Word sketches
– One-page summaries of a word’s grammatical and collocational behaviour
• In use at OUP, CUP, Collins, Macmillan, INL …
• 42 languages
– Over 150 corpora
– Since May including CHILDES: demo
– Since last year including CLC

Error-coded corpus

• Challenge
– Intuitive to search for x
• anywhere
• only where it is part of an error
• only where it is part of a correction

where x can be a word, phrase, grammar pattern …

Requirement for CLC in Sketch Engine

Sample text

• We will only use those informations to take part of our guest survey
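
The three search scopes named for error-coded corpora (anywhere, only in an error, only in a correction) can be illustrated on this sentence. What follows is a minimal sketch, not the CLC coding scheme or the Sketch Engine query syntax: the `Token` class, the `search` function and the two corrections ("information", "in") are assumptions made for illustration, and only single words are handled, not phrases or grammar patterns.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    form: str                         # what the learner wrote
    correction: Optional[str] = None  # assumed correction; None if not in an error


# The sample sentence with two ASSUMED error annotations (illustrative only).
SAMPLE = [
    Token("We"), Token("will"), Token("only"), Token("use"), Token("those"),
    Token("informations", "information"),
    Token("to"), Token("take"), Token("part"),
    Token("of", "in"),
    Token("our"), Token("guest"), Token("survey"),
]


def search(tokens: List[Token], x: str, scope: str = "anywhere") -> List[int]:
    """Return the positions where x matches within the requested scope."""
    hits = []
    for i, tok in enumerate(tokens):
        in_error = tok.correction is not None
        if scope == "anywhere":
            # Assumption: "anywhere" covers both learner text and corrections.
            match = x in (tok.form, tok.correction)
        elif scope == "error":
            match = in_error and tok.form == x
        elif scope == "correction":
            match = tok.correction == x
        else:
            raise ValueError(f"unknown scope: {scope}")
        if match:
            hits.append(i)
    return hits


print(search(SAMPLE, "informations", "error"))      # [5]
print(search(SAMPLE, "information", "correction"))  # [5]
print(search(SAMPLE, "use", "error"))               # []
```

Keeping the learner's form and the correction side by side on each token is one simple way to make all three scopes a filter over the same data; a real corpus tool would more likely index error and correction spans as structures.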

Error-coded corpora in SkE

• demo

freely available

Free (MED online)
Sense 1: not costing anything
Sense 4: not limited by rules … anyone can get hold of it??

Available
To download onto your computer
To use

Case studies

             ICLE      CLC
Money        225 EUR   No
To everyone  Yes       Cambridge author/collab
To download  ?         No
To use       Yes       Yes

Non-geeks

• Access is important, not download
• Web is beautiful

HOO / HOO+

• Helping Our Own
• HOO: English-NNS NLP researchers
– Developer = user: motivation
– Shared task/competitive evaluation
• Organisers define task and prepare ‘gold standard’
• Teams participate by running their software over test data
• Six teams (incl Tübingen), workshop end Sept

HOO+ (2012)

• Probably
– English: learner data from CLC
– Other languages?
– Tasks
• Essay scoring
• Determiner, preposition errors
• ?
• http://www.clt.mq.edu.au/research/projects/hoo/

DANTE

Highlights of English lexicography

http://webdante.com

Flyers

The KELLY Project

• EU Lifelong Learning Project
• Word cards
– 9 languages
• Arabic Chinese English Greek Italian Norwegian Polish Russian Swedish
– All 36 pairs
– Words the learner should know (at A1 … C2)
• Partners
• Stockholm Univ, Gothenburg Univ, Adam Mickiewicz Univ, ILSP Athens, CNR Pisa, Oslo Univ, Leeds Univ, Keewords A/S, Lexical Computing Ltd

Interesting question

• How close to purely corpus-based can a pedagogic list be?

Method

• Take a general corpus
• Count
• Review, add, delete using other lists and corpora
• Translate (72 directed-lg-pairs)
• Words not in source list which occur in translations:
– Review source list

• http://kelly.sketchengine.co.uk
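
The counting and review steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: `frequency_list` and `review_candidates` are hypothetical helper names, the token and translation data are toy values, and the project's actual tooling is not described on the slide. The sketch also checks the pair arithmetic: 9 languages give 36 unordered pairs and 72 directed pairs.

```python
from collections import Counter
from itertools import combinations, permutations
from typing import Iterable, List

LANGS = ["Arabic", "Chinese", "English", "Greek", "Italian",
         "Norwegian", "Polish", "Russian", "Swedish"]

# 9 languages: 36 unordered pairs, 72 directed (source -> target) pairs.
UNORDERED_PAIRS = list(combinations(LANGS, 2))
DIRECTED_PAIRS = list(permutations(LANGS, 2))


def frequency_list(tokens: Iterable[str], size: int) -> List[str]:
    """Step "Count": the `size` most frequent words of a corpus."""
    return [word for word, _ in Counter(tokens).most_common(size)]


def review_candidates(source_list: Iterable[str],
                      translations_into_source: Iterable[str]) -> List[str]:
    """Words that turn up as translations INTO a language but are
    missing from that language's own list: candidates for review."""
    known = set(source_list)
    return sorted({w for w in translations_into_source if w not in known})


# Toy data, invented for illustration only.
corpus = "the cat sat on the mat the cat".split()
english_list = frequency_list(corpus, 2)
from_other_lists = ["cat", "dog", "mat", "dog"]

print(len(UNORDERED_PAIRS), len(DIRECTED_PAIRS))          # 36 72
print(english_list)                                       # ['the', 'cat']
print(review_candidates(english_list, from_other_lists))  # ['dog', 'mat']
```

The manual "review, add, delete" step is deliberately left out, since it relies on human judgement and other lists rather than on counting.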

• Symmetrical pairs: <x,y> and <y,x>
• Cliques:
– For x, y, z, … all pairs are symmetrical
– 9-language cliques (English members)
• hospital library music sun theory
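
The two definitions can be made concrete with a small sketch. It assumes the translation data is held as a mapping from a (language, word) node to the set of nodes it was translated into; that representation, the function names and the three-language toy data are all assumptions for illustration, not the project's actual format.

```python
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

Node = Tuple[str, str]  # (language code, word)


def is_symmetrical(trans: Dict[Node, Set[Node]], x: Node, y: Node) -> bool:
    """<x,y> and <y,x>: x translates to y and y translates back to x."""
    return y in trans.get(x, set()) and x in trans.get(y, set())


def is_clique(trans: Dict[Node, Set[Node]], nodes: Iterable[Node]) -> bool:
    """A clique: every pair of its members is symmetrical."""
    return all(is_symmetrical(trans, x, y)
               for x, y in combinations(list(nodes), 2))


# Toy three-language data, invented for illustration only.
EN_HOSPITAL = ("en", "hospital")
IT_OSPEDALE = ("it", "ospedale")
SV_SJUKHUS = ("sv", "sjukhus")
EN_CLINIC = ("en", "clinic")

TRANS = {
    EN_HOSPITAL: {IT_OSPEDALE, SV_SJUKHUS},
    IT_OSPEDALE: {EN_HOSPITAL, SV_SJUKHUS},
    SV_SJUKHUS: {EN_HOSPITAL, IT_OSPEDALE},
    EN_CLINIC: {IT_OSPEDALE},  # one-directional: not symmetrical
}

print(is_symmetrical(TRANS, EN_HOSPITAL, IT_OSPEDALE))            # True
print(is_symmetrical(TRANS, EN_CLINIC, IT_OSPEDALE))              # False
print(is_clique(TRANS, [EN_HOSPITAL, IT_OSPEDALE, SV_SJUKHUS]))   # True
print(is_clique(TRANS, [EN_CLINIC, IT_OSPEDALE, SV_SJUKHUS]))     # False
```

A 9-language clique in this sense would be a set of nine nodes, one per language, for which `is_clique` holds.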

Homage