The Cambridge Learner Corpus, English Profile, the Sketch Engine
and the Kelly Project
Adam Kilgarriff
Lexical Computing Ltd
http://www.sketchengine.co.uk
The Cambridge Learner Corpus, English Profile, the Sketch Engine,
“freely available”, HOO, DANTE and the Kelly Project
Adam Kilgarriff
Lexical Computing Ltd
http://www.sketchengine.co.uk
Cambridge Learner Corpus (CLC)
• Since 1993
– Nearly as old as CECL
• Leading resource (like ICLE)
• CUP and Cambridge ESOL
– For better dictionaries, ELT courses, tests
– Material: all from exams (levels A1-C2)
• 45m words; 22m error-tagged
• 200,000 scripts, 138 L1s, 203 nationalities
English Profile
• From 2006
• Cambridge Univ, Univ Press, ESOL (+ others)
• Goal
– For each CEFR level, find characteristic lexis and grammar
– Main resource: CLC
– Talk on Thursday
• Theodora Alexopoulou, Helen Yannakoudakis
Flyers
Sketch Engine
• Leading corpus tool
• Word sketches
– One-page summaries of a word’s grammatical and collocational behaviour
• In use at OUP, CUP, Collins, Macmillan, INL …
• 42 languages
– Over 150 corpora
– Since May including CHILDES: demo
– Since last year including CLC
Error-coded corpus
• Challenge
– Intuitive to search for x
• anywhere
• only where it is part of an error
• only where it is part of a correction
where x can be a word, phrase, grammar pattern …
Requirement for CLC in Sketch Engine
Sample text
• We will only use those informations to take part of our guest survey
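The three search modes (anywhere, only in an error, only in a correction) can be sketched in Python on the sample sentence. This is an illustration only: the `Token` representation, the `search` function and the corrections assigned to the sample are assumptions made here, not the CLC coding scheme or the Sketch Engine query syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    word: str
    layer: str  # "text" (outside any error), "error" (learner's form) or "correction"


# Hypothetical encoding of the sample sentence. The corrections are
# illustrative guesses, not taken from the actual CLC annotation.
SAMPLE = [
    Token("We", "text"), Token("will", "text"), Token("only", "text"),
    Token("use", "text"),
    Token("those", "error"), Token("informations", "error"),
    Token("that", "correction"), Token("information", "correction"),
    Token("to", "text"), Token("take", "text"), Token("part", "text"),
    Token("of", "error"), Token("in", "correction"),
    Token("our", "text"), Token("guest", "text"), Token("survey", "text"),
]


def search(tokens, word, where="anywhere"):
    """Return the positions of `word`, optionally restricted to one layer."""
    if where not in {"anywhere", "error", "correction"}:
        raise ValueError(f"unknown search mode: {where}")
    return [
        i for i, tok in enumerate(tokens)
        if tok.word.lower() == word.lower()
        and (where == "anywhere" or tok.layer == where)
    ]
```

With this toy encoding, `search(SAMPLE, "of", "error")` finds the learner's "of", while `search(SAMPLE, "information", "error")` finds nothing because "information" occurs only as a correction.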
Error-coded corpora in SkE
• demo
freely available
Free (MED online)
Sense 1: not costing anything
Sense 4: not limited by rules … anyone can get hold of it??
Available
To download onto your computer
To use
Case studies
             ICLE      CLC
Money        225 EUR   No
To everyone  Yes       Cambridge author/collab
To download  ?         No
To use       Yes       Yes
Non-geeks
• Access is important, not download
• Web is beautiful
HOO / HOO+
• Helping Our Own
• HOO: English-NNS NLP researchers
– Developer = user: motivation
– Shared task/competitive evaluation
• Organisers define task and prepare ‘gold standard’
• Teams participate by running their software over test data
• Six teams (incl Tübingen), workshop end Sept
HOO+ (2012)
• Probably
– English: learner data from CLC
– Other languages?
– Tasks
• Essay scoring
• Determiner, preposition errors
• ?
• http://www.clt.mq.edu.au/research/projects/hoo/
DANTE
Highlights of English lexicography
http://webdante.com
Flyers
The KELLY Project
• EU Lifelong Learning Project
• Word cards
– 9 languages
• Arabic Chinese English Greek Italian Norwegian Polish Russian Swedish
– All 36 pairs
– Words the learner should know (at A1 … C2)
• Partners
• Stockholm Univ, Gothenburg Univ, Adam Mickiewicz Univ, ILSP Athens, CNR Pisa, Oslo Univ, Leeds Univ, Keewords A/S, Lexical Computing Ltd
Interesting question
• How close to purely corpus-based can a pedagogic list be?
Method
• Take a general corpus
• Count
• Review, add, delete using other lists and corpora
• Translate (72 directed-lg-pairs)
• Words not in source list which occur in translations:
– Review source list
• http://kelly.sketchengine.co.uk
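The "count" and "review source list" steps can be sketched roughly in Python. The function names, the data shapes and the toy words below are assumptions made for illustration; the project's actual tooling is not described in these slides.

```python
from collections import Counter


def frequency_list(tokens, top_n):
    """Return the top_n most frequent word forms (ties broken alphabetically)."""
    counts = Counter(tok.lower() for tok in tokens)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:top_n]]


def review_candidates(source_list, translations_into_language):
    """Words used as translations into a language but missing from its own list.

    `translations_into_language` maps each other language (hypothetical codes)
    to the words its translators produced in this language.
    """
    known = set(source_list)
    produced = {
        word
        for words in translations_into_language.values()
        for word in words
    }
    return sorted(produced - known)
```

For example, if the English list holds "sun" and "music" but translators working from two other lists produced "hospital" and "library" in English, `review_candidates` returns those two words as candidates for review.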
• Symmetrical pairs: <x,y> and <y,x>
• Cliques:
– For x, y, z, … all pairs are symmetrical
– 9-language cliques (English members)
• hospital library music sun theory
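Symmetrical pairs and cliques can be illustrated with a small Python sketch. The three-language translation table is invented toy data, and the brute-force search seeded from one word is an assumption about how such cliques might be found, not the project's actual procedure.

```python
from itertools import combinations, product

# Toy data, invented for illustration:
# (source_lang, target_lang) -> {word: set of translations}
TRANSLATIONS = {
    ("en", "it"): {"music": {"musica"}, "sun": {"sole"}},
    ("it", "en"): {"musica": {"music"}, "sole": {"sun"}},
    ("en", "sv"): {"music": {"musik"}, "sun": {"sol"}},
    ("sv", "en"): {"musik": {"music"}, "sol": {"sun"}},
    ("it", "sv"): {"musica": {"musik"}, "sole": {"sol"}},
    ("sv", "it"): {"musik": {"musica"}},  # "sol" deliberately left untranslated
}


def translates(table, a, x, b, y):
    """True if word x of language a has y among its translations into b."""
    return y in table.get((a, b), {}).get(x, set())


def symmetrical(table, a, x, b, y):
    """True if <x,y> and <y,x> are both attested."""
    return translates(table, a, x, b, y) and translates(table, b, y, a, x)


def cliques_for(table, langs, seed_lang, seed_word):
    """All cliques containing the seed word, with one word per language."""
    others = [lang for lang in langs if lang != seed_lang]
    candidates = []
    for lang in others:
        targets = sorted(table.get((seed_lang, lang), {}).get(seed_word, set()))
        candidates.append([
            (lang, y) for y in targets
            if symmetrical(table, seed_lang, seed_word, lang, y)
        ])
    found = []
    for combo in product(*candidates):
        members = [(seed_lang, seed_word), *combo]
        if all(symmetrical(table, a, x, b, y)
               for (a, x), (b, y) in combinations(members, 2)):
            found.append(dict(members))
    return found
```

In this toy table "music" forms a three-language clique, while "sun" does not, because one of the six directed translations is missing.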
Homage