Making useful wordlists for ELT
Topical vocabulary from the WWW
Simon Smith & Scott Sommers, Ming Chuan University, Taipei
Adam Kilgarriff, Lexical Computing Ltd, UK
Generous support from National Science Council, Taiwan
Outline
• Importance of learning natural English
• Wordlists in English learning
• Making relevant wordlists
• Using two corpus analysis tools
– WebBootCat
– Sketch Engine
• Conclusions and future plans
The problem
• Learning non-authentic English
– It’s raining cats and dogs!
– Long time no see!
• In Taiwan, all students learn these
• They may believe they are authentic
• But English speakers hardly use them!
Word and phrase lists
• Students must learn vocabulary
• It is best to learn vocabulary through practice:
– Reading
– Speaking to American people
– Interacting in the language
• That is difficult for Asian students
• In Taiwan, students must learn vocabulary from lists
From the MOE
• 6000-word high school list
– Probably useful for policy makers
– May be useful for teachers
– Not useful for learners
• Better to organize wordlists by topic?
So, we should teach vocabulary by topic?
Khmer learning Game © North Illinois University
Unit 1
Getting started at University
Nouns: attendance, course, facilities, helmet, initiative, major, vendor
Verbs: accomplish, consider, improve, tease
Adjectives: challenging, fortunate, impatient, occasional, protective
From the ELC textbook
• It is not easy to make up a good vocabulary list for an abstract topic
• Try these topics:
– Unit 1: Getting started at University
– Unit 2: Family and Hometown
– Unit 3: English and You
• Please
– Choose a topic
– Write down some good keywords
• Better to use a computer to help us!
Getting wordlists from the web
WebBootCat: making corpora from the web
• User chooses some seed words
– For example, freshman and university
• WebBootCat
– searches Yahoo for the seed words
– throws away lists of numbers, HTML, price lists…
– puts all running text into a corpus
– tags the corpus (noun, verb, etc.) if required
User enters seed words
WebBootCat passes query to Yahoo!
WebBootCat throws away non-data web pages (for example: 12345 56789 $$$$$ £££££ *&%^)
WebBootCat puts text pages in corpus
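The filtering step in this pipeline can be illustrated with a small sketch. This is not WebBootCat's actual code: the function names `is_running_text` and `build_corpus`, the thresholds, and the sample pages are all assumptions made for illustration, and the web search itself is left out so the example runs offline.

```python
import re

# Hypothetical heuristic (an assumption, not WebBootCat's real filter):
# a page counts as running text if it has enough tokens and most of
# them look like ordinary words.
WORD = re.compile(r"[A-Za-z][A-Za-z'\-]*[.,;:!?]?")

def is_running_text(text, min_words=5, min_alpha_ratio=0.7):
    tokens = text.split()
    if len(tokens) < min_words:
        return False
    wordlike = sum(1 for t in tokens if WORD.fullmatch(t))
    return wordlike / len(tokens) >= min_alpha_ratio

def build_corpus(pages):
    """Keep only the pages that look like running text."""
    return [p for p in pages if is_running_text(p)]

# Invented sample "pages" standing in for search results.
pages = [
    "12345 56789 $$$$$ *&%^",
    "Every freshman at the university must register for courses in the first week.",
    "Price list: $10 $20 $30 $40 $50",
]
corpus = build_corpus(pages)
print(len(corpus), "page(s) kept")
```

A real system would also strip HTML and remove duplicate pages before this step; the sketch only shows the idea of discarding pages that are mostly numbers, prices, or symbols.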
Advantages of automatic wordlist creation
• contain relevant, topical vocabulary
• created easily and conveniently
• of course, we can select the words manually, from the automatic list!
Disadvantages of manual wordlist creation
• It is difficult to get inspiration to make good wordlists manually.
• Manual wordlists may include rare or unnecessary vocabulary.
Future work: Automatic cloze exercise generation
Q: It’s a ___ day today!
Choose:
(a) toasty
(b) tepid
(c) lukewarm
(d) sunny
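One possible shape for such a generator is sketched below. Since this is future work, nothing here reflects an existing implementation: `make_cloze` is a hypothetical helper, and the distractors are supplied by hand rather than chosen automatically from a corpus.

```python
import random

def make_cloze(sentence, target, distractors, seed=0):
    """Blank out `target` in `sentence` and return a multiple-choice item.

    Hypothetical helper for illustration only.
    """
    if target not in sentence:
        raise ValueError("target word does not occur in the sentence")
    question = sentence.replace(target, "___", 1)
    options = list(distractors) + [target]
    random.Random(seed).shuffle(options)  # fixed seed keeps the order repeatable
    answer_index = options.index(target)
    return question, options, answer_index

question, options, answer_index = make_cloze(
    "It's a sunny day today!",
    target="sunny",
    distractors=["toasty", "tepid", "lukewarm"],
)
print("Q:", question)
for label, option in zip("abcd", options):
    print(f"({label}) {option}")
```

The hard part, which this sketch does not attempt, is choosing distractors that are plausible but clearly wrong in context.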
Summary: making wordlists
• Choose a topic
• Get a topic corpus from the web
• Extract topic wordlist from it
• Use recursive bootstrapping to extend the wordlist
• Include multi-word terms in the wordlist
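The "extract topic wordlist" step can be sketched as a comparison between the topic corpus and a general reference corpus: words that are much more frequent in the topic corpus are good candidates for the list. The scoring below is a smoothed ratio of relative frequencies, used here as an assumption; the slides do not say which statistic is applied. The name `topic_wordlist` and both sample texts are invented for illustration.

```python
from collections import Counter
import re

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def topic_wordlist(topic_text, reference_text, top_n=5, smoothing=1.0):
    """Rank topic-corpus words by how much more frequent they are
    than in the reference corpus (frequencies per thousand words)."""
    topic = Counter(tokenize(topic_text))
    ref = Counter(tokenize(reference_text))
    topic_total = max(sum(topic.values()), 1)
    ref_total = max(sum(ref.values()), 1)
    scores = {}
    for word, count in topic.items():
        topic_rate = count / topic_total * 1000
        ref_rate = ref[word] / ref_total * 1000  # Counter gives 0 for unseen words
        scores[word] = (topic_rate + smoothing) / (ref_rate + smoothing)
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]

topic_text = ("the freshman met the professor and the freshman "
              "chose a major at the university")
reference_text = ("the cat sat on the mat and the dog ate a bone "
                  "at the park")
wordlist = topic_wordlist(topic_text, reference_text)
print(wordlist)
```

In this toy example, function words such as "the" score close to 1 because they are equally common in both texts, while "freshman" scores highest. To extend the wordlist, the top-ranked words could be fed back in as new seed words.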