Terminology, translation, and PRESEMT; word frequency lists and KELLY

  • Published on
    01-Jan-2016

DESCRIPTION

Adam Kilgarriff, Lexical Computing Ltd. Terminology, translation, and PRESEMT; word frequency lists and KELLY. PRESEMT: EU FP7 project FP7-ICT-4-248307, 2010-2012, Pattern Recognition-based Statistically Enhanced MT. Six partners, five countries.

Transcript

<ul>
<li><p>Terminology, translation, and PRESEMT; word frequency lists and KELLY</p><p>Adam Kilgarriff, Lexical Computing Ltd</p><p>SKEW-2, March 2011</p><p>Kilgarriff: PRESEMT and KELLY</p></li>
<li><p>PRESEMT</p><ul><li>EU FP7 project FP7-ICT-4-248307, 2010-2012</li><li>Pattern Recognition-based Statistically Enhanced MT</li><li>Six partners, five countries</li><li>Languages: Czech, English, German, Greek, Italian</li><li>Comparable Corpora BootCat (CCBC)</li><li>Demo by Jan Pomikalek</li><li>http://www.presemt.eu</li></ul></li>
<li><p>KELLY</p><ul><li>Keywords for Language Learning for Young and adults alike</li><li>EU Lifelong Learning project</li><li>Goal: word cards, with a word in one language on one side and its translation in the other language on the other side</li><li>Language learning</li><li>9 languages, 36 pairs: Arabic, Chinese, English, Greek, Italian, Norwegian, Polish, Russian, Swedish</li><li>Partners in 6 countries</li><li>http://su.avedas.com/converis/contract/321</li></ul></li>
<li><p>Method</p><ul><li>Prepare monolingual lists</li><li>Translate each into the 8 target languages, using professional translation services</li><li>Integrate, finalise</li><li>Produce cards</li><li>Goal for each set: 9000 pairs at 6 levels</li></ul></li>
<li><p>Stages</p><ul><li>Sort out corpora, tagging</li><li>Automatically generate M1 lists<ul><li>names, numbers, countries ...</li><li>keywords vis-à-vis other corpora</li></ul></li><li>Review, compare, prepare M2 lists</li><li>Translate</li><li>Use translations: M3 lists</li><li>Finalise</li></ul></li>
<li><p>Review: how?</p><ul><li>Points system<ul><li>2 points for each of 6 levels, so 12 points for the most frequent words</li><li>Deduct points for words in over-represented areas</li></ul></li><li>Add in words from other corpora</li></ul></li>
<li><p>Translation database</p><ul><li>On the web</li><li>All translations entered into it</li><li>Queries like:<ul><li>All Swedish words used as translations more than six times</li><li>All 1:1:1:1... 'simple cases'</li></ul></li></ul></li>
<li><p>Using the translations database</p><ul><li>Find words not in M2 lists that need adding</li><li>Multiwords</li><li>English: look for a word that is probably the translation of a high-frequency word in several of the 8 other languages; if so, add it to the English list</li><li>Homonyms: could be similar</li></ul></li>
<li><p>Monolingual master lists (M3)</p><ul><li>Based on a WAC corpus</li><li>Input from other same-language corpora</li><li>And from translations from the 8 other languages<ul><li>Useful words which might not be high-frequency</li></ul></li><li>Added words/multiwords must be above a lower frequency threshold</li><li>Target: 9000</li></ul></li>
<li><p>Matches across 9 languages</p><ul><li>Set of symmetrical relations across all 36 pairs</li><li>e.g. music, library, sun, hospital, theory</li></ul></li>
<li><p>Big problems</p><ul><li>Multiwords (as anticipated)</li><li>Homonymy (as anticipated)</li><li>orange, banana, alphabet, elbow, Hello</li><li>Worse than anticipated<ul><li>Lists from spoken corpora and learner corpora needed</li><li>Relation between competence for communicating and the corpora at our disposal</li></ul></li></ul></li>
<li><p>(Monolingual) Word Lists</p><ul><li>Define a syllabus</li><li>Which words get used in:<ul><li>Learning-to-read books (native-speaker children)</li><li>Language-learner textbooks (non-native speakers)</li><li>Dictionaries</li><li>Language testing</li></ul></li><li>Native speakers (NS): educational psychologists</li><li>Non-native speakers (NNS): proficiency levels</li></ul></li>
<li><p>Should be corpus-based</p><ul><li>Most aren't</li><li>Corpora are quite new</li><li>Easy to do better</li><li>People will use them</li><li>Maybe also governments</li></ul></li>
<li><p>How</p><ul><li>Take your corpus</li><li>Count</li><li>Voilà!</li></ul></li>
<li><p>Complications</p><ul><li>What is a word</li><li>Words and lemmas</li><li>Grammatical classes</li><li>Numbers, names...</li><li>Multiwords</li><li>Homonymy</li></ul><p>All are slightly different issues for each language</p></li>
<li><p>What is a word; delimiters</p><ul><li>Found between spaces</li><li>Not for Chinese: segmentation</li><li>English: co-operate, widely-held, farmer's, can't</li><li>Norwegian, Swedish: compounding, separable verbs</li><li>Arabic, Italian: clitics, al, ...</li></ul></li>
<li><p>Words and lemmas</p><ul><li>Word form (in text): invading</li><li>Lemma (dictionary headword): invade, for the forms invade, invades, invaded, invading</li><li>Lemmatisation<ul><li>Chinese: none; English: simple</li><li>Middling: Swedish, Norwegian, Italian, Greek</li><li>Tough: Russian, Polish, Arabic</li></ul></li></ul></li>
<li><p>Word families</p><ul><li>Derivational morphology: efficient/efficiently, access/accessible/accessibility, available/availability/unavailable</li><li>Word-families tradition, e.g. Coxhead, Academic Word List</li><li>Pedagogy: one item to learn</li><li>But: where do families end? Different meanings</li></ul></li>
<li><p>Grammatical classes</p><ul><li>brush (verb) and brush (noun): same item or different? (both in the same word family)</li><li>Required: a (short) list of word classes</li><li>POS-tagger: will make mistakes</li></ul></li>
<li><p>Marginal cases</p><ul><li>Numbers: twelve, seventeenth, fifties</li><li>Closed sets: days of the week, months</li><li>Countries: capitals, nationalities, currencies, adjectives, languages</li><li>Regions/dialects, political groups, religions: easter, christmas, islam, republican</li><li>A policy is always needed</li></ul></li>
<li><p>Multiwords</p><ul><li>according to: linguistically a word, but...</li><li>Multiword frequency list: the top item is "of the"</li><li>Can't use frequencies (alone) to select multiwords</li></ul></li>
<li><p>Homonymy</p><ul><li>bank (river) and bank (money)</li><li>Word sense disambiguation: we can't do it (with decent accuracy)</li><li>So we can't give frequencies for senses</li><li>Lists of words, not meanings</li><li>Sometimes disconcerting</li></ul></li>
<li><p>Corpora</p><ul><li>A fairly arbitrary sample of a language</li><li>To limit the arbitrariness of the word list: make it big and diverse</li><li>WaCky corpora<ul><li>From the web</li><li>Can do for any language</li><li>??? Comparable ???</li><li>Web language: less formal</li></ul></li></ul></li>
<li><p>Word lists are useful, but...</p><ul><li>Are they scientific? A tiny bit, occasionally...</li><li>Could they be scientific? Yes: article of faith</li><li>By the end of KELLY, we'll have a clearer idea how</li></ul></li>
</ul>
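The Stages slide lists "keywords vis-a-vis other corpora" as part of generating the M1 lists but does not say which statistic was used. A minimal sketch, assuming a smoothed ratio of per-million frequencies, (fpm_focus + n) / (fpm_reference + n), with a hypothetical smoothing constant n = 1 and invented counts:

```python
def per_million(count, corpus_size):
    """Normalise a raw count to a frequency per million words."""
    return count * 1_000_000 / corpus_size


def keyness(word, focus_counts, focus_size, ref_counts, ref_size, n=1.0):
    """Smoothed ratio of per-million frequencies; n is an assumed smoothing constant."""
    f = per_million(focus_counts.get(word, 0), focus_size)
    r = per_million(ref_counts.get(word, 0), ref_size)
    return (f + n) / (r + n)


# Invented toy counts: a 1M-word focus corpus and a 2M-word reference corpus.
focus = {"click": 500, "the": 60000, "river": 10}
ref = {"click": 100, "the": 120000, "river": 200}

ranked = sorted(focus, key=lambda w: keyness(w, focus, 1_000_000, ref, 2_000_000),
                reverse=True)
print(ranked)  # ['click', 'the', 'river']
```

Words with a score well above 1 are relatively more frequent in the focus corpus; "the" scores exactly 1 because its per-million frequency is the same in both.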
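The "review - how?" slide outlines a points system: 2 points for each of 6 levels, so 12 points for the most frequent words, with deductions for words from over-represented areas. The sketch below is one possible reading, not the project's documented procedure: it assumes the frequency-ranked list is cut into 6 equal bands, and that deductions come from a hypothetical per-word `penalties` table.

```python
def assign_points(ranked_words, n_levels=6, points_per_level=2, penalties=None):
    """Score words by frequency band; the top band gets n_levels * points_per_level."""
    penalties = penalties or {}
    band_size = max(1, -(-len(ranked_words) // n_levels))  # ceiling division
    scores = {}
    for rank, word in enumerate(ranked_words):
        level = min(rank // band_size, n_levels - 1)  # 0 = most frequent band
        scores[word] = (n_levels - level) * points_per_level - penalties.get(word, 0)
    return scores


# Toy list, already sorted by corpus frequency (most frequent first).
words = ["the", "of", "and", "to", "a", "in", "is", "it", "click", "you", "that", "he"]
# Hypothetical penalty for a word over-represented in web text.
scores = assign_points(words, penalties={"click": 4})
print(scores["the"], scores["click"], scores["he"])  # 12 0 2
```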
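The Translation database slide gives two example queries. The real schema is not described, so this sketch assumes a hypothetical single table `translations(src_lang, src_word, tgt_lang, tgt_word)` in an in-memory SQLite database with invented rows. It also interprets "simple cases" as source words with exactly one translation per target language.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE translations (src_lang TEXT, src_word TEXT, tgt_lang TEXT, tgt_word TEXT)"
)
rows = [
    ("en", "house", "sv", "hus"), ("no", "hus", "sv", "hus"),
    ("it", "casa", "sv", "hus"), ("el", "σπίτι", "sv", "hus"),
    ("pl", "dom", "sv", "hus"), ("ru", "дом", "sv", "hus"),
    ("zh", "房子", "sv", "hus"), ("en", "house", "it", "casa"),
    ("en", "bank", "sv", "bank"), ("en", "bank", "it", "banca"),
    ("en", "bank", "it", "riva"),
]
conn.executemany("INSERT INTO translations VALUES (?, ?, ?, ?)", rows)

# "All Swedish words used as translations more than six times"
popular_sv = conn.execute(
    """SELECT tgt_word, COUNT(*) FROM translations
       WHERE tgt_lang = 'sv' GROUP BY tgt_word HAVING COUNT(*) > 6"""
).fetchall()

# "Simple cases" (assumed meaning): one translation per target language
simple = conn.execute(
    """SELECT src_lang, src_word FROM translations
       GROUP BY src_lang, src_word
       HAVING COUNT(*) = COUNT(DISTINCT tgt_lang)"""
).fetchall()

print(popular_sv)                # [('hus', 7)]
print(("en", "bank") in simple)  # False: two Italian translations
```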
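The "Using the translations database" and "Monolingual master lists (M3)" slides describe a rule for English. A word is added when it is probably the translation of a high-frequency word in several of the other languages, provided it clears a lower frequency threshold. The sketch below rests on stated assumptions: the function name, the data layout, and the cutoffs `min_languages=3` and `min_freq=50` are all hypothetical.

```python
from collections import defaultdict


def words_to_add(into_english, high_freq, english_list, english_freq,
                 min_languages=3, min_freq=50):
    """Suggest English words that translate high-frequency words in at least
    `min_languages` other languages, are missing from the English list, and
    clear a lower frequency threshold in the English corpus."""
    support = defaultdict(set)  # English word -> languages that point to it
    for lang, pairs in into_english.items():
        for source_word, english_word in pairs.items():
            if source_word in high_freq.get(lang, set()):
                support[english_word].add(lang)
    return sorted(
        word for word, langs in support.items()
        if len(langs) >= min_languages
        and word not in english_list
        and english_freq.get(word, 0) >= min_freq
    )


# Invented toy data
into_english = {
    "sv": {"apelsin": "orange", "hus": "house"},
    "no": {"appelsin": "orange"},
    "it": {"arancia": "orange", "fiume": "river"},
    "pl": {"pomarańcza": "orange"},
}
high_freq = {"sv": {"apelsin", "hus"}, "no": {"appelsin"},
             "it": {"arancia", "fiume"}, "pl": {"pomarańcza"}}
english_list = {"house", "river"}
english_freq = {"orange": 120, "house": 5000, "river": 900}

print(words_to_add(into_english, high_freq, english_list, english_freq))  # ['orange']
```

Raising `min_freq` above 120 would block "orange". The threshold keeps rare translation artefacts out of the master list.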
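Nine languages give (9 × 8) / 2 = 36 unordered pairs, which is where the 36 pairs on the "Matches across 9 languages" slide come from. The sketch below checks that count. It then tests whether a candidate row of translations forms a symmetrical 1:1 relation across every pair. The toy dictionary covers only three languages and is invented. The 1:1 requirement is an assumption about what "symmetrical" means here.

```python
from itertools import combinations

LANGS = ["ar", "zh", "en", "el", "it", "no", "pl", "ru", "sv"]
print(len(list(combinations(LANGS, 2))))  # 36

# Invented toy dictionary: (src, tgt) -> word -> set of translations
TRANS = {
    ("en", "it"): {"music": {"musica"}, "bank": {"banca", "riva"}},
    ("it", "en"): {"musica": {"music"}, "banca": {"bank"}, "riva": {"bank"}},
    ("en", "sv"): {"music": {"musik"}, "bank": {"bank"}},
    ("sv", "en"): {"musik": {"music"}, "bank": {"bank"}},
    ("it", "sv"): {"musica": {"musik"}, "banca": {"bank"}},
    ("sv", "it"): {"musik": {"musica"}, "bank": {"banca"}},
}


def translate(word, src, tgt):
    return TRANS.get((src, tgt), {}).get(word, set())


def is_symmetric_match(row):
    """row maps language -> word. True if every pair of languages in the row
    translates 1:1 in both directions."""
    for a, b in combinations(sorted(row), 2):
        if translate(row[a], a, b) != {row[b]} or translate(row[b], b, a) != {row[a]}:
            return False
    return True


print(is_symmetric_match({"en": "music", "it": "musica", "sv": "musik"}))  # True
print(is_symmetric_match({"en": "bank", "it": "banca", "sv": "bank"}))     # False
```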
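The "How" slide reduces building a frequency list to "take your corpus, count". In Python that core step is a `collections.Counter` over the tokens. The toy corpus below is invented. Everything the later slides complicate (what counts as a word, lemmas, word classes) is ignored here.

```python
from collections import Counter


def frequency_list(tokens):
    """Count tokens and return (word, count) pairs, most frequent first."""
    return Counter(tokens).most_common()


# Toy "corpus"; a real list would be built from a large, diverse corpus.
corpus = "the cat sat on the mat and the dog sat too".split()
print(frequency_list(corpus)[:3])  # [('the', 3), ('sat', 2), ('cat', 1)]
```

Ties are listed in the order first encountered, which is why "cat" precedes the other single-occurrence words.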
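The "What is a word; delimiters" slide shows that "found between spaces" is already a policy decision for English. The sketch below compares three tokenisation policies. The two regular expressions are illustrative assumptions. They match ASCII letters only, and they would not help with Chinese, which needs a proper segmenter.

```python
import re

text = "The farmer's sons can't co-operate on widely-held views."

# Policy 1: a word is whatever is found between spaces (punctuation stays attached).
between_spaces = text.split()
# Policy 2: runs of letters only, so hyphens and apostrophes split words.
letters_only = re.findall(r"[A-Za-z]+", text)
# Policy 3: runs of letters, allowing internal hyphens or apostrophes.
internal_joiners = re.findall(r"[A-Za-z]+(?:['-][A-Za-z]+)*", text)

print(len(between_spaces), len(letters_only), len(internal_joiners))  # 8 12 8
print(between_spaces[-1], internal_joiners[-1])  # views. views
```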
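For "Words and lemmas", counting lemmas rather than word forms merges invade, invades, invaded and invading into one item. The hypothetical `LEMMA_OF` table below stands in for a real lemmatiser. For languages like Russian, Polish or Arabic, a real lemmatiser would be far harder to build.

```python
from collections import Counter

# Hypothetical lookup table standing in for a real lemmatiser.
LEMMA_OF = {"invade": "invade", "invades": "invade",
            "invaded": "invade", "invading": "invade"}

tokens = ["Invading", "armies", "invaded", "and", "invade", "again"]
forms = Counter(t.lower() for t in tokens)
# Unknown forms fall back to themselves.
lemmas = Counter(LEMMA_OF.get(t.lower(), t.lower()) for t in tokens)

print(forms["invading"], lemmas["invade"])  # 1 3
```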
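The "Grammatical classes" slide asks whether brush (verb) and brush (noun) are one item or two. The answer decides the counting key: the lemma alone, or the (lemma, POS) pair. The tagged input below is invented and stands in for POS-tagger output, which "will make mistakes".

```python
from collections import Counter

# Invented (lemma, tag) pairs, as a POS-tagger might emit them.
tagged = [("brush", "NOUN"), ("brush", "VERB"), ("brush", "NOUN"), ("hair", "NOUN")]

by_lemma = Counter(lemma for lemma, _tag in tagged)  # one item per lemma
by_lemma_and_pos = Counter(tagged)                   # one item per (lemma, POS)

print(by_lemma["brush"])                    # 3
print(by_lemma_and_pos[("brush", "NOUN")])  # 2
print(by_lemma_and_pos[("brush", "VERB")])  # 1
```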
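The "Marginal cases" slide concludes that a policy is always needed. One way to make such a policy explicit is a lookup from word class to action. The classes, word sets and actions below are hypothetical examples, not KELLY's actual policy.

```python
import re

# Hypothetical policy tables (illustrative only).
NUMBER_WORDS = {"twelve", "seventeenth", "fifties"}
CLOSED_SETS = {
    "weekday": {"monday", "tuesday", "wednesday", "thursday",
                "friday", "saturday", "sunday"},
    # note: "may" is also a modal verb; a lexical lookup cannot tell them apart
    "month": {"january", "february", "march", "april", "may", "june", "july",
              "august", "september", "october", "november", "december"},
}
POLICY = {"digits": "drop", "number_word": "keep", "weekday": "keep_whole_set",
          "month": "keep_whole_set", "other": "keep"}


def marginal_class(word):
    """Assign a word to one of the hypothetical marginal classes."""
    w = word.lower()
    if re.fullmatch(r"\d+(?:[.,]\d+)*", w):  # digit strings such as 1999 or 1,000
        return "digits"
    if w in NUMBER_WORDS:
        return "number_word"
    for name, members in CLOSED_SETS.items():
        if w in members:
            return name
    return "other"


for word in ["1,000", "Twelve", "Friday", "hospital"]:
    print(word, marginal_class(word), POLICY[marginal_class(word)])
```

Because "may" is both a month and a modal verb, this purely lexical policy misclassifies the modal. That is the homonymy problem again, and a real policy would need POS information.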
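The "Multiwords" slide observes that the top item of a multiword frequency list is "of the", so frequency alone cannot select multiwords. One widely used remedy, swapped in here for illustration rather than taken from the slides, has two steps. First apply a minimum frequency. Then rank candidate word pairs by an association score such as logDice, 14 + log2(2·f(x,y) / (f(x) + f(y))). The toy text and the threshold of 2 are invented.

```python
import math
from collections import Counter

tokens = ("the cat of the dog of the hat of the cow of a pig "
          "the end according to plan according to me").split()

unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))


def log_dice(pair):
    """logDice = 14 + log2(2 * f(x,y) / (f(x) + f(y)))."""
    x, y = pair
    return 14 + math.log2(2 * bigrams[pair] / (unigrams[x] + unigrams[y]))


print(bigrams.most_common(1))  # [(('of', 'the'), 3)]

# Hypothetical lower frequency threshold before scoring.
candidates = [pair for pair, freq in bigrams.items() if freq >= 2]
ranked = sorted(candidates, key=log_dice, reverse=True)
print(ranked)  # [('according', 'to'), ('of', 'the')]
```

"of the" wins on raw frequency, but "according to" scores higher on logDice. Its two words rarely occur apart, whereas "of" and "the" are frequent everywhere. The frequency cutoff matters because pairs seen only once would otherwise score highly.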
