Syntagmatic Relations in Corpus and Learner Lexicography
Jelena Kallas Tallinn University, Institute of the Estonian Language
syntagmatic relations ‒ relations that linguistic units (e.g. words, clauses) have with other units because they may occur together in a sequence (Richards, Schmidt 2002: 534)
syntagmatic information in a dictionary ‒ behaviour of the lemma in combination with other words, both grammatically and lexically (Svensén 2009: 30)
syntagmatic dictionaries: construction or valency dictionaries, collocation dictionaries and idiom dictionaries (Svensén 2009: 30)
lexis versus grammar
↓
towards a lexico-grammar
Towards a lexico-grammar (1)
Pattern (Hunston, Francis 1999), construction (Atkins, Rundell 2008), collocation (Bartsch 2004, Siepmann 2005)
Pattern ‒ all the words and structures which are regularly associated with the word and which contribute to its meaning. A pattern can be identified if a combination of words occurs relatively frequently, if it is dependent on a particular word choice, and there is a clear meaning associated with it (Hunston, Francis 1999:32)
Collocation ‒ holistic lexical, lexico-grammatical or semantic unit normally composed of two or more words which exhibits minimal recurrence within a particular discourse community (Siepmann 2005:438)
Towards a lexico-grammar (2)
Syntagmatic relations of Estonian substantives, adjectives, adverbs and verbs are
• identified on the basis of the traditional grammatical description of Estonian;
• described as lexico-grammatical patterns defined by means of categorical (mostly part-of-speech) and functional-relational (subject, object, adverbial) labels,
e.g. for nouns: AJP+N (ilus naine), AVP+N (raagus puud), N+PP (hirm vanemate ees), etc.
PURPOSE → automatic extraction
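The idea of extracting such patterns automatically can be illustrated with a minimal sketch. It assumes a corpus that is already lemmatised and POS-tagged; the tag names "A" (adjective), "S" (noun) and "V" (verb), the toy sentence, and the function extract_pattern are hypothetical choices made for this illustration, not the rules actually used in Sketch Engine.

```python
from collections import Counter

# Hypothetical POS-tagged sentence as (word form, lemma, tag) triples.
# The tags "A" (adjective), "S" (noun) and "V" (verb) are assumed for this sketch only.
TAGGED = [
    ("ilus", "ilus", "A"),
    ("naine", "naine", "S"),
    ("luges", "lugema", "V"),
    ("venna", "vend", "S"),
    ("raamatut", "raamat", "S"),
]


def extract_pattern(tokens, first_tag, second_tag):
    """Count lemma pairs of adjacent tokens whose tags are first_tag, second_tag."""
    pairs = Counter()
    for (_, lemma1, tag1), (_, lemma2, tag2) in zip(tokens, tokens[1:]):
        if tag1 == first_tag and tag2 == second_tag:
            pairs[(lemma1, lemma2)] += 1
    return pairs


# Adjective + noun candidates, roughly corresponding to the AJP+N pattern.
adj_noun = extract_pattern(TAGGED, "A", "S")
print(adj_noun)
```

A real extraction grammar would also look at morphological features (case, number) and allow non-adjacent constituents; the sketch only matches adjacent tag pairs.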
Tasks of corpus lexicography (Rundell, Kilgarriff 2011)
• analysis of the corpus: to discover word senses and other lexical units (fixed phrases, phrasal verbs, compounds, etc.)
• to identify the salient features of each of these lexical units:
(1) their syntactic behaviour
(2) the collocations they participate in
(3) their colligational preferences
(4) any preferences they have for particular text types or domains
• exemplifying relevant features with material gleaned from the corpus
Corpus tools for this task (Kilgarriff, Kosem 2013)
• Computer-based tools (WordSmith Tools, MonoConc Pro, IMS Corpus Workbench, AntConc) vs. online tools (Sketch Engine, KorpusDK)
• Corpus-related tools (XAIRA, KorpusDK) vs. corpus-independent tools
• Prepared corpus vs. Web as corpus (WebCorp)
• Simple tools (concordancer, collocation, keywords) vs. advanced tools (word sketches, CQL searches, GDEX)
Resources for Estonian: Keeleveeb, Kollokatsioonide leidja, Sketch Engine (Estonian Reference Corpus, ca 250 mln, tagged for sentences, clauses, and morphology (POS tags and inflections) by FILOSOFT Ltd.)
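As an illustration of what a "simple tool" such as a concordancer does, here is a minimal keyword-in-context (KWIC) sketch. The function name kwic, the window size and the toy token list are assumptions made for this example; none of the tools named above is implemented this way as far as this presentation states.

```python
def kwic(tokens, keyword, window=2):
    """Return (left context, hit, right context) for each occurrence of keyword."""
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, token, right))
    return lines


# Toy token list for illustration only; it is not corpus data.
TEXT = "ilus naine ja tugev mees ning ilus maja".split()

for left, hit, right in kwic(TEXT, "ilus"):
    # Right-align the left context so the keyword forms a column.
    print(f"{left:>15} | {hit} | {right}")
```

Advanced tools such as word sketches go further by grouping the hits by grammatical relation instead of listing raw contexts.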
Word Sketches for Estonian in SkE (1)
60 rules (relations), which correspond to POS tags and morphological inflections:
• subject, object, adverbials, modifiers
• oblique objects of noun prepositional phrases
• oblique objects and adverbials of particle verbs
• oblique objects of prepositional verbs
• constructions with the conjunctions ja/või ‘and/or’, kui/nagu ‘as’
• predicative (complements of the copula-like verb olema ‘be’)
• various combinations of finite verbs with non-finite verbs
• multi-word verbs
Word Sketches for Estonian in SkE (2)
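NP co-constituents (substantiivifraasi laiendid):
a) adjective (phrase), e.g. ilus naine ‘beautiful woman’, tugev mees ‘strong man’;
b) noun (phrase), e.g. venna raamat ‘brother's book’, nokaga müts ‘peaked cap’;
c) adpositional phrase, e.g. uhkus kodumaa üle ‘pride in one's homeland’;
d) infinitive (phrase), e.g. soov õppida ‘wish to study’;
e) quantifier phrase, e.g. sada kilomeetrit ‘a hundred kilometres’, meeter riiet ‘a metre of cloth’;
f) adverb (phrase), e.g. raagus puud ‘bare trees’;
g) subordinate clause, e.g. Muidugi jääb küsimus, kas see isik on sotsiaalselt kindlustatud. ‘Of course the question remains whether this person is socially insured.’
Word sketch of the lemma diskussioon ‘discussion’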
Word Sketches for Estonian in SkE (3)
ADVP co-constituents (adverbifraasi laiendid):
a) adverb, e.g. väga hästi ‘very well’;
b) case form of a noun, e.g. uksest siinpool ‘on this side of the door’, teistest paremini ‘better than the others’;
c) adpositional phrase, e.g. selja pealt katki ‘torn at the back’;
d) quantifier phrase, e.g. paar päeva hiljem ‘a couple of days later’, mitu kilomeetrit kaugemal ‘several kilometres farther’;
e) subordinate clause, e.g. Ta rääkis kauem, kui mina seda .
A noun in an oblique case may occur in the elative (otsast katki, äärest lahti; ootusest elevil, ärevusest hingetu), in the comitative (partneriga vaheldumisi, rahadega kimpus), or in the terminative (milleni täis, pingul, surmani solvunult, maani täis).
Word sketch of the lemma omaette ‘separately, on one's own’
Multi-word lexical verbs in SkE (väljendverbid ‘expression verbs’, ühendverbid ‘particle verbs’, ahelverbid ‘catenative verbs’, tugiverbiühendid ‘support verb constructions’)
Syntagmatic relations in learner lexicography
Keywords EXPLICIT, SELF-EXPLANATORY, THEORY-INDEPENDENT, COMPREHENSIVE
How to present? − in coded metalanguage (N+Adj), in uncoded metalanguage (not before noun), live examples, in the definition format, as outside matter
How to choose?
Basic Criteria (Tono 2012)
• frequency
• salience (logDice)
• CEFR word list (Common European Framework of Reference; Cambridge University project “English Vocabulary Profile”)
• occurrence in school textbooks
What is output?
User interface of the collocation dictionary (Tono 2012)
Selection Criteria in Basic Estonian Dictionary (requirements of the official language proficiency levels, occurrence in the vocabulary lists of the proficiency levels, co-occurrence frequency)
compiled for Estonian language learners at the beginner (A2) and lower-intermediate (B1) levels
4500 words (core vocabulary, frequency dictionary and vocabulary profiles of A2 level (Ilves 2008) were used) ═ definition vocabulary ═ the same vocabulary is used for presenting syntagmatic relationships (collocations and government patterns)
government and collocation patterns (in SkE) statistics: raw frequency or salience → RAW FREQUENCY
Noun discussion patterns according to raw frequency
Noun discussion patterns according to logDice
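The difference between the two rankings can be sketched as follows. logDice is computed with the standard formula 14 + log2(2·f_xy / (f_x + f_y)), where f_xy is the co-occurrence frequency and f_x, f_y are the frequencies of the node and the collocate. All counts and collocate names below are invented for illustration; they are not taken from the Estonian Reference Corpus.

```python
import math


def log_dice(f_xy, f_x, f_y):
    """logDice = 14 + log2(2 * f_xy / (f_x + f_y))."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


# Invented counts for illustration only.
NODE_FREQ = 1000  # assumed corpus frequency of the node word
# (collocate, co-occurrence frequency with the node, corpus frequency of the collocate)
CANDIDATES = [
    ("common_word", 200, 99000),
    ("typical_word", 50, 1000),
]

# Ranking by raw co-occurrence frequency favours very frequent collocates.
by_raw = sorted(CANDIDATES, key=lambda c: c[1], reverse=True)

# Ranking by logDice favours collocates that are strongly tied to the node.
by_logdice = sorted(
    CANDIDATES,
    key=lambda c: log_dice(c[1], NODE_FREQ, c[2]),
    reverse=True,
)

print([c[0] for c in by_raw])
print([c[0] for c in by_logdice])
```

With these invented counts the very frequent collocate comes first under raw frequency, while the rarer but more exclusive collocate comes first under logDice, which is the kind of contrast the two word-sketch views show.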
References (1)
• ATKINS, B. T. S., RUNDELL, M. 2008. The Oxford Guide to Practical Lexicography. Oxford: Oxford University Press.
• BARTSCH, S. 2004. Structural and functional properties of collocations in English. A corpus study of lexical and pragmatic constraints on lexical co-occurrence. Tübingen: Verlag Gunter Narr.
• HUNSTON, S., FRANCIS, G. 1999. Pattern Grammar: A corpus-driven approach to the lexical grammar of English. Amsterdam/Philadelphia: John Benjamins Publishing Company.
• ILVES, M. 2008. Algaja keelekasutaja. A2-taseme eesti keele oskus. Tallinn: Eesti Keele Sihtasutus.
• KILGARRIFF, A., RYCHLY, P., SMRZ, P., TUGWELL, D. 2004. The Sketch Engine. – Proceedings of Euralex, Lorient, France.
• KILGARRIFF, A., KOSEM, I. 2012. Corpus Tools for Lexicographers. – Electronic Lexicography. Oxford: Oxford University Press (forthcoming).
• RICHARDS, J. C., SCHMIDT, R. 2002. Longman Dictionary of Language Teaching and Applied Linguistics. UK: Pearson Education Limited.
• RUNDELL, M., KILGARRIFF, A. 2011. Automating the creation of dictionaries: where will it all end? – A Taste for Corpora. In honour of Sylviane Granger. Meunier F., De Cock S., Gilquin G. and Paquot M. (eds). Université catholique de Louvain.
• SIEPMANN, D. 2005. Collocation, Colligation and Encoding Dictionaries. Part I: Lexicological Aspects. – International Journal of Lexicography 18, 409-443.
References (2)
• SVENSÉN, B. 2009. A Handbook of Lexicography. The Theory and Practice of Dictionary-Making. Cambridge: Cambridge University Press.
• TONO, Y. 2011. Bilingual lexicography in Japan. Video presentation at the conference Electronic Lexicography in the 21st Century: New Applications for New Users. Bled, 10-12 November. Available online at http://videolectures.net/elex2011_bled/.