Page 1: Syntagmatic Relations in Corpus and Learner Lexicography Jelena Kallas Tallinn University, Institute of the Estonian Language jelena.kallas@eki.ee

Syntagmatic Relations in Corpus and Learner Lexicography

Jelena Kallas Tallinn University, Institute of the Estonian Language

jelena.kallas@eki.ee

syntagmatic relations ‒ relations that linguistic units (e.g. words, clauses) have with other units because they may occur together in a sequence (Richards, Schmidt 2002: 534)

syntagmatic information in a dictionary ‒ behaviour of the lemma in combination with other words, both grammatically and lexically (Svensén 2009: 30)

syntagmatic dictionaries: construction or valency dictionaries, collocation dictionaries and idiom dictionaries (Svensén 2009: 30)

lexis versus grammar

towards a lexico-grammar

Towards a lexico-grammar (1)

Pattern (Hunston, Francis 1999), construction (Atkins, Rundell 2008), collocation (Bartsch 2004, Siepmann 2005)

Pattern ‒ all the words and structures which are regularly associated with the word and which contribute to its meaning. A pattern can be identified if a combination of words occurs relatively frequently, if it is dependent on a particular word choice, and there is a clear meaning associated with it (Hunston, Francis 1999:32)

Collocation ‒ holistic lexical, lexico-grammatical or semantic unit normally composed of two or more words which exhibits minimal recurrence within a particular discourse community (Siepmann 2005:438)

Towards a lexico-grammar (2)

Syntagmatic relations of Estonian substantives, adjectives, adverbs and verbs are

• identified on the basis of the traditional grammatical description of Estonian;
• described as lexico-grammatical patterns defined by means of categorical (mostly part-of-speech) and functional-relational (subject, object, adverbial) labels,

e.g. for nouns: AJP+N (ilus naine ‘beautiful woman’), AVP+N (raagus puud ‘bare trees’), N+PP (hirm vanemate ees ‘fear of one's parents’), etc.

PURPOSE → automatic extraction
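Labelled patterns of this kind can be matched mechanically against POS-tagged text. The following is a minimal sketch only, assuming a hypothetical tag set (A = adjective, N = noun, V = verb, K = adposition) and a hand-tagged example sentence; the function name `find_patterns` and the tag sequences are illustrative assumptions and are not taken from the Sketch Engine or from the tagger used for the Estonian corpus.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word form, POS tag); the tag set is hypothetical

# Each pattern label is paired with the tag sequence it should match.
PATTERNS = {
    "AJP+N": ["A", "N"],      # adjective + noun
    "N+PP": ["N", "N", "K"],  # noun + (noun + postposition)
}

def find_patterns(tokens: List[Token]) -> List[Tuple[str, str]]:
    """Return (label, matched text) for every tag-sequence match."""
    hits = []
    tags = [tag for _, tag in tokens]
    for label, seq in PATTERNS.items():
        n = len(seq)
        for i in range(len(tokens) - n + 1):
            if tags[i:i + n] == seq:
                words = " ".join(word for word, _ in tokens[i:i + n])
                hits.append((label, words))
    return hits

# Hand-tagged illustrative sentence (not corpus data).
sentence = [("ilus", "A"), ("naine", "N"), ("tundis", "V"),
            ("hirmu", "N"), ("vanemate", "N"), ("ees", "K")]
print(find_patterns(sentence))
```

A real extraction grammar would also need to check case agreement and government, which plain tag sequences do not capture.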

Tasks of corpus lexicography (Rundell, Kilgarriff 2011)

• analysis of the corpus:
  to discover word senses and other lexical units (fixed phrases, phrasal verbs, compounds, etc.)
  to identify the salient features of each of these lexical units:
  (1) their syntactic behaviour
  (2) the collocations they participate in
  (3) their colligational preferences
  (4) any preferences they have for particular text types or domains
• exemplifying relevant features with material gleaned from the corpus

Corpus tools for this task (Kilgarriff, Kosem 2012)

• Computer-based tools (WordSmith Tools, MonoConc Pro, IMS Corpus Workbench, AntConc) vs. online tools (Sketch Engine, KorpusDK)
• Corpus-related tools (XAIRA, KorpusDK) vs. corpus-independent tools
• Prepared corpus vs. Web as corpus (WebCorp)
• Simple tools (concordancer, collocation, keywords) vs. advanced tools (word sketches, CQL searches, GDEX)

Resources for Estonian: Keeleveeb, Kollokatsioonide leidja, Sketch Engine (Estonian Reference Corpus, ca. 250 million, tagged for sentences, clauses, and morphology (POS tags and inflections) by FILOSOFT Ltd.)

Word Sketches for Estonian in SkE (1)

60 rules (relations), which correspond to POS tags and morphological inflections:

• (subject, object, adverbials, modifiers)
• oblique objects of noun prepositional phrases
• oblique objects and adverbials of particle verbs
• oblique objects of prepositional verbs
• constructions with conjunctions ja/või ‘and/or’, kui/nagu ‘as’
• predicative (complements of the copula-like verb olema ‘be’)
• various combinations of finite verbs with non-finite verbs
• multi-word verbs
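Conceptually, a word sketch groups the collocates of a lemma by grammatical relation and counts them. The sketch below illustrates only that grouping step, assuming the relation triples have already been extracted; the relation names, the function `build_sketch` and the example triples are invented for illustration and do not reproduce the Estonian sketch grammar.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (relation, headword lemma, collocate lemma)

def build_sketch(triples: Iterable[Triple],
                 lemma: str) -> Dict[str, List[Tuple[str, int]]]:
    """Group the collocates of `lemma` by relation, most frequent first."""
    by_relation = defaultdict(Counter)
    for relation, head, collocate in triples:
        if head == lemma:
            by_relation[relation][collocate] += 1
    return {rel: counts.most_common() for rel, counts in by_relation.items()}

# Invented triples, not corpus data.
triples = [
    ("modifier", "diskussioon", "avalik"),
    ("modifier", "diskussioon", "elav"),
    ("modifier", "diskussioon", "avalik"),
    ("object_of", "diskussioon", "algatama"),
    ("modifier", "raamat", "uus"),
]
sketch = build_sketch(triples, "diskussioon")
for relation, collocates in sketch.items():
    print(relation, collocates)
```

In the Sketch Engine the relations are defined by the sketch grammar and the collocates are ranked by a salience score rather than by the plain counts used here.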

Word Sketches for Estonian in SkE (2)

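Modifiers of the noun phrase (NP co-constituents)

a) adjective (phrase), e.g. ilus naine ‘beautiful woman’, tugev mees ‘strong man’;
b) noun (phrase), e.g. venna raamat ‘brother's book’, nokaga müts ‘peaked cap’;
c) adpositional phrase, e.g. uhkus kodumaa üle ‘pride in one's homeland’;
d) infinitive (phrase), e.g. soov õppida ‘wish to study’;
e) quantifier phrase, e.g. sada kilomeetrit ‘a hundred kilometres’, meeter riiet ‘a metre of cloth’;
f) adverb (phrase), e.g. raagus puud ‘bare trees’;
g) subordinate clause, e.g. Muidugi jääb küsimus, kas see isik on sotsiaalselt kindlustatud. ‘Of course the question remains whether this person is socially insured.’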

Word sketch for the lemma diskussioon ‘discussion’

Word Sketches for Estonian in SkE (3)

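Modifiers of the adverb phrase (ADVP co-constituents)

a) adverb, e.g. väga hästi ‘very well’;
b) case form of a noun, e.g. uksest siinpool ‘on this side of the door’, teistest paremini ‘better than the others’;
c) adpositional phrase, e.g. selja pealt katki ‘torn at the back’;
d) quantifier phrase, e.g. paar päeva hiljem ‘a couple of days later’, mitu kilomeetrit kaugemal ‘several kilometres further’;
e) subordinate clause, e.g. Ta rääkis kauem, kui mina seda .

A noun in an oblique case can occur in the elative (otsast katki ‘broken at the end’, äärest lahti ‘loose at the edge’; ootusest elevil ‘excited with anticipation’, ärevusest hingetu ‘breathless with anxiety’), the comitative (partneriga vaheldumisi ‘taking turns with a partner’, rahadega kimpus ‘in trouble with money’) or the terminative (milleni täis, pingul ‘full or taut up to what’, surmani solvunult ‘mortally offended’, maani täis ‘dead drunk’).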

Word sketch for the lemma omaette ‘separately, on one's own’

Multi-word lexical verbs in SkE: expression verbs (väljendverbid), particle verbs (ühendverbid), catenative verbs (ahelverbid), support verb constructions (tugiverbiühendid)

Syntagmatic relations in learner lexicography

Keywords EXPLICIT, SELF-EXPLANATORY, THEORY-INDEPENDENT, COMPREHENSIVE

How to present? − in coded metalanguage (N+Adj), in uncoded metalanguage (not before noun), live examples, in the definition format, as outside matter

How to choose?

Basic Criteria (Tono 2012)

• frequency
• salience (logDice)
• CEFR word list (Common European Framework of Reference for Languages; Cambridge University project “English Vocabulary Profile”)
• occurrence in school textbooks
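The slide lists the criteria (frequency, salience as logDice, CEFR word list, occurrence in school textbooks) without saying how they are combined. Purely as an illustration, one possible combination is sketched below: a candidate is kept if it passes both numeric thresholds and is attested in at least one of the two pedagogical sources. The thresholds, the `Candidate` fields, the `select` function and all figures are assumptions invented for this example, not Tono's procedure.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Candidate:
    collocation: str
    frequency: int       # raw co-occurrence frequency
    logdice: float       # salience score
    in_cefr_list: bool   # attested in a CEFR-based word list
    in_textbooks: bool   # attested in school textbooks

def select(candidates: List[Candidate],
           min_freq: int = 5,
           min_logdice: float = 6.0) -> List[Candidate]:
    """Keep candidates passing both thresholds and one pedagogical source."""
    kept = [c for c in candidates
            if c.frequency >= min_freq
            and c.logdice >= min_logdice
            and (c.in_cefr_list or c.in_textbooks)]
    return sorted(kept, key=lambda c: c.logdice, reverse=True)

# Invented figures, not corpus data.
candidates = [
    Candidate("hea sõber", 120, 8.4, True, True),
    Candidate("truu sõber", 15, 9.1, False, True),
    Candidate("sõber ütles", 300, 3.2, True, False),
    Candidate("põline sõber", 3, 9.8, False, False),
]
for c in select(candidates):
    print(c.collocation, c.logdice)
```

Treating the two pedagogical sources as alternatives is a design choice of this example; requiring both would give a stricter, smaller list.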

What is the output?

User interface of the collocation dictionary (Tono 2012)

Selection Criteria in the Basic Estonian Dictionary (requirements of the official language proficiency levels, occurrence in the vocabulary lists of the proficiency levels, co-occurrence frequency)

compiled for Estonian language learners at the beginner (A2) and lower-intermediate (B1) levels

4500 words (core vocabulary, frequency dictionary and vocabulary profiles of A2 level (Ilves 2008) were used) ═ definition vocabulary ═ the same vocabulary is used for presenting syntagmatic relationships (collocations and government patterns)

government and collocation patterns (in SkE); statistics: raw frequency or salience → RAW FREQUENCY

Noun discussion patterns according to raw frequency

Noun discussion patterns according to logDice
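Raw frequency and logDice can rank the same collocates differently, because logDice also takes into account how frequent the collocate is on its own. The sketch below uses the usual definition, logDice = 14 + log2(2·f_xy / (f_x + f_y)), where f_xy is the co-occurrence frequency and f_x, f_y are the frequencies of the node and the collocate. All counts and the two collocates are invented for illustration and are not taken from the Estonian Reference Corpus.

```python
import math

def log_dice(f_xy: int, f_x: int, f_y: int) -> float:
    """logDice score: 14 + log2 of the Dice coefficient."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))

F_NODE = 1000  # invented frequency of the node word

# collocate -> (co-occurrence frequency, collocate frequency); invented counts
collocates = {
    "suur": (100, 100_000),  # frequent with the node, but frequent everywhere
    "tuline": (40, 200),     # rarer overall, but strongly tied to the node
}

by_frequency = sorted(collocates,
                      key=lambda c: collocates[c][0], reverse=True)
by_logdice = sorted(collocates,
                    key=lambda c: log_dice(collocates[c][0], F_NODE,
                                           collocates[c][1]),
                    reverse=True)

print("by raw frequency:", by_frequency)
print("by logDice:", by_logdice)
```

With these figures the very common collocate comes first by raw frequency, while the rarer but more exclusive collocate comes first by logDice.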

References (1)

• ATKINS, B. T. S., RUNDELL, M. 2008. The Oxford Guide to Practical Lexicography. Oxford: Oxford University Press.
• BARTSCH, S. 2004. Structural and functional properties of collocations in English. A corpus study of lexical and pragmatic constraints on lexical co-occurrence. Tübingen: Verlag Gunter Narr.
• HUNSTON, S., FRANCIS, G. 1999. Pattern Grammar: A corpus-driven approach to the lexical grammar of English. Amsterdam/Philadelphia: John Benjamins Publishing Company.
• ILVES, M. 2008. Algaja keelekasutaja. A2-taseme eesti keele oskus. Tallinn: Eesti Keele Sihtasutus.
• KILGARRIFF et al. 2004 = Kilgarriff, Adam, Pavel Rychly, Pavel Smrz and David Tugwell 2004. The Sketch Engine. – Proceedings of Euralex, Lorient, France.
• KILGARRIFF, A., KOSEM, I. 2012. Corpus Tools for Lexicographers. – Electronic Lexicography. Oxford: Oxford University Press (forthcoming).
• RICHARDS, J. C., SCHMIDT, R. 2002. Longman Dictionary of Language Teaching and Applied Linguistics. UK: Pearson Education Limited.
• RUNDELL, M., KILGARRIFF, A. 2011. Automating the creation of dictionaries: where will it all end? – A Taste for Corpora. In honour of Sylviane Granger. Meunier F., De Cock S., Gilquin G. and Paquot M. (eds). Université catholique de Louvain.
• SIEPMANN, D. 2005. Collocation, Colligation and Encoding Dictionaries. Part I: Lexicological Aspects. – International Journal of Lexicography 18, 409-443.

References (2)

• SVENSÉN, B. 2009. A Handbook of Lexicography. The Theory and Practice of Dictionary-Making. Cambridge: Cambridge University Press.
• TONO, Y. 2011. Bilingual lexicography in Japan. Video presentation at the conference Electronic Lexicography in the 21st Century: New Applications for New Users. Bled, 10-12 November. Available online at http://videolectures.net/elex2011_bled/.