Using Corpora in Linguistics Introduction to WordSmith Tools for Beginners Íde O’Sullivan Regional Writing Centre www.ul.ie/rwc

Using Corpora in Linguistics

Introduction to WordSmith Tools for Beginners

Íde O’Sullivan, Regional Writing Centre

www.ul.ie/rwc

Corpus Linguistics

McEnery and Wilson (2001:1) describe corpus linguistics as “the study of language based on examples of ‘real life’ language use”.

McEnery, T. and Wilson, A. (2001) Corpus Linguistics (2nd edition). Edinburgh: Edinburgh University Press.

Corpus: Definition

“A corpus is [the name given to] a set of texts which has been put together for some purpose, usually (though not necessarily), in computer-readable form” (Wray, Trott & Bloomer, 1998:213).

Wray, A., Trott, K. & Bloomer, A. (1998) Projects in Linguistics: A Practical Guide to Researching Language. London, New York: Arnold.

Corpus: Definition

“a corpus typically implies a finite body of text, sampled to be maximally representative of a particular variety of a language, and which can be stored and manipulated using a computer” McEnery and Wilson (2001:73).

Corpus ≠ Archive

Concordancing: Definition

“A concordance, in its simplest form, is an alphabetical listing of the words in a text, given together with the contexts in which they appear”.

Catherine Ball, Concordances & Corpora: Tutorial: http://www.georgetown.edu/faculty/ballc/corpora/tutorial.html

Concordancing: Definition

“A concordance is a list of examples of a particular word, part of a word or combination of words, in its contexts drawn from a text corpus. The search word is sometimes also referred to as a keyword. The most common way of displaying a concordance is by a series of lines with the keyword in context (KWIC)”.

Kettemann, B. (1995) “Concordancing in stylistics teaching”, in Grosser, W., Hogg, J. and Hubmeyer, K. (eds), Style: Literary and Non-Literary. Contemporary Trends in Cultural Stylistics. New York: The Edwin Mellen Press: 307-318.
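The keyword-in-context (KWIC) display described above can be illustrated with a minimal Python sketch. This is not how WordSmith Tools is implemented; the function name `kwic`, the `width` parameter and the sample text are all assumptions made for illustration.

```python
import re


def kwic(text, keyword, width=30):
    """Return one keyword-in-context line per occurrence of `keyword`.

    Hypothetical helper for illustration only; `width` is the number of
    characters of context kept on each side of the keyword.
    """
    lines = []
    # Match the keyword as a whole word, ignoring case.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group()}] {right}")
    return lines


# Invented sample text, not taken from any of the corpora on the CD.
sample = (
    "A corpus is a set of texts. A concordance lists each word of a corpus "
    "in context, and a corpus can be searched by computer."
)

for line in kwic(sample, "corpus"):
    print(line)
```

Each printed line shows the search word in brackets at the same column, with its left and right context on either side, which is the layout a concordancer typically uses.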

Software to Analyse Corpora

“Concordancing software enables you to discover patterns that exist in natural language by grouping text in such a way that they are clearly visible […] The real value of the concordancer lies in this question of visibility” (Tribble & Jones, 1997:3).

Tribble, C. and Jones, G. (1997) Concordances in the Classroom: Using Corpora in Language Education. Houston TX: Athelstan.

Using Corpora in Language Learning and Teaching

Organisation of the CD

This CD contains a collection of small genre-specific academic and journalistic corpora in English, French, Gaeilge, German and Spanish.

For each language there are two small genre-specific corpora: a journalistic corpus (100,000 words) and an academic corpus (50,000 words). The journalistic corpora are divided into four subcorpora: current affairs, editorials, reviews and sport. The academic corpora are divided into two subcorpora: theses and articles.

Using Corpora in Language Learning and Teaching

Organisation of the CD

Languages: English, French, Gaeilge, German, Spanish

Journalistic Corpus (100,000 words)
Current Affairs: 44,000 (paper 1, paper 2)
Editorial: 22,000 (paper 1, paper 2)
Reviews: 12,000 (paper 1, paper 2)
Sport: 22,000 (paper 1, paper 2)

Academic Corpus (50,000 words)
Theses: 25,000
Articles: 25,000

Sources of Journalistic Corpora

English: Irish Examiner, Irish Independent, Irish Times

French: Le Monde, L’Humanité

Gaeilge: Beo, Foinse, Lá

German: Die Süddeutsche Zeitung, Die Frankfurter Allgemeine Zeitung

Spanish: La Vanguardia, El Periódico

Sources of Academic Corpora

Articles and theses written by native speakers

Subject Areas: Literature, Cultural Studies, Translation Studies, Education, Applied Linguistics, Sociolinguistics, Corpus Linguistics, Media Studies, Language Pedagogy, Teacher Training, Discourse Analysis, Politics, Research Methodology, Second Language Acquisition, History of Language

WordSmith Tools

Wordlists: Frequency, Alphabetical order, Statistical information

Keywords

Concord: Collocations, Clusters, Patterns, Plot, Source text
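To make the three wordlist views concrete, here is a minimal Python sketch that builds a frequency list, an alphabetical list and a few summary statistics from a text. It only approximates the idea; the tokenisation rule, the function names and the choice of statistics (tokens, types, type/token ratio) are assumptions, not a description of WordSmith's own output.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: a word is a run of letters (accented letters included),
    # optionally joined by apostrophes; everything is lowercased.
    return re.findall(r"[^\W\d_]+(?:['’][^\W\d_]+)*", text.lower())


def wordlist(text):
    """Return (frequency list, alphabetical list, statistics) for `text`."""
    counts = Counter(tokenize(text))
    # Frequency view: most frequent first, ties broken alphabetically.
    by_frequency = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    # Alphabetical view: the same entries sorted by word.
    alphabetical = sorted(counts.items())
    tokens = sum(counts.values())
    stats = {
        "tokens": tokens,                 # running words
        "types": len(counts),             # distinct words
        "type_token_ratio": len(counts) / tokens if tokens else 0.0,
    }
    return by_frequency, alphabetical, stats


# Invented sample text for illustration.
sample = "The corpus is small. The corpus is genre-specific."
by_frequency, alphabetical, stats = wordlist(sample)

print(by_frequency[:3])
print(alphabetical[:3])
print(stats)
```

Running the same function over an academic text and a journalistic text and comparing the two frequency lists is one simple way to start contrasting genres.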

WordSmith Tools

Concord: Sorting data, Concord expansion option, Concordance with multiple views, Settings, Wildcards, Advanced searching, Close texts
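The ideas of wildcard searching and sorting concordance lines can be sketched in Python as follows. The wildcard syntax here is an assumption chosen for illustration (`*` for any run of word characters, `?` for exactly one); the exact syntax accepted by Concord should be checked in the WordSmith Tools documentation. The function names are hypothetical.

```python
import re


def wildcard_to_regex(pattern):
    # Assumption: "*" stands for any run of word characters and "?" for
    # exactly one; this is illustrative, not WordSmith's documented syntax.
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(r"\w*")
        elif ch == "?":
            parts.append(r"\w")
        else:
            parts.append(re.escape(ch))
    return re.compile(r"\b" + "".join(parts) + r"\b", re.IGNORECASE)


def search(text, pattern, width=25):
    """Return (left context, hit, right context) for every match."""
    hits = []
    for match in wildcard_to_regex(pattern).finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        hits.append((left, match.group(), right))
    return hits


def sort_by_right_word(hits):
    """Sort concordance lines on the first word to the right of the hit."""
    def first_right_word(hit):
        words = hit[2].split()
        return words[0].lower() if words else ""
    return sorted(hits, key=first_right_word)


# Invented sample text for illustration.
sample = (
    "Corpora are analysed with a concordancer. A corpus is finite. "
    "Corpus linguistics studies real language."
)

for left, hit, right in sort_by_right_word(search(sample, "corp*")):
    print(f"{left:>25} [{hit}] {right}")
```

Sorting on the word to the right (or left) of the search word groups similar contexts together, which makes recurring patterns easier to see.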

Worksheet

Run individual wordlists for the Academic Corpus and the Journalistic Corpus. Compare and contrast your findings to draw conclusions about each genre.

Run a concordance for a chosen aspect of the language:
Do any collocational patterns emerge from this evidence?
What are the most common clusters including the search word(s)?
Identify the most common uses of the word. Are there exceptions to these uses?
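For readers who want to see what counting collocates and clusters involves, the following Python sketch does both for a single search word. It is a simplified illustration under stated assumptions: the window size (`span`), the cluster length (`size`), the tokenisation rule and the function names are all hypothetical choices, not WordSmith settings.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: words are lowercased runs of letters.
    return re.findall(r"[^\W\d_]+", text.lower())


def collocates(tokens, node, span=2):
    """Count words occurring within `span` words either side of `node`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == node:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(window)
    return counts


def clusters(tokens, node, size=3):
    """Count every run of `size` consecutive words that contains `node`."""
    counts = Counter()
    for i in range(len(tokens) - size + 1):
        gram = tokens[i:i + size]
        if node in gram:
            counts[" ".join(gram)] += 1
    return counts


# Invented sample text for illustration.
sample = (
    "The corpus is small and the corpus is specialised "
    "but the corpus is finite."
)
tokens = tokenize(sample)

print(collocates(tokens, "corpus").most_common(3))
print(clusters(tokens, "corpus").most_common(3))
```

On a real corpus, the most frequent collocates and clusters give a first indication of the typical uses of the search word; less frequent ones are where exceptions tend to appear.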

Resources

WordSmith Tools: http://www.lexically.net/wordsmith/

MonoConc and ParaConc: http://www.athel.com/mono.html

Online Resources

Tim Johns Data-driven Learning Page: http://www.eisu.bham.ac.uk/johnstf/timconc.htm

Mike Barlow: http://www.athel.com/corpus.html

Other resources: http://www.ul.ie/~appliedlanguages/LI4113_C&C_websites.doc

Online Concordancing

Hong Kong Virtual Language Centre: http://www.edict.com.hk/concordance/default.htm

The Compleat Lexical Tutor (Lextutor): http://www.lextutor.ca/

French Learner Language Oral Corpus (flloc): http://www.flloc.soton.ac.uk/

Resources

Freeware Concordancers

ConcApp: http://www.edict.com.hk/pub/concapp/

Create your own corpus - Disposable corpus

Issues of copyright

Issue of reliability

Resources

British National Corpus (corpus demo) http://info.ox.ac.uk/bnc/

Cobuild Bank of English (wordbanks online) http://www.cobuild.collins.co.uk/

Corpus Concordance Sampler: http://www.collins.co.uk/Corpus/CorpusSearch.aspx

Limerick Corpus of Irish-English (L-CIE): http://www.ul.ie/~lcie/