Corpora and Its Use in ELT


What is a corpus?

A corpus (plural: corpora) is a collection of linguistic data, compiled either from written texts or from transcriptions of recorded speech.

More broadly, a corpus is “any body of text”, that is, any collection of recorded instances of spoken or written language.

Adherents of corpus linguistics hold that reliable language analysis is best carried out on samples collected in the field, in natural contexts and with minimal experimental interference.

A landmark in modern corpus linguistics was the 1967 publication of Computational Analysis of Present-Day American English by Henry Kučera and W. Nelson Francis, a work based on the analysis of the Brown Corpus.

Henry Kučera (15 February 1925 – 20 February 2010), born Jindřich Kučera, was a Czech linguist who was a pioneer in corpus linguistics and linguistic software.

John McHardy Sinclair (June 14, 1933 – March 13, 2007) was Professor of Modern English Language at Birmingham University from 1965 to 2000. He pioneered work in corpus linguistics, discourse analysis, lexicography, and language teaching.

John Sinclair was a first-generation modern corpus linguist and the founder of the COBUILD project.

Types of corpora

Monolingual

Corpus

Written

Spoken

General

Specialized

Multilingual

Parallel Corpus

A corpus can be composed of texts in a single language or of texts in more than one language. If the texts are the same in each language, as in translations, the corpus is called a parallel corpus. In this kind of corpus the direction of the translation is not relevant.

Comparable Corpus

The goal of this type of corpus is to compare languages or varieties as they are used in similar circumstances of communication.

Sublanguage Corpora

These corpora include texts from a particular dialect or variety of a language.

General Corpora

A general corpus is formed from general texts that do not belong to a single field or register.
