What can a corpus tell us about grammar

What can a corpus tell us about lexis

& What can a corpus tell us about grammar

Sami Ullah 13131502-103, Usama Shoukat 13131502-065, Sarmad Shakoor 13131502-058

What can a corpus tell us about lexis

Lexis and the lexicon

Lexis can be studied simply by exploring individual lexical items and their behaviour,

or by using corpus data to examine the lexicon as a whole or to test lexical theory.

THE GENERAL LEXICON

What do corpora tell us about the English lexicon?

How many words comprise the main vocabulary of a language, its central lexicon?

“Exact rank orderings of words and rates of occurrence vary according to corpus composition and lemmatisation policies”
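The effect of lemmatisation policy on rank orderings can be illustrated with a minimal Python sketch. The sample sentence and the tiny lemma table `TOY_LEMMAS` are hypothetical, invented only for illustration; a real study would use a full corpus and a proper lemmatiser.

```python
from collections import Counter
import re

# Hypothetical toy lemma table (an assumption for this sketch only).
TOY_LEMMAS = {"is": "be", "was": "be", "are": "be", "words": "word"}

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def rank_frequencies(text, lemmatise=False):
    """Return (item, count) pairs, most frequent first."""
    tokens = tokenize(text)
    if lemmatise:
        # Map each word form to its lemma where the toy table knows it.
        tokens = [TOY_LEMMAS.get(tok, tok) for tok in tokens]
    return Counter(tokens).most_common()

# Invented sample text, not real corpus data.
sample = "The word is short. The words are short. The corpus was small."

print(rank_frequencies(sample)[:4])                  # word forms counted separately
print(rank_frequencies(sample, lemmatise=True)[:4])  # forms grouped under lemmas
```

With lemmatisation switched on, the separate forms is, are and was are counted together as be, so the same text yields a different ranking.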

WORD FORMATION

General corpora also provide information about derivation and compounding, helping to establish which potential words are actually institutionalised.

For example, formations such as colo(u)rous and miscolo(u)r are morphologically possible but not found.

Phraseology and phrases

It is better to look at words in their corpus contexts than in isolation.

COLLOCATION AND PATTERNING

What can we learn about a word from looking at its collocates, the words with which it co-occurs?
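As a rough illustration of how collocates can be retrieved, the sketch below counts the words that occur within a fixed window around a node word. The function name `collocates`, the window size and the sample sentences are assumptions made for this example; it reports raw co-occurrence counts only, not an association measure.

```python
from collections import Counter
import re

def collocates(text, node, window=2):
    """Count words occurring within `window` tokens either side of `node`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            # Note: this simple window ignores sentence boundaries.
            counts.update(left + right)
    return counts

# Invented sample text, not real corpus data.
sample = "She made a strong case. He drinks strong tea. They made strong tea again."

print(collocates(sample, "strong", window=1).most_common(3))
```

In a real corpus the raw counts would normally be compared with each word's overall frequency, since very common words co-occur with almost everything.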

FIXED EXPRESSIONS AND IDIOMS

Multi-words vs. idioms

Multi-word items: cover a wide range of types and show great variety of behaviour.

Idioms: infrequent; appear mainly in journalism and fiction.

Meaning

CONTEXT AND MEANING

Corpus concordances make us aware of how far the meanings of words are derived from context.

Do words have independent meaning at all?

What can corpora tell us about the meanings of words?
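A concordance of the kind mentioned above can be sketched in a few lines of Python as a key-word-in-context (KWIC) listing. The function name `kwic`, the context width and the sample sentences are assumptions for illustration, not the output of any particular concordancer.

```python
import re

def kwic(text, node, width=3):
    """Return (left context, node, right context) for each hit of `node`."""
    tokens = re.findall(r"\w+(?:'\w+)?", text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

# Invented sample text, not real corpus data.
sample = "He sat on the river bank. She works at a bank in town."

for left, node, right in kwic(sample, "bank", width=2):
    # Align the node word in a central column, as concordancers do.
    print(f"{left:>12}  {node}  {right}")
```

Reading down the central column shows how the surrounding words, here river versus works at, point to different senses of the same form.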

POLYSEMY

How many different senses or uses do words have?

How are these distinguished in context?

METAPHOR, CONNOTATION & IDEOLOGY

Do corpora inform us about metaphors? If so, how?

How is connotational meaning analyzed?

How is ideology analyzed?

Sets and synonyms

LEXICAL SETS

Is it useful to divide a corpus by fields?

SYNONYMS AND ANTONYMS

How can synonyms be analysed with corpus data?

Can antonyms also be explored with corpus data?

Lexis in spoken language

How is lexis in spoken interaction analysed or explored?

PHRASEOLOGY

What types of features can appear when analysing spoken lexis?

What distinctions can be found between the lexis of speech and the lexis of writing?

MEANING AND USAGE

Do words and phraseologies fulfil pragmatic functions?

How should meaning be explored?

What can a corpus tell us about grammar?

Outline

Understanding grammar through patterns and contexts: moving from correct/incorrect to likely/unlikely

Methodological principles in corpus-based grammar analysis

Types of grammatical patterns

Grammar-vocabulary associations (lexico-grammar)

Grammatical co-text

Discourse-level factors

Context of the situation

Investigating multiple features/conditions simultaneously

The grammar of speech

Understanding grammar through patterns and contexts

DICHOTOMOUS PERSPECTIVE

From this perspective, to describe the grammar of a language, all a researcher needs is a native speaker, because any native speaker can judge grammaticality.

This works well for certain grammatical features. However, many grammatical choices cannot be made on the basis of correct/incorrect.

Empirical analyses are needed. They must cover large amounts of data in order to tell which language choices are widespread, which occur predictably although under rare circumstances, and which are more idiosyncratic.

Understanding grammar through patterns and contexts

CORPUS LINGUISTICS

Increases researchers’ ability to systematically study the variation in a large collection of texts.

Describes choices as common/uncommon and typical/untypical.

Shows the correspondence between the use of a grammatical feature and some other factor in the discourse or situational context.

As O’Keeffe et al. (2007) explain, corpus analyses lead us to describe grammar not just in structural terms but in probabilistic terms, describing the typical social and discourse circumstances associated with the use of particular grammatical features.

Understanding grammar through patterns and contexts

METHODOLOGICAL PRINCIPLES IN CORPUS-BASED GRAMMAR ANALYSIS

Any analysis of ‘typical’ or ‘probable’ choices depends on frequency analysis.

McCarthy and Carter (2001) explain the need for fine-grained distinctions in spoken corpora to describe when ellipsis is and is not common.

As Biber et al. (2004) explain: “we do not regard frequency data as explanatory. In fact we would argue for the opposite: frequency data identifies patterns that must be explained. The usefulness of frequency data (and corpus analysis generally) is that it identifies patterns of use that otherwise often go unnoticed by researchers.”

Types of grammatical patterns

GRAMMAR-VOCABULARY ASSOCIATIONS

Grammar and lexis are not as distinct as traditionally presented.

One type of lexico-grammatical relationship concerns the lexical items that tend to occur with a particular grammatical structure (e.g. that-clause objects).

Biber et al. (1999) find that think, say and know are by far the most common verbs with that-clauses in both British and American conversation. The structure is less common overall with any verb in academic prose, where suggest and show are the most common verbs.

Rather than reporting thoughts and feelings, verb + that-clause structures in academic prose are used to report previous research.

Another type of lexico-grammatical relationship concerns verb tense. A traditional grammatical description would explain the form of tenses.

Types of grammatical patterns

GRAMMAR-VOCABULARY ASSOCIATIONS

Such a description involves no empirical investigation of which verbs tend to occur in which tense. A modern corpus-based reference grammar can provide that information.

The verbs most strongly associated with the present tense convey mental, emotional and logical states, e.g. bet, doubt, know, matter, mean, mind, reckon, suppose, thank.

The verbs most strongly associated with the past tense convey events or activities, especially body movements and speech, e.g. exclaim, eye, glance, grin, nod, pause, remark, reply, shrug, sigh.

Types of grammatical patterns

GRAMMAR-VOCABULARY ASSOCIATIONS

Certain structures tend to be associated with certain types of meaning, such as positive or negative circumstances. This is called ‘semantic prosody’ (Sinclair 1991).

O’Keeffe et al. (2007: 106–14) provide an extended corpus-based analysis of get-passives (e.g. he got arrested). They show that the get-passive is usually used to express unfortunate incidents.

O’Keeffe et al. (2007) also discuss the type of subjects usually found with get-passives and the lack of adverbials in these clauses.

Types of grammatical patterns

GRAMMATICAL CO-TEXT

Grammatical descriptions in traditional textbooks sometimes make claims about the grammatical co-text of features.

Example: Frazier's investigation (2003)

Investigated: would-clauses of hypothetical or counterfactual conditionals, e.g. If water boiled, it would turn to steam.

Concerned about: the way that ESL grammars virtually always present the would-clause as adjacent to an if-clause.

Types of grammatical patterns

GRAMMATICAL CO-TEXT

Frazier's investigation (2003)

Examined: the extent to which this was true in a combination of spoken and written corpora totalling over a million words.

Found: almost 80% of the hypothetical would-clauses were not adjacent to an if-clause. Some of them were part of continuing discourse that had been framed with an if-clause at a lengthy distance from the would-clause.

The co-occurring grammatical features include infinitives and gerunds.

e.g.: Letting the administration … hands would give them.

Types of grammatical patterns

GRAMMATICAL CO-TEXT

Frazier's systematic corpus analysis thus highlights two aspects of grammatical co-text for hypothetical/counterfactual would-clauses:

The traditional claim that they usually occur with if-clauses is NOT true.

They do often occur with infinitives and gerunds.

Looking at grammatical co-occurrence patterns can also help to explain when rare constructions occur.

Types of grammatical patterns

GRAMMATICAL CO-TEXT

Constructions with that- and the fact that- clauses in subject position occur about twenty to forty times per million words in academic prose and newspapers, and almost never in conversation.

That-clauses in other positions occur over 2,000 to 7,000 times per million words in different registers.

These subject position clauses are obviously harder for listeners or readers to process.
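Rates "per million words" such as those above are normalised frequencies, which allow corpora of different sizes to be compared. A minimal sketch of the calculation follows; the function name `per_million` and the raw counts and corpus sizes are hypothetical values chosen only to show the arithmetic.

```python
def per_million(raw_count, corpus_size):
    """Normalise a raw count to a rate per million words."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be a positive number of words")
    return raw_count * 1_000_000 / corpus_size

# Hypothetical figures, not taken from any real corpus:
# 150 hits in a 5-million-word corpus, 12,000 hits in a 4-million-word corpus.
print(per_million(150, 5_000_000))
print(per_million(12_000, 4_000_000))
```

Comparing the two normalised rates, rather than the raw counts, is what makes a statement such as "rare in one register, common in another" meaningful.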

Types of grammatical patterns

DISCOURSE-LEVEL FACTORS

Because many people’s introduction to corpus linguistics is through simple concordance searches, they sometimes believe that corpus linguistics has little to offer discourse-level studies.

That is not the case: for example, determining the semantic prosody of get-passives required considering discourse context.

When that-clauses are in subject position, they tend to restate information that has already been mentioned in the previous discourse.

DISCOURSE-LEVEL FACTORS

Example: “It became surprisingly apparent that all meteorites are of the same age, somewhere in the vicinity of 4.5 billion years old … That there are no meteorites of any other age, regardless of when they fell to Earth, suggests strongly that …”

Analysis of discourse-level factors affecting grammar often requires interpreting meaning, organisation, and information structure in texts. Such analysis is part of the more qualitative, interpretative side of corpus study, focusing on how a grammatical structure is used in context.

CONTEXT OF THE SITUATION

A number of situational factors can be associated with the use of a grammatical feature: audience, purpose, participant roles and formality of the situation.

The most common perspective on this variation is that of registers or genres.

CONTEXT OF THE SITUATION

Cambridge Grammar of English: comparisons between the use of features in speech and writing.

Use of the Michigan Corpus of Academic Spoken English.

Biber compares grammatical features across ten spoken and written registers from four American universities.

It is misleading to characterise the frequency and use of a grammatical feature in only one way, even within a single general setting.

Variables like social class, ethnic group and age have been less studied in corpus-based grammar research.

In the future, new corpora will facilitate more study of the sociolinguistic variation of grammatical features.

Investigating multiple features/conditions simultaneously

It is difficult to focus on only one pattern when explaining grammatical choices.

Example: omitting that in a that-complement clause. Omitting that is associated with several factors:

Lexico-grammatical: the verb in the main clause is say or think.

Grammatical co-text: the main and complement clauses have co-referential subjects; the that-clause has a personal pronoun subject.

Situational context: that is omitted more often in conversation than in newspaper writing.

Investigating multiple features/conditions simultaneously

Two approaches:

To consider a functional system within a language and describe the factors that influence the grammatical features used to realise the system.

To study the grammar of a variety. The focus shifts from describing grammar to describing the variety.

The grammar of speech

Unplanned spoken language was long neglected because of its 'incomplete' clauses, messy repairs and non-standard forms.

Spoken grammar is now studied as a legitimate grammar.

One example is the common choice of though rather than however as a contrastive connector in conversation.

[Watching a football game, discussing a penalty call]

A: Oh, that's outrageous.

B: Well, he did put his foot out though.

Speaker B disagrees with A, but the use of though (along with well) downplays the disagreement.

The grammar of speech

The grammar of conversation has been thought to be simple compared to writing, but there is complexity in conversation at the clausal level.

[The trouble is [[if you're the only one in the house] [he follows you] [and you're looking for him] [so you can't find him.]]] [I thought [I wonder [where the hell he's gone]]] [I mean he was immediately behind me.]] (Biber et al. 1999: 1068)

Corpora also provide frequency information as evidence that certain complex structures are more common in conversation than in writing.
