Co-occurrence and collocation

1. Definitions

Firth (1951) : Modes of Meaning :« Words shall be known by the company they keep ».

Collocation designates both a process or state (the act of collocating or the state of being collocated) and the result of that process (an arrangement or juxtaposition, especially of linguistic elements such as words).

Crystal (1991) : A Dictionary of Linguistics and Phonetics : « a habitual co-occurrence of individual lexical items »

Co-occurrence may be fortuitous, whereas collocation reflects collective usage

Collocation is a type of lexical constraint : « the language user has available to him or her a large number of semi-preconstructed phrases that constitute single choices » (Sinclair : 1991)

A collocation is an arbitrary and recurrent word combination (Smadja, 1990)

A contrastive view of collocation, as expressed by F.J. Hausmann (1990) :

« […] the idiosyncrasy of a collocation is definitively revealed only from the standpoint of another language that combines different words to express the same fact » (translated from the French)

(idiosyncrasy : a structural or behavioral characteristic peculiar to an individual or a group)

2. Types of collocations

Halliday (1966: 151, 157) argues that the collocational patterns of lexical items can lead to generalizations at the lexical level.

If certain items belong to the same set, then they can be regarded as “a single lexical item”:

A strong argument, he argued strongly, the strength of his argument and his argument was strengthened [can all be regarded] as instances of one and the same syntagmatic relation. What is abstracted is an item strong, having the scatter strong, strongly, strength, strengthened, which collocates with argue (argument).

Sinclair (1991) proposes two principles:

The grammatical level is represented by the “open-choice principle”, which sees “language text as the result of a very large number of complex choices ... the only restraint [being] grammaticalness” (cf. Colorless green ideas sleep furiously).

The “idiom principle” represents the lexical level and accounts for “the restraints that are not captured by the open-choice model”.

Three factors that determine the categorization of a lexical combination

the degree of probability that the items will co-occur

the degree of fixity of the combination (i.e. grammatical restrictions)

the degree to which the meaning of the combination can be derived from the meaning of its constituent parts

The terms idiom and collocation (as well as their shadings) are defined differently by different linguists.

Most linguists would agree that kick the bucket is an idiom, whereas [make / reach / take] a decision are collocations.

An example : the verb carry in OH

– [person, animal] porter [bag, shopping, load, news, message]
– [vehicle, pipe, wire, vein] transporter; [wind, tide, current, stream] emporter
– comporter [warning, guarantee, review, report]
– supporter [weight, load, traffic]
– l'emporter dans [state, region, constituency]
– remporter [battle, match]
– the motion was carried by 20 votes to 13: la motion l'a emporté par 20 votes contre 13

Idioms: to be carried away by sth: être emballé[!] par qch; to get carried away[!]: s'emballer[!], se laisser emporter.

Woods, E. & McLeod, N. (1990) Using English Grammar, Prentice Hall.

Woods & McLeod suggest the following continuum (from most to least predictable/fixed):

– Idioms (do not allow for substitution of their elements, nor for grammatical or syntactic alterations)

– Collocations (roughly predictable word combinations with some restrictions)

– Colligations (generalisable classes of collocations, for which at least one construct is specified by category rather than as a distinct lexical item)

– Free combinations (compositional and productive)

Oxford Dictionary of Current Idiomatic English, Vol. 2 - English Idioms (1983), Oxford University Press

– presents a continuum from idiom to non-idiom
– distinguishes between pure idioms (totally fixed) and figurative idioms (allowing for some variation)
  - blow a fuse, as a figurative idiom, can only be used in the active form
  - the idiomatic sense of blow one's own [horn / trumpet] is not activated in the absence of own

Collocations (non-idioms) are divided between restricted (or semi-idioms) and open

Restricted collocations allow a degree of lexical variation: one element has a figurative sense not found outside that limited context, whereas the other appears in a familiar, literal sense (cf. carry a motion).

In open collocations elements are freely combinable and are used in a common literal sense

Word Combinations (Howarth, 1993: A Phraseological Approach to Academic Writing)

functional expressions
(1) More haste less speed. (proverb)
(2) Unaccustomed as I am to public speaking ... (speech formula)
(3) You name it, we've got it. (slogan)
(4) When in Rome. (abbreviated proverb)

composite units
(5) blow a trumpet (open collocation)
(6) blow a fuse (restricted collocation)
(7) blow your own trumpet (figurative idiom)
(8) blow the gaff (pure idiom) = French vendre la mèche

Cruse, D. A. (1986) Lexical Semantics, Cambridge University Press.

Cruse distinguishes between idioms (“lexically complex” units constituting a “single minimal semantic constituent”, e.g. kick the bucket) and collocations (“sequences of lexical items which habitually co-occur”, each lexical item being a “semantic constituent”).

He also introduces bound collocations (expressions “whose constituents do not like to be separated”) as a “transitional area bordering on idiom”.

Benson, M., Benson, E. & Ilson, R. (1986) Lexicographic Description of English (Studies in Language Companion Series, No 14), John Benjamins Publishing Company.

Benson, M., Benson, E. & Ilson, R. (1986) The BBI Combinatory Dictionary of English, John Benjamins Publishing Company.

Benson, Benson & Ilson distinguish between grammatical and lexical collocations.

Grammatical collocations have a node followed by a subordinate unit (which is often a preposition) : refer to, reliance on, proud of

In lexical collocations, both components have equal lexical status (ADJ-N, VB-N, ADV-ADJ, VB-ADV)

Sinclair, J. McH. (1991) Corpus, Concordance, Collocation, Oxford University Press.

Sinclair defines two types of collocation, upward and downward, according to the relative frequency of the node and its collocate: a collocation is upward when the collocate is more frequent than the node, and downward when it is less frequent.

« give sb an edge » is a downward collocation, because « give » is more frequently used than « edge ».
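The upward / downward distinction reduces to a frequency comparison, which can be sketched as follows. This is only an illustration: the function name, the "neutral" label for equal frequencies and the frequency figures are assumptions, not values taken from Sinclair or from any corpus.

```python
def collocation_direction(node_freq: int, collocate_freq: int) -> str:
    """Classify a node/collocate pair by relative corpus frequency."""
    if collocate_freq > node_freq:
        return "upward"      # the collocate is the more frequent word
    if collocate_freq < node_freq:
        return "downward"    # the collocate is the less frequent word
    return "neutral"         # assumed label for equal frequencies


# Hypothetical corpus frequencies, for illustration only.
freq = {"give": 50_000, "edge": 3_000}

# Node "give", collocate "edge": the collocate is rarer than the node.
print(collocation_direction(freq["give"], freq["edge"]))
```

Swapping the arguments (node edge, collocate give) yields the opposite label, which is why the choice of node matters when collocations are listed in a dictionary.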

Clas (1994) : « Collocations et langues de spécialité » in Meta, XXXIX, 4.

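– V+N: prononcer un discours (‘deliver a speech’; support verb)
– N+ADJ: rude épreuve (‘severe ordeal’), marque distinctive (‘distinctive mark’)
– ADV+ADJ: grièvement blessé (‘seriously injured’)
– VB+ADV: recommander chaudement (‘recommend warmly’)
– N+V: la cloche sonne (‘the bell rings’), le chat miaule (‘the cat meows’)
– Quantity marking of the noun: un troupeau de vaches (‘a herd of cows’), une pincée de sel (‘a pinch of salt’)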

Critique: the first category is too restrictive: set a record would be a collocation, but not [beat / break / hold] a record.

The last two categories allow almost no variation.

Example of lexicalization choices (http://pie.usna.edu/explore.html)

Out of 55 nouns that co-occur with « emergency » at least 10 times in the BNC, only 14 can be found in both OH and RC :

brake, measure, operation, repair for collocations

case, center, exit, landing, powers, ration, room, service, services, ward for compound nouns.
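A frequency threshold of this kind can be applied with a simple counter. The sketch below is a simplification under stated assumptions: it counts any word immediately following the node (no part-of-speech filtering, so it does not restrict the result to nouns), and the toy sentence and the function name `right_collocates` are invented for illustration; it is not a BNC query.

```python
from collections import Counter


def right_collocates(tokens, node, min_count=1):
    """Count the words that immediately follow `node`,
    keeping only those seen at least `min_count` times."""
    counts = Counter(
        nxt for cur, nxt in zip(tokens, tokens[1:]) if cur == node
    )
    return {word: n for word, n in counts.items() if n >= min_count}


# Toy text, invented for illustration.
toy = (
    "the emergency exit was blocked so the emergency services used "
    "the other emergency exit near the emergency room"
).split()

print(right_collocates(toy, "emergency", min_count=2))
```

With a real corpus the threshold (10 in the figures above) filters out accidental co-occurrences, but the decision to treat a surviving pair as a collocation or as a compound noun remains a lexicographic one.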

What makes a collocation worth learning for an EFL learner?

Collocations that involve a verb and its typical object (drive a car, read a book) can usually be inferred.

Some verbs generate an infinite number of collocates (buy a car, buy a book…)

What makes the collocation worth memorizing is the fact that the verb takes on another meaning (buy a story, buy time)

What makes a collocation remarkable (salient) is the fact that one of its components has few collocates (cf. the Tact Z-score formula)

Consequently, it makes more sense for an EFL learner to learn « downward collocations » grouped under the collocate rather than the node (record / beat, break, hold, set)

Mel’čuk’s lexical functions

Lexical functions are the main principle underlying Mel’čuk’s Meaning-Text Theory

They are meant to describe « institutionalized » lexical relations.

Wanner (1996) gives examples of such relations: aircraft and crew, sheep and flock, bachelor and confirmed, mountain and peak, influence and exert, attention and pay.

The list includes both syntagmatically and paradigmatically related pairs of words.

Mel’čuk admits that even though all LF-covered phrases are collocations, his model does not cover some collocations when the logical relation between their components cannot be readily inferred (as with assurance maladie ‘health insurance’ and assurance vie ‘life insurance’).

LFs only cover bigrams.

Their aim is to cover syntagmatic and paradigmatic relations between words within a formalized notation system.

The concept is meant to be applied to a wide variety of languages.

Standard LFs include 36 syntagmatic LFs that belong to 4 distinct categories: nominal, adjectival/adverbial, prepositional and verbal.

Nominal LFs

28. Centr [Lat. centrum – ‘the center of culmination of’]
– Centr(crisis) = the peak
– Centr(desert) = the heart

Adjectival/Adverbial LFs

29. Magn [Lat. magnus – ‘big, great’]
– Magn(naked) = stark
– Magn(thin) = as a rake

33. Bon [Lat. bonus – ‘good’]
– Bon(aid) = valuable
– Bon(proposal) = tempting

Prepositional LFs

35. Loc_in [being in ‘place’]
– Loc_in(height) = at [a height of…]

36. Loc_ab [moving away from ‘place’]
– Loc_ab(height) = from [a height of…]

37. Loc_ad [moving into ‘place’]
– Loc_ad(height) = to [a height of…]

Verbal LFs

59. Degrad [Lat. degradare – ‘lower, degrade’]
– Degrad(clothes) = wear off
– Degrad(house) = become dilapidated
– Degrad(temper) = fray

60. Son [Lat. sonare – ‘sound’]
– Son(dog) = bark
– Son(waterfall) = roar
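Since a lexical function maps a keyword to its value, the examples listed in this section can be stored as a nested lookup table. The sketch below is only an illustration of that idea, not Mel’čuk's formalism: the names `LEXICAL_FUNCTIONS` and `apply_lf` are hypothetical, and the table holds nothing beyond the examples quoted here.

```python
# Lexical function name -> {keyword: value}, using only the examples
# quoted in this section.
LEXICAL_FUNCTIONS = {
    "Centr": {"crisis": "the peak", "desert": "the heart"},
    "Magn": {"naked": "stark", "thin": "as a rake"},
    "Bon": {"aid": "valuable", "proposal": "tempting"},
    "Degrad": {
        "clothes": "wear off",
        "house": "become dilapidated",
        "temper": "fray",
    },
    "Son": {"dog": "bark", "waterfall": "roar"},
}


def apply_lf(name, keyword):
    """Return the recorded value of lexical function `name` for `keyword`,
    or None when the pair is not in the table."""
    return LEXICAL_FUNCTIONS.get(name, {}).get(keyword)


print(apply_lf("Magn", "naked"))
print(apply_lf("Son", "waterfall"))
print(apply_lf("Magn", "dog"))  # not recorded in the table
```

The None result for an unrecorded pair reflects the fact that the values are lexically idiosyncratic: they have to be listed keyword by keyword and cannot be computed from the meaning of the function alone.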

Dirk Siepmann (2005): Collocation, Colligation and Encoding Dictionaries. Part I: Lexicological Aspects. International Journal of Lexicography 18(4): 409-443

Linguistic ‘intertextuality’ : the meaning of one text and its constituent elements depends on millions of other texts using similar or identical elements.

Textual meaning is thus created by the interplay of two types of repetition:
– (a) collocation (in the largest possible sense, including colligation and phraseology)
– (b) cohesion.

The subject of collocation has been approached from two main angles:
– the semantically-based approaches (Benson 1986, Mel’čuk 1998, Hausmann 2003), which assume a particular meaning relationship between the constituents of a collocation
– the frequency-oriented approach (Sinclair 1991)

A few of Siepmann’s opinions

Only the frequency-based approach can provide a heuristic for discovering the entire class of co-occurrences; in a way, it is safe from refutation, but empty.

By contrast, the semantically-based approach is fragmentary – it cannot account for all possible cases.

A purely pragmatic approach relying on the extralinguistic context cannot explain a large number of co-occurrences operating at the level of semantic features.

What is needed is an extension of the semantically-based approach that will take account of strings of regular syntactic composition which form a sense unit with a relatively stable meaning.

‘Lexical bundles’ (Biber et al. 1999) such as je sais que c’est or it's been will not be included among the class of collocations. Although such sequences may perform similar or identical functions across a range of texts, they have no meaning ‘by themselves’.

there are good […] reasons for subsuming under the notion of collocation such colligational patterns as regarde où tu vas, dans les colonnes de (+ name of newspaper or magazine) or si elle est prise à temps (referring to an illness), which have so far been regarded as free sequences of words subject only to general rules of syntax and semantics.

Are collocations always binary ?

It is accepted wisdom among European researchers that collocations are binary units, and this is probably true for the majority of the class (e.g. take a step, launch an appeal).

[…] threatening to this view are irreducible three-element collocations such as the following:

– (2) the car holds the road well
– (3) avoir un geste déplacé -> (?)avoir un geste
      recevoir un accueil chaleureux -> (?)recevoir un accueil

hold the road (subject: tyre), tomber à gros flocons (subject: neige), emporter la conviction (subject: argument) or eine Kurve machen (subject: Straße)

[With such collocations] it [is] difficult to identify a standard lexical function (in the sense of Mel’čuk) that can provide a systematic link between the verb and the noun; this is because the entire collocation is semantically dependent on a specific subject.

Directionality

the assumption of directionality (or of a hierarchical relationship between the constituents of the collocation) seems obvious with items such as table + lay / set or money + withdraw

even such textbook examples of collocational theory as célibataire + endurci (‘confirmed bachelor’) may be viewed as bidirectional, since the adjective endurci combines with any noun carrying the semantic feature [+ figé dans son comportement]: criminel, catholique, Parisien

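Berry-Rogghe’s Z-score

The Z-score indicates how far the observed co-occurrence of two words within a given span departs from what chance alone would predict.

With $F$ the total frequency of the collocate, $N$ the length of the text, $S$ the length of the mini-text (the words falling within the span around the node) and $K$ the frequency of the collocate in the mini-text:

$$P = \frac{F}{N}, \qquad E = P \times S$$

$$\sigma = \sqrt{E\,(1 - P)}, \qquad Z = \frac{K - E}{\sigma}$$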

Concordance of « lit » in the expression faire le lit de (‘to pave the way for’)

 6286  infectieux semblent faire le lit des localisations
 7774  | dont on | sait qu'ils font le lit du cancer.
21884  et cartilagineuses qui feront le lit de l'arthrose.
21939  qui | vont | faire le lit de l'arthrose.
27952  | personnalité peuvent faire le lit de véritables maladies
21146  détérioration dentaire et fait le lit de l'ATS
32987  | vieillissement artériel fait le lit de l'ATS
 8847  de l'oreillette gauche faisant le lit des troubles rythmiques ;
17440  organes des sens, | peut faire le lit de délires d'
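A keyword-in-context (KWIC) listing of this kind can be produced with a few lines of code. The sketch below is a minimal illustration, not the tool used to build the concordance above: the function name `kwic`, the whitespace tokenization and the window sizes are assumptions, and the sample string reuses one of the concordance lines.

```python
def kwic(tokens, node, left=4, right=4):
    """Yield (position, left context, node, right context)
    for every occurrence of `node` in `tokens`."""
    for i, tok in enumerate(tokens):
        if tok == node:
            yield (
                i,
                " ".join(tokens[max(0, i - left):i]),
                tok,
                " ".join(tokens[i + 1:i + 1 + right]),
            )


# One line of the concordance, split on whitespace.
sample = "personnalité peuvent faire le lit de véritables maladies".split()

for pos, before, word, after in kwic(sample, "lit", left=3, right=3):
    print(f"{pos:>6}  {before:>20} {word} {after}")
```

Aligning the node in a fixed column, as the format string does, is what makes recurrent left and right collocates (here faire le … de) visible at a glance.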

Z-scores of the collocates of « lit ». Span: BEFORE 2, AFTER 0. Mini-text: 268. Total text: 1 965 368.

Collocate   Collocate freq.   Type freq.   Z-score
repos       13                300          64.077
au          46                9517         39.336
feront      1                 11           25.781
garder      2                 47           24.903
le          35                36247        13.646
faire       5                 1124         12.384
faisant     1                 178          6.263
font        1                 244          5.300
fait        2                 2185         3.120
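The tabulated scores can be recomputed from the four quantities involved: the collocate frequency within the span, the type frequency in the whole text, the mini-text size and the total text size. The following is a minimal sketch of that calculation; the function and variable names are assumptions chosen for readability, and the three sample rows are taken from the table above.

```python
import math


def z_score(observed, type_freq, mini_text, total_text):
    """Z-score of a collocate within the node's span ("mini-text")."""
    p = type_freq / total_text               # P: probability of the collocate at any position
    expected = p * mini_text                 # E: expected occurrences in the mini-text
    std_dev = math.sqrt(expected * (1 - p))  # standard deviation
    return (observed - expected) / std_dev


MINI_TEXT, TOTAL_TEXT = 268, 1_965_368

# (collocate, collocate freq., type freq.) rows from the table.
rows = [("repos", 13, 300), ("au", 46, 9517), ("faire", 5, 1124)]
for word, observed, type_freq in rows:
    print(word, round(z_score(observed, type_freq, MINI_TEXT, TOTAL_TEXT), 3))
```

The rare word repos outranks the very frequent le even though le occurs more often next to lit: the score rewards co-occurrence that is high relative to the collocate's overall frequency, not high in absolute terms.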

The Mutual Information (MI) score

The word post co-occurs with many words, among which are "the", "office" and "mortem".

f(office) = 5237
f(the) = 1019262
f(mortem) = 51

(f = overall frequency in the Birmingham Corpus)

Joint frequency for those three words is as follows :

j(the) = 1583
j(office) = 297
j(mortem) = 51

The relative frequencies can be compared with what would be expected under the null hypothesis

THE NULL HYPOTHESIS

The word post has no effect whatsoever on its lexical environment: the frequencies of the words surrounding post are exactly the same whether post is present or not.

The expected co-occurrence of post and the is calculated as:
(f(post) × span) × relative_freq(the), where relative_freq(the) = 1019262 / 20m ≈ 1/20
(2579 × 8) × (1 / 20) = 20632 / 20 ≈ 1031

The MI score is the logarithm of the ratio between observed co-occurrence and expected co-occurrence (the figures below use base-10 logarithms).

For post and the, it is log(1583/1031) ≈ 0.19.

The expected joint frequency for post and office is:
(f(post) × span) × relative_freq(office) = 2579 × 8 × 5237/20m ≈ 5.4

The observed joint frequency is 297. Hence the MI score is about log(297/5.4) ≈ 1.74.

For mortem, the expected joint frequency is 2579 × 8 × 51/20m ≈ 0.05, so the MI score is log(51/0.05) ≈ 3.
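The same calculation can be scripted for all three collocates of post. This is a sketch under stated assumptions: relative_freq(w) is taken as f(w) divided by a corpus size of 20 million, the logarithm is base 10, and the function names are invented for illustration. Because it uses exact relative frequencies rather than the rounded 1/20 for the, its results can differ slightly from hand-rounded figures.

```python
import math

CORPUS_SIZE = 20_000_000  # "20m" in the worked example
F_POST = 2579             # overall frequency of the node "post"
SPAN = 8                  # span used in the worked example


def expected_joint(f_collocate):
    """Expected co-occurrences with "post" under the null hypothesis."""
    return F_POST * SPAN * f_collocate / CORPUS_SIZE


def mi_score(observed, f_collocate, base=10):
    """Logarithm of observed over expected co-occurrence."""
    return math.log(observed / expected_joint(f_collocate), base)


# (collocate, overall frequency f, joint frequency j)
data = [("the", 1_019_262, 1583), ("office", 5237, 297), ("mortem", 51, 51)]
for word, f, j in data:
    print(word, round(expected_joint(f), 2), round(mi_score(j, f), 2))
```

Passing `base=2` gives the base-2 variant of the score; the ranking of the three collocates is unchanged, since changing the base only rescales every value by the same factor.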

The mutual-information score for a two-word collocation is the base-2 logarithm of the ratio of the probability of the two-word collocation to the product of the probabilities of the first word and of the second word:

$$\mathrm{MI}(x, y) = \log_2 \frac{P(x, y)}{P(x)\,P(y)}$$

T-scores differ from mutual information scores in being scaled by an estimate of the variance (they tend to correct skewed MI scores that are due to a low number of occurrences).
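The contrast with MI can be illustrated with a commonly used approximation of the t-score, (O − E) / √O, where O is the observed and E the expected joint frequency. This formula is an assumption brought in for illustration (it is not given in the text above), and the inputs are the rounded observed and expected values for post + mortem and post + the.

```python
import math


def t_score(observed, expected):
    """Common approximation of the collocation t-score: (O - E) / sqrt(O)."""
    return (observed - expected) / math.sqrt(observed)


# Rounded observed / expected joint frequencies with "post".
print("mortem", round(t_score(51, 0.05), 2))
print("the", round(t_score(1583, 1031), 2))
```

Under this approximation post + the scores higher than post + mortem, the reverse of the MI ranking: the t-score favours pairs supported by many occurrences, whereas MI favours exclusive pairs even when they are rare.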