© Geoff Barnbrook, Oliver Mason and Ramesh Krishnamurthy 2013

All rights reserved. No reproduction, copy or transmission of this publication may be made without written permission.

No portion of this publication may be reproduced, copied or transmitted save with written permission or in accordance with the provisions of the Copyright, Designs and Patents Act 1988, or under the terms of any licence permitting limited copying issued by the Copyright Licensing Agency, Saffron House, 6–10 Kirby Street, London EC1N 8TS.

Any person who does any unauthorized act in relation to this publication may be liable to criminal prosecution and civil claims for damages.

The authors have asserted their rights to be identified as the authors of this work in accordance with the Copyright, Designs and Patents Act 1988.

First published 2013 by PALGRAVE MACMILLAN

Palgrave Macmillan in the UK is an imprint of Macmillan Publishers Limited, registered in England, company number 785998, of Houndmills, Basingstoke, Hampshire RG21 6XS.

Palgrave Macmillan in the US is a division of St Martin’s Press LLC, 175 Fifth Avenue, New York, NY 10010.

Palgrave Macmillan is the global academic imprint of the above companies and has companies and representatives throughout the world.

Palgrave® and Macmillan® are registered trademarks in the United States, the United Kingdom, Europe and other countries.

ISBN 978–1–403–94612–6 hardback
ISBN 978–1–403–94613–3 paperback

This book is printed on paper suitable for recycling and made from fully managed and sustained forest sources. Logging, pulping and manufacturing processes are expected to conform to the environmental regulations of the country of origin.

A catalogue record for this book is available from the British Library.

A catalog record for this book is available from the Library of Congress.

Copyrighted material – 978–1–403–94613–3

Contents

Tables and figures

Authors’ note

Part I The Historical Background

1 The concept of collocation

2 Collocation and language theory: the twentieth century

Part II Implementation

3 Computing collocations

4 Extensions

Part III Applications of Collocation

5 Concordances and lexicography

6 Pedagogy, translation and natural language processing

Part IV Implications

7 Collocation and language theory: recent developments

8 Case studies

Appendix 1: Subcorpora of the Bank of English

Appendix 2: Case study 3: Concordances – dry, ground and land

Appendix 3: Computer programs

Bibliography

Index of names

Subject index

Part I The Historical Background

1 The concept of collocation

1.1 Introduction

The use of the word collocation has varied a great deal since it was first borrowed into English around the sixteenth century. The history of these changes is covered in detail as part of this chapter, but since there is still considerable variation in its use as a technical linguistic term it might be helpful to establish how the word is used in this book. Generally the word is used in three main ways:

to describe the way in which words group together in their normal use in texts;
to describe the analysis tool used to explore this grouping and to assess its significance and implications;
and, more controversially, to describe an aspect of language production in which pre-fabricated chunks of language are used to build up utterances.

To appreciate these concepts of collocation properly and to understand their importance in modern linguistics, it may be useful to get an overview of the development of the term and the ways in which it has been used.

As a starting-point, we can explore the origins and uses of the word collocation in English lexis through its treatment in dictionaries. The entries for the word in the dictionaries produced during the time that the word has been current in English and in the Oxford English Dictionary (OED) provide some evidence of the use of the word and the meanings attributed to it at each stage. This is supplemented by an examination of the use of the word in other texts over the period since its introduction into English.

While the dictionary entries and other text sources provide evidence of the existence and use of the word in English vocabulary over the centuries, the practical significance of collocation as a linguistic concept can be more usefully assessed through the use made of the concept of collocation within dictionaries. This establishes the extent to which lexicographers working in different periods have recognised the phenomenon of collocation and the importance (if any) that they have attached to it as a source of information relevant to the words they have documented.

We can get a similar practical demonstration of the use of collocation by considering the development of published concordances – lists of words found in texts showing the environment in which they were used. These were first produced as a form of index to the Bible, and later extended to other texts which were seen as being sufficiently important to justify the production of aids to their analysis. Concordances show a form of practical recognition of the significance of collocation as a tool in the disambiguation of meaning and in the close interpretation of texts through the context of significant words.

Dictionaries and concordances are not the only forms of linguistic guidance that make use of collocation. From the nineteenth century to the present day, style guides have proliferated, providing linguistic advice for those lacking confidence in their use of English, often as a component of guidance on more general matters of etiquette. During the twentieth century large numbers of dictionaries intended for non-native speakers of English were produced. The contents of these dictionaries reveal a shift in the attitude of the lexicographers to collocation, from a reliance on the evidence that it can provide of the meaning of words towards a recognition that it is in itself an important element of the language knowledge that learners need to acquire.

This shift of attitude in turn informed the development of linguistic theory during the twentieth century and led to the eventual identification of collocation as an underlying principle of language production and interpretation.

All of these strands will be examined in the following sections to give as complete a picture as possible of the development of collocation as a recognised phenomenon of word behaviour, a tool for language analysis and an element of linguistic theory.

1.2 Dictionary entries for the word collocation

The first possible source of evidence for the history of the use of the word collocation can be found in entries for the word in dictionaries.

1.2.1 Early dictionaries

Texts dealing specifically with vocabulary first appear in English around the middle of the fifteenth century, with monolingual dictionaries, recognisable as forms of the type familiar to us today, appearing at the beginning of the seventeenth century. Many of the earlier examples of these dictionaries deal with new words recently borrowed into English, and can provide evidence of the status and meaning of words at this time.

The Lexicons of Early Modern English (LEME) database is a collection of dictionaries and similar texts produced from 1450 to 1702. It is possible to search for occurrences of specific words within LEME, and a search was carried out using the string ‘collocat*’ so as to find any occurrences of collocate, collocates, collocating, collocation etc. The ‘*’ is a special character which is interpreted by the search mechanism as any characters following the search string. The earlier dictionaries containing this string, published between 1538 and 1587, were bilingual dictionaries of English and Latin, and contained a variant of the Latin form collocatus. The string was also found – in the form collocation – in four monolingual English dictionaries. These were Bullokar’s An English Expositor, published in 1616, Cockeram’s English Dictionarie, published in 1623, Phillips’ The New World of English Words published in 1658 and Coles’ An English Dictionary published in 1676.

Bullokar’s definition is similar to those in the other monolingual dictionaries:

Collocation. A placing together.

All four of these texts are so-called ‘hard word’ dictionaries, designed to help users to understand and use words newly borrowed into English, often from Latin. As such, these findings suggest that the word collocation was borrowed into English at some time within the sixteenth century. The factors affecting this borrowing are examined in more detail in section 1.2.4 below.
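The wildcard search described in this section can be illustrated with a minimal sketch. It does not reproduce the LEME search mechanism, whose implementation is not described here. The function name `wildcard_search` and the sample word list are assumptions introduced purely for illustration. The sketch translates each ‘*’ into a regular expression that accepts any run of characters (including none) after the search string.

```python
import re

def wildcard_search(pattern, words):
    """Return the words matching a pattern in which '*' stands for any
    (possibly empty) run of characters. Hypothetical helper, for illustration."""
    # Escape the literal parts of the pattern, then join them with '.*'
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    compiled = re.compile(regex, re.IGNORECASE)
    # fullmatch requires the whole word to fit the pattern
    return [w for w in words if compiled.fullmatch(w)]

# Invented sample list, not drawn from LEME
sample = ["collocate", "collocates", "collocating", "collocation",
          "Collocatus", "collect", "locate"]
matches = wildcard_search("collocat*", sample)
print(matches)
```

Matching is case-insensitive here so that capitalised headwords are also found; whether the original search behaved this way is not stated in the text.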

1.2.2 Eighteenth-century dictionaries

From the beginning of the eighteenth century onwards, most monolingual dictionaries produced in English dealt comprehensively with both hard and simple words. If we look at some of these later texts we find that the word collocation is well established. For example, the 1730 edition of Bailey’s Dictionarium Britannicum has a very similar definition to Bullokar’s:

To COLLOCATE [of collocatum, L.] to place, to set, to appoint to a place
COLLOCATION, a placing or setting in order

Johnson’s Dictionary, first published in 1755, follows this very closely, but goes into a little more detail:

To COLLOCATE. v.a. [colloco, Latin.] To place; to station.
COLLOCATION. n.s. [collocatio, Latin.] 1. The act of placing; disposition. 2. The state of being placed.

1.2.3 The Oxford English Dictionary

The Oxford English Dictionary (OED), first published between 1884 and 1928, represents the first attempt at a comprehensive historical dictionary of English and sets out to give a complete account of the life history of words in English. Because of the date of publication of the volume of the OED containing the word collocation, the dictionary provides both a general survey of the word’s development in English since its first appearance and its status at the end of the nineteenth century.

According to the OED, the word collocation first appears in English in various forms around the beginning of the sixteenth century. It provides a first quotation from 1513 of the verb form, collocate, and gives as its first two senses:

1. a. trans. To place side by side, or in some relation to each other; to arrange. b. To set in a place or position.

The noun form collocation mirrors these first verb senses, and the quotations associated with it in the OED date from 1605 onwards. In these senses the general nature of the action is emphasised, although its frequent association with linguistics is also mentioned:

1. a. The action of setting in a place or position, esp. of placing together with, or side by side with, something else; disposition or arrangement with, or in relation to, others; the state of being so placed. Frequently applied to the arrangement of words in a sentence, of sounds, etc.

1.2.4 Why was collocation borrowed into English?

According to the OED, the verb collocate and its associated forms were imported into English from Latin via the participial stem collocat-. The appearance of the words in the sixteenth century suggests that this borrowing is part of the flood of words pouring into English from Latin in response to pressures created, among other things, by the huge amount of translation from classical texts during the period. In some cases words were borrowed because new concepts needed new terms; in others, words were borrowed despite the fact that perfectly good terms already existed in English. In the case of collocation, in both the linguistic and non-linguistic senses, existing English words would seem to have been perfectly adequate at this time.

For the primary sense of ordering physical items or facts, ‘placing together with or side by side’ in the OED’s words, the word arrange would seem to be a suitable candidate. Adopted originally from French in the fourteenth century, an earlier period of borrowing frenzy, arrange is originally used in its more specific (and more strictly etymological) sense of ‘draw up in ranks or in line of battle’.

In fact, according to the OED this is a rare word until the nineteenth century. Sense 2a is closest to the meanings already identified for collocate:

2. a. To put (the parts of a thing) into proper or requisite order; to adjust.

The first quotation used to illustrate this sense comes from 1802.

In the English definitions for collocation in dictionaries other than the OED quoted above, the word placing commonly forms part of the phrase. It may be that collocation was borrowed because placing had too general a meaning and was being too widely used, while the meaning of the word arrange was too specific until significantly later. The appearance of the word collocation in so many of the early dictionaries suggests that its selection to fill this need was successful from the start.

1.3 Evidence from other texts

In the earlier definitions examined in dictionaries in section 1.2 above, the word collocation has been given a general meaning relating to the arrangement of physical items, although the OED, as already mentioned, refers to its frequent association with the arrangement of words.

In the definition of sense 1b the OED recognises the use of the word ‘quasi-concretely’, or almost as a noun in its own right, rather than an action, and includes as an illustrative quotation the passage from Southey’s The Doctor, published in 1836. A fuller quotation from Southey’s text gives the point of his comment more clearly. Speaking of his character Daniel’s high opinion of the seventeenth-century poet Joshua Sylvester, he claims that:

…Sylvester might have found some compensation for the undeserved neglect into which his works had sunk, by the full and devout delight which his rattling rhymes and quaint collocations afforded to this reader. (Southey 1862, 57)

This use by Southey suggests a notion of collocation which emphasises unusual juxtapositions, carefully chosen as part of a literary technique, and it is useful to explore the extent to which this concept appears in other literary texts.

If a search is conducted for all possible variants of the words collocate and collocation in all the texts currently making up the Literature Online resource, 79 instances are found in all, and of these 13 refer to non-linguistic senses of the words, three are instances of Latin texts and a further three are duplicate entries. The remaining 60 occurrences refer to a roughly equal mix of what may be called ‘unusual’ and ‘habitual’ associations between linguistic units. The earliest text in which the linguistic sense is found is The Art of Rhetorick Concisely and Compleatly Handled, by John Barton, from 1634. In this text an example is given to illustrate the notion of composition:

Composition is a smooth linking together of select words and clauses. Psal.3. 24. In stead of sweet smell, there shall be a stink; in stead of a girdle, a rent; in stead of well-set hair, baldnesse; in stead of a stomacher, a girdle of sackcloth; and burning, in stead of beauty. (Barton 1634, 25)

Later Barton comments on this example:

Sometimes we allude to the pace or measure of words, as in the last example; The clauses are all of alike size, which makes them runne very pleasantly. Sometimes we have allusions both of the sound, sense, and pace together. There are 3 vertues in this Figure; the one intimated in the word smooth, that is, such a collocation and well-ordered disposition of the word, as doth avoid harshnesse, and pleaseth the eare with an harmonious consonancie of syllables, as in the example is plain: For if the last clause had kept the form of the precedent thus, And in stead of beautie, burning, it would have sounded more unpleasantly, but that transposition of the words gives a grace unto them. (Barton 1634, 29)

Here collocation is a property of each word selected in the process of skilful composition, used to emphasise and explain the perfection of this translation of the Psalm. This area of textual criticism is concerned with the quality of the language used to achieve specific effects, one of the key areas of rhetoric, where past gems are used as exemplars for future production. The emphasis on combinations of words to produce these effects prefigures the guidance on collocation which forms such a feature of dictionaries for learners of English in the twentieth century, dealt with in detail in section 1.8 below.

1.4 The use of collocation in dictionaries before the twentieth century

The frequency of occurrence of the word collocation used in its linguistic sense in the texts examined in the previous section shows that this sense was already well established by the beginning of the eighteenth century. We can gain a fuller appreciation of the importance given to the concept of collocation during the period by investigating the use made of it by lexicographers in the compilation or presentation of their dictionary entries.

1.4.1 Johnson

Johnson’s Dictionary, first published in 1755, contains an enormous number of illustrative quotations. The exact use that Johnson makes of his sources as a lexicographer is debatable, but the fact that he occasionally strays beyond the strict confines of literature is specifically justified in his preface to the first edition:

Some of the examples have been taken from writers who were never mentioned as masters of elegance or models of stile; but words must be sought where they are used; and in what pages, eminent for purity, can terms of manufacture or agriculture be found? (Johnson 1755, Preface, 6. Note: In the facsimile edition of the 1755 Dictionary there are no page numbers; page references to the Preface are given by counting its first page as 1)

The importance that Johnson attaches to these illustrations is made explicit in the Preface. ‘That part of my work on which I expect malignity most frequently to fasten, is the Explanation’, he predicts (Johnson 1755, Preface, 5), but he also supplies a remedy in the illustrations:

The solution of all difficulties, and the supply of all defects, must be sought in the examples, subjoined to the various senses of each word, and ranged according to the time of their authours. (Johnson 1755, Preface, 6)

However, these illustrations can only provide this remedy if they give sufficient information about the words under discussion:

It is not sufficient that a word is found, unless it be so combined as that its meaning is apparently determined by the tract and tenour of the sentence; (Johnson 1755, Preface, 7)

Here we have a clear description of the practical significance of collocation for the user of the dictionary, regardless of the precise ways in which Johnson may have selected and used his illustrations.

1.4.2 The Oxford English Dictionary

The OED itself was composed from citation slips excerpted by volunteer readers from a list of prescribed texts. The original appeal for readers was made by the Philological Society in 1859, but lack of financial support and a publisher led to the project falling into abeyance. When a contract was signed with the Clarendon Press in 1879 to publish the dictionary, James Murray issued a new appeal. The second edition of this appeal, issued in June 1879, specifies, among other things, the basis on which quotations containing words on the list should be selected by readers from the specified texts. Page 5 contains a 12-point set of ‘Directions to Readers for the Dictionary’ and point 5 recommends:

Make a quotation for every word that strikes you as rare, obsolete, old-fashioned, new, peculiar, or used in a peculiar way. (Murray 1879, 5)

Point 7 extends these categories to include less unusual words:

Make as many quotations as convenient to you for ordinary words, when these are used significantly, and help by the context to explain their own meaning, or show their use.

The distinction between the treatments recommended for rare and ordinary words is considered important enough to be explained in more detail on the following page of the appeal. In both cases, the purpose of the quotation is the same, as specified in point 9:

… the quotation must be sufficient to show the meaning, or use, and to make connected sense.

These principles for the selection of quotations reflect the use to which the lexicographers will be putting them, which is, of course, to establish the textual environments in which the words are to be found, using their ‘habitual juxtapositions’ to establish meaning and use. In other words, the OED’s entries for the words, their sense structures and usage patterns, are based on an analysis of their collocation with other words.

Here the lexicographic principle is much clearer than in the case of Johnson: the collocation data obtained from the selected citations was used directly as the basis for identifying and disambiguating the senses of headwords, and a sample of the illustrative quotations is provided in the published OED for the reader as evidence for, and clarification of, the decisions made. Firth (1935, 7), in a general consideration of a contextual theory of meaning, draws attention explicitly to this principle underlying the lexicography of the OED, and this is discussed in detail in section 2.3.1 below.

There is also extensive direct and explicit evidence of the use of the concept of collocation in the OED: within its definition text the word collocation is used extensively. The web-based version of the OED has an advanced search routine which allows text patterns to be searched for in different components of the entries. If the pattern ‘collocat*’ is searched for within the definition text of the second edition, originally published in 1989, 570 occurrences are retrieved.

Some of these occurrences reflect the notion of ‘habitual’ collocations, as in sense 1a of divot used as a noun:

1. a. A slice of earth with the grass growing upon it, a turf, a sod, such as are used in the north for roofing cottages, forming the edges of thatched roofs, the tops of dry-stone walls, etc.

The thicker, more earthy sods used in building walls or dikes, are called fails; hence the common collocation fail and divot. The digging and throwing up of either is ‘casting’: see CAST v. 28.

Others seem to reflect the notion of more unusual combinations, as in sense 4 of hungry:

4. In special collocations.

†hungry evil (sickness), a disease in horses characterized by insatiable hunger. †hungry gut, (a) the intestinum jejunum, the part of the small intestine between the duodenum and the ileum, so called because it is supposed to be usually found empty after death; also fig.; (b) in quot. 1552, a person with hungry guts, a glutton. hungry rice, a grain allied to millet, Paspalum exile, much cultivated in West Africa. †hungry worm (see quot. 1737).

If these occurrences are checked against the first edition, it is clear that many of them (328 out of the total of 570) date from its predominantly nineteenth-century text, including the two examples cited above. This shows a thorough appreciation by the end of the nineteenth century of the later Firthian usage of collocation as a technical linguistic term, although this is cited in the second edition of the OED as first appearing in Trager in 1940, and in Firth in 1951:

1c. Linguistics. The habitual juxtaposition or association, in the sentences of a language, of a particular word with other particular words; a group of words so associated.

Introduced by J. R. Firth as a technical term in modern Linguistics, but not fully separable from examples in sense 1a nor from other uses as exemplified in quot. 1940.

1940 G. L. TRAGER in Language XVI. 301 Collocation establishes categories by stating the elements with which the element being studied enters into possible combinations. Ibid. 303 It is now necessary to establish the collocations of the various forms to see what their functions are. 1951 J. R. FIRTH in Ess. & Stud. IV. 123, I propose to bring forward as a technical term, meaning by ‘collocation’, and to apply the test of ‘collocability’.

While first occurrences in the OED are almost inevitably going to appear later than first usage in the language, if only because of the dictionary’s reliance on published appearances of words, this provides evidence that the lexicographers themselves were using the word in its modern linguistic sense for a considerable time before they gave the sense recognition in the dictionary.

More detailed consideration of the OED’s use of collocation is given in Part III.

1.5 Practical uses of collocation – concordances

The way in which collocation was used by the OED’s lexicographers (shown in the appeal described in section 1.4.2 above) positions the word firmly in the area of semantics, whereas the examples found in Barton and Southey (shown in section 1.3 above) place it equally firmly in the area of rhetorical style as a part of the theory of textual criticism.

The area of textual theory dealing with explication or hermeneutics brings together these two aspects of the concept of collocation. The meanings of individual words are seen as a crucial element of the meanings of larger textual units, and the contexts in which words are found are an essential element in the determination of their signification. Section 1.5.1 explores the development of the concordance as a tool of this form of textual criticism.

1.5.1 Cruden

In the preface to the first edition of his Complete Concordance to the Old and New Testament, Cruden explains the nature and purpose of his book:

A concordance is a dictionary, or an index to the Bible, wherein all the words, used through the Inspired Writings are ranged alphabetically, and the various places where they occur, are referred to, to assist us in finding out passages, and comparing the several significations of the same word. (Cruden 1769, vii)

Similar works, as Cruden explains in the same preface, were already in existence, though they lacked the scale and comprehensive nature of Cruden’s concordance. Many editions were produced in Cruden’s own lifetime, and the work is said never to have been out of print since. Several editions are currently available.

As an illustration of Cruden’s method, consider the first part of the entry for the word very, found on p. 536 of the edition cited in the Bibliography:

VERY

Gen. 27. 21. whether thou be my v. son
Exod. 9. 16. in v. deed for this I raised thee up
Num. 12. 3. now the man Moses was v. meek
Deu. 30. 14. but the word is v. nigh unto thee
1 Sam. 25. 34. in v. deed except thou hadst hasted
26. 4. understood that Saul was come in v. deed
2 Sam. 24. 10. for I have done v. foolishly
2 Chron. 20. 35. king Ahaziah did v. wickedly
Neh. 1. 7. we have dealt v. corruptly against thee
Psal. 5. 9. their inward part is v. wickedness

This shows the main features of the format used to squeeze the concordance information into the small space available in the text. As Cruden says:

It is printed with a good letter, though pretty small, which was necessary in order to bring it into this volume, and make it contain multum in parvo, much in a little compass; (Cruden 1769, vii)

As can be seen, for each occurrence of the word a reference is given to its position in the text of the King James Bible, together with sufficient context to enable disambiguation of senses and a general idea of the relevance of the cited word. This context, as we shall see when we examine the modern, computational analytical methods available for collocation, forms the basic data necessary to investigate it. While Cruden does not carry out any of the analysis directly, he provides a qualitative basis for, at least, a general awareness of the importance of a word’s environment. The sense closest to the modern technical linguistic use of the word collocation is that given by sense 1c in the OED:

The habitual juxtaposition or association, in the sentences of a language, of a particular word with other particular words; a group of words so associated.

Cruden shows a thorough appreciation of this ‘habitual juxtaposition or association’, both in his general provision of each word’s environment in his concordance, and in the specific guidance given where particular patterns can be identified. As an example, under the word dry he comments:

By the words annexed to DRY, the meaning is obvious. It is spoken of land, ground, provision, waters, trees, and other things. (Cruden 1769, 121)

He goes on to provide a separate concordance listing for ‘DRY ground’, as the most significant grouping of words associated with the word dry:

Gen. 8, 13. behold the face of the ground was d.
Exod. 14, 6. Isr. shall go on d. ground in the sea
Josh. 3, 17. the priests that bore the ark stood firm on d. ground in Jordan. Israel passed on d. ground
2 Kings 2, 8. Elijah and Elisha went over on d. g.
Psal. 107, 33. he turneth water-springs into d. ground
35. he turneth d. ground into water-springs
Isa. 44, 3. I will pour floods upon the d. ground
53, 2. He shall grow as a root out of a d. ground
Ezek. 19, 13. She is planted in a d. and thirsty ground

(Cruden 1769, 121–2)
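The layout of these entries (a reference, a short stretch of context, and the headword reduced to its initial) is simple enough to sketch in code. The following Python fragment is only an illustrative sketch and not a reconstruction of Cruden's working method: the function name `cruden_style_concordance`, the `width` parameter and the two sample verses (adapted from the listing above, with d. expanded to dry) are assumptions introduced for the example.

```python
import re


def cruden_style_concordance(verses, headword, width=6):
    """Return (reference, context) pairs for each occurrence of headword.

    verses: iterable of (reference, text) pairs (hypothetical input format).
    The headword is replaced by its initial letter plus a full stop, and at
    most `width` words of context are kept on each side of it.
    """
    abbreviation = headword[0].lower() + "."
    entries = []
    for reference, text in verses:
        words = text.split()
        for i, word in enumerate(words):
            # Compare case-insensitively, ignoring attached punctuation.
            if re.sub(r"\W", "", word).lower() == headword.lower():
                left = words[max(0, i - width):i]
                right = words[i + 1:i + 1 + width]
                context = " ".join(left + [abbreviation] + right)
                entries.append((reference, context))
    return entries


# Hypothetical sample data, adapted from the entries quoted above.
sample_verses = [
    ("Isa. 44, 3", "I will pour floods upon the dry ground"),
    ("Psal. 107, 35", "he turneth dry ground into water-springs"),
]

for reference, context in cruden_style_concordance(sample_verses, "dry"):
    print(f"{reference}. {context}")
```

Each printed line pairs a reference with its abbreviated context, which is the raw material that later computational approaches to collocation also start from.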

For the more important words Cruden goes further. In the part of the preface to the first edition which deals with the structure of the work, he states that in the section that deals with common words he has ‘given the various SIGNIFICATIONS of the principal words’. In the entry for ignorance, for example, on p. 248:

IGNORANCE

Signifies, [1] Want of the true knowledge of God and of heavenly things, Eph. 4. 18. [2] Unbelief, which follows ignorance, 1 Pet. 1. 14. [3] Error, imprudence, or surprise, Lev. 4. 2, 13. [4] Idolatry, Acts 17. 30.

For these words, Cruden has identified a range of senses, provided brief definitions or synonyms and given references to the illustrative citations for each sense listed in the concordance entries below. Cruden’s senses may themselves be open to question, but his method shows a lively awareness of the dependence of meaning on environment, and his allocation of selected concordance entries to identified significations shows a well-developed use of the principles of collocation for disambiguation. It must be assumed that the value placed on the concordance by its many users over the centuries reinforces this appreciation of the value of collocation in linguistic analysis.

1.5.2 Other concordances and their applications – the nineteenth century onwards

The Bible was seen by Cruden, his predecessors and many who succeeded him, as the work pre-eminently suitable for the production of such a useful index. This view is made explicit in the preface to the first edition:

…if a good Index to any other book is to be valued, much more ought one to the BIBLE, which is a revelation from GOD. (p. viii)

In the early days of textual analysis of the nineteenth and early twentieth centuries, only the most significant of texts were seen as being equally worthy of the effort involved in the production of a concordance. A search of a library catalogue for books published before 1900 containing the word concordance in their titles shows that Shakespeare (1787), Milton (1857), Tennyson (1869), Homer (1880) and Dante (1888) were all accorded this honour during the eighteenth and nineteenth centuries.

As linguistics developed during the twentieth century, the scope of exploration widened, first to include less obvious literary texts, and then to include non-literary texts as a source of ordinary language. The major practical application of this widening so far has been in lexicography: since the COBUILD project produced the first corpus-based dictionaries in the late 1980s the use of corpora as a source of lexicographical information has become almost the norm, and the concept of collocation already described in section 1.4.2 above, the ‘habitual juxtaposition’ of words, is routinely used as a basis for the identification and disambiguation of senses. Modern applications of collocation in lexicography are discussed in Part III.

1.6 Guides to the proper use of language

The first monolingual English dictionaries produced during the early seventeenth century, beginning with Cawdrey’s A Table Alphabeticall in 1604, provided relatively limited lists of hard words together with brief explanations of their meanings in plainer English, often by means of synonyms. At the beginning of the eighteenth century, J.K.’s A New English Dictionary, published for the first time in 1702, began the process by which the monolingual English dictionary was transformed into the comprehensive and authoritative account of English vocabulary that we are used to today. This process was given a more specifically prescriptive and authoritarian twist by Johnson in his Dictionary, published for the first time in 1755.

Alongside this new authoritarianism in lexicography, in which the emphasis shifted from the extension of users’ vocabulary to the provision of advice on all the words in the lexis and their proper usage, books of grammar also developed. These books, which laid down detailed rules governing the syntax of English, often owed more to the perceived grammar of Latin than to any observable usage patterns in contemporary English. Indeed, a major part of the programme of these grammarians was the correction of common English usages which were seen as illogical or inelegant, such as multiple negation, split infinitives or improper use of past tenses as past participles.

The success of the grammar books, and their enormous impact on the English of the nineteenth century, is based largely on the linguistic insecurity of their users. This insecurity went far beyond doubts about syntax and lexis, to the extent that in situations demanding a more formal or serious approach to language, users often felt the need for detailed guidance on the basic phraseology of their writing. This gave rise in the nineteenth century to a new type of language aid, the writing manual.

In many cases these manuals went beyond simply giving advice on the use of language and covered all aspects of social conduct. The insecurity felt by people in a world where social mobility was starting to have practical effects was presumably both powerful enough and sufficiently comprehensive for many of them to need general advice on all aspects of their place and proper behaviour in society. As an example of this form of comprehensive manual, here is the text of the title page of a mid nineteenth-century publication:

The Lady’s Guide to Perfect Gentility, in manners, dress, and conversation, in the family, in company, at the piano forte, the table, in the street, and in gentlemen’s society.
Also a useful instructor in letter writing, toilet preparations, fancy needlework, millinery, dressmaking, care of wardrobe, the hair, teeth, hands, lips, complexion, etc.
By Emily Thornwell.
Author of “Home Cares Made Easy,” etc.
New York: Derby & Jackson, 119 Nassau St. Cincinnati: H. W. Derby & co. 1857.
(Thornwell 1857, title page)

Within The Lady’s Guide, Chapters IV and V deal with specifically linguistic matters. Chapter IV covers ‘The art of conversing with fluency and propriety’ and Chapter V deals with ‘The whole art of correct and elegant letter writing’.

Generally, Chapter IV deals more with the purely social side of conversation than with its strictly linguistic aspects, but there is a hint towards the end of the chapter of advice directly related to collocation:

Do not use the terms “genteel people;” “this, that, or the other, is very genteel.” Substitute for them, “They are highly accomplished;” “He is a gentlemanly man;” “He has a gentlemanly appearance;” “She has the manner of a gentlewoman.” (Thornwell 1857, 152)

Chapter V contains models and plans of letters designed to deal with a variety of potentially challenging social situations. As an example, here is the suggested form for ‘a lady in answer to a letter in which her suitor intimates his wish to discontinue acquaintance’:

SIR:

I acknowledge the receipt of your last letter, which now lies before me, and in which you convey the intimation, that the position in which, for some time past, we have regarded each other, must henceforth be abandoned.

Until the receipt of this letter, I had regarded you in the light of my future husband; you were, therefore, as you have reason to know, so completely the possessor of my affections, that I looked with indifference upon every other suitor. The remembrance of you never failed to give a fresh zest to the pleasures of life, and you were in my thoughts at the very moment in which I received your letter.

But deem me not so devoid of proper pride as to wish you to revoke your determination, from which I will not attempt to dissuade you, whether you may have made it in cool deliberation, or in precipitate haste. Sir, I shall endeavor to banish you from my affections, as readily and completely as you have banished me; and all that I shall now require from you is this, that you will return to me whatever letters you may have of mine, and which I may have written under a foolish confidence in your attachment, and when you were accredited as the future husband of,

Sir,
Yours as may be,
HENRIETTA ALLSTON.
(Thornwell 1857, 167–8)

The lexis used in this letter is formal, and grouped in places into formulaic phrases, such as ‘the possessor of my affections’, ‘fresh zest’, ‘pleasures of life’, ‘cool deliberation’, ‘precipitate haste’ and ‘foolish confidence’. Since the example is given as a model of the type of letter appropriate to this difficult situation, we can perhaps assume that these phrases are offered as useful linguistic building blocks, to be used directly in the reader’s own letter. As such, they can be seen to represent examples of perceived collocations, now being specified as ready-made language components for users of The Lady’s Guide.

An enormous number of books of etiquette were produced during the nineteenth and early twentieth centuries, often containing a significant amount of guidance on the linguistic or paralinguistic conventions involved in exchanges between the sexes and between people from different levels of society.

More general guides on English usage were also produced during the nineteenth and twentieth centuries. As an extension of the comprehensive dictionaries and the prescriptive grammars of the eighteenth and nineteenth centuries, these books provided guidance on the ways in which the lexis described in the dictionaries should be combined beyond the syntactic frameworks specified in the grammars. As an example, The King’s English, first published in 1906, deals with the basic areas of vocabulary, syntax and punctuation, but also has a chapter called ‘Airs and Graces’ which includes, in its miscellaneous section, a sub-section covering ‘Some more trite phrases’ (Fowler and Fowler, third edition, 1931, 222–4):

The worn-out phrases considered in a former section were of a humorous tendency: we may add here some expressions of another kind, all of them calculated in one way or another to save the writer trouble; the trouble of description, or of producing statistics, or of thinking what he means.

Under this heading they include such phrases as more easily imagined than described, depend upon it, in a vast majority of cases and it stands to reason. Objections are made to these phrases on the basis that they ‘are all apt to damage the cause they advocate’, either because they are being used incorrectly, or because of their frequent use in inappropriate situations:

The shrill formula ‘It stands to reason’ is one of the worst offenders. Originally harmless, and still no doubt often used in quite rational contexts, the phrase has somehow got a bad name for prefacing fallacies and for begging questions;

Here, then, is collocation identified at one end of the scale as cliché, associations between words that have become so definite and habitual that they are now tired formulae. Although the concept of collocation as a neutral linguistic fact is clearly established by this time for the purposes of the OED, its visibility to the average user of language appears to be exclusively negative.

1.7 Collocation as cliché

An example taken from the twentieth century of collocation coming under attack as a source of linguistic malpractice illustrates the overwhelmingly negative attitude attached to clichés. Brian O’Nolan, under the pseudonym Myles na gCopaleen, in his column for the Irish Times, produced:

The Myles na gCopaleen Catechism of Cliche. In 356 tri-weekly parts. A unique compendium of all that is nauseating in contemporary writing. Compiled without regard to expense or the feelings of the public. A harrowing survey of sub-literature and all that is pseudo, mal-dicted and calloused in the underworld of print. Given free with the Irish Times. (na gCopaleen 1968, 202)

Here is a short extract, showing the basic method:

What, as to the quality of solidity, imperviousness, and firmness, are facts?
Hard.
And as to temperature?
Cold.
With what do facts share this quality of frigidity?
Print.
To what do hard facts belong?
The situation.
And to what does a cold fact belong?
The matter.
What must we do to the hard facts of the situation?
Face up to the hard facts of the situation.
What does a cold fact frequently still do?
Remain.
And what is notoriously useless as a means of altering the hard facts of the situation?
All the talk in the world.
(na gCopaleen 1968, 208)

The partly dismantled clichés used in this exercise are easy enough to reconstruct, and the hard facts of the situation has been given as an example in case the reader has any difficulty. With a very little effort others will become visible – the cold facts of the matter, the cold fact still remains and so on.

Not all linguists see clichés as totally negative phenomena. Partridge, in the introduction to the fourth edition of his A Dictionary of Clichés, questions the assumed consensus on their nature and attempts to classify them into four groups:

1. Idioms that have become clichés
2. Other hackneyed phrases
3. Stock phrases and familiar quotations from foreign languages
4. Quotations from English literature

(Partridge 1950, 4)

The first group, the ‘idiom-clichés’, are idioms that have been so over-used that ‘the original point has been blunted or even removed entirely’. Partridge gives several examples, including doublets such as fast and loose, tooth and nail and chop and change, and ‘battered similes’ such as as old as the hills. The second group, the ‘non-idiomatic clichés: phrases so hackneyed as to be knock-kneed and spavined’, includes items such as add insult to injury, generous to a fault and beyond the pale. The third group includes cui bono?, de mortuis (with a pregnant pause) and sotto voce. The fourth, being English, allows more scope for misquotation, with the misquoted version often forming the cliché, as in fresh fields and pastures new (almost from Milton’s Lycidas).

Redfern’s 1989 study, Clichés and Coinages, examines the relationship between the recycling of old usages and the construction of new ones. In the conclusion he summarises the unstable equilibrium maintained between them:

Clichés will not go away, nor should we even desire them to. Use them. Know them. Use them knowingly. Neologisms are a test of our relationship with and concern for others (one test among hundreds). We have, in making new, to make ourselves understood. (Redfern 1989, 256)

The comprehensive range of phrases suggested by Partridge’s examples implies that few, if any, of us could make clichés go away even if we wanted to, and between the total originality of language that seems to be demanded to avoid the cliché and the tired formulae despised by the Fowlers and O’Nolan, there is ample scope for an appreciation and exploitation of those habitual associations between words which are useful and empowering to the language user. An awareness of this on the part of pioneering language teachers led to the incorporation of information relating to collocation between words in a specific type of dictionary being developed in the early part of the twentieth century.

1.8 Learners’ dictionaries in the early twentieth century

It may be useful to consider the predecessors of the English learners’ dictionary. Dictionaries of English have always had, or made claims to, some sort of pedagogic role. The earliest work generally accepted as a monolingual dictionary of English is Cawdrey’s A Table Alphabeticall, published in 1604. Its full title is:

A Table Alphabeticall, conteyning and teaching the true writing, and vnderstanding of hard vsuall English wordes, borrowed from the Hebrew, Greeke, Latine, or French. &c.

With the interpretation thereof by plaine English words, gathered for the benefit & helpe of Ladies, Gentlewomen, or any other vnskilfull persons.

Whereby they may the more easilie and better vnderstand many hard English wordes, which they shall heare or read in Scriptures, Sermons, or elswhere, and also be made able to vse the same aptly themselues.
(Cawdrey 1604, title page)

Similar references to teaching feature in the title pages of the other hard word dictionaries of the seventeenth century. These works, dealing essentially with the enhancement of the users’ lexical resources, can almost be considered to be bilingual dictionaries, and these are normally associated with an explicit teaching role.

Although very different in scope and purpose, the later comprehensive dictionaries were also inescapably didactic. Johnson, in his preface to the Dictionary, declares:

I shall not think my employment useless or ignoble, if by my assistance foreign nations, and distant ages, gain access to the propagators of knowledge, and understand the teachers of truth; if my labours afford light to the repositories of science, and add celebrity to Bacon, to Hooker, to Milton, and to Boyle. (Johnson 1755, Preface, 10)

His project, laid out in The Plan of a Dictionary of the English Language, is clearly pedagogical:

a dictionary by which the pronunciation of our language may be fixed, and its attainment facilitated; by which its purity may be preserved, its use ascertained, and its duration lengthened. (Johnson 1747, 32)

The teaching element here is of a very advanced level. Intended generally for educated native speakers of English, the mainstream comprehensive dictionaries of the nineteenth and twentieth centuries also set out to provide detailed guidance on relatively complex areas of lexical doubt for those who are already competent in their general use of the language.

The comprehensive monolingual English dictionary followed the Johnsonian model throughout the nineteenth and twentieth centuries and found a constant market among native English speakers. In the early twentieth century, a new type of dictionary began to appear: a monolingual English dictionary for non-native speakers who wished to learn the language. In many ways this was a response to new approaches to the teaching of language, developed in the late nineteenth century by, among others, Henry Sweet, Paul Passy, Otto Jespersen, Wilhelm Vietor and Maximilian Berlitz. The treatment of English in this way may also have reflected its growing importance as a European, and later global, language.

1.9 Palmer and the Report on English Collocations

Harold Palmer, born in 1877, applied these new approaches in his own language teaching methods. In 1922 he was appointed Linguistic Advisor to the Japanese Ministry of Education, and in 1923 Director of the Institute for Research in English Teaching. In 1927 the IRET was commissioned to produce a ‘limited English word-list’, which might ultimately be recommended ‘as corresponding to the vocabulary required of an entrant to the schools of higher grade’ (Palmer 1933, 1).

The vocabulary control movement which developed from this commission involved two other pioneers of the monolingual learners’ (or English as a Foreign Language – EFL) dictionary, Michael West and A.S. Hornby, and informed the production of the major EFL teaching aids and dictionaries of the 1930s. In 1933 Palmer published the Second Interim Report on English Collocations, described by Cowie as ‘destined to have a profound and enduring influence on EFL dictionary-making’ (Cowie 1999, 52). This ‘Second’ report, submitted to the Tenth Annual Conference of English Teachers in Tokyo in 1933, represents a ‘thoroughly revised and considerably augmented edition’ of the First Interim Report, presented as a mimeographed copy to the Eighth Annual Conference (Palmer 1933, 1).

The word collocation had no fixed status as a linguistic term at the time of publication of this report, and Palmer refers to the linguistic phenomena described in the report as:

…those things that have been alluded to at different times variously as comings-together-of words, word-compounds, successions of words, phrases, locutions, idioms, word-collocations, non-normal collocations, irregular collocations, or simply as collocations. (Palmer 1933, 1)

This covers a significantly wider selection of items than would be included in the current sense of the word collocation. The layout of the report also involves classification by syntactic pattern, much closer to pattern grammar models than to current lists of collocates as produced by collocation analysis software. Despite these differences, the publication of the report definitely fixes the use of collocation as a linguistic term, ten years in advance of the earliest citations in the OED for sense 1c, already referred to in section 1.4.2 above:

The habitual juxtaposition or association, in the sentences of a language, of a particular word with other particular words; a group of words so associated.

It also stresses the importance attached by Palmer to combinations of words for learners of English. The pedagogic implications of collocation for learners of English had already been noted. Sweet, in the work which laid out the principles of the new approach to language teaching which he proposed, described the problem by comparing sentence construction to irregularity in morphology:

But just as we cannot go on speaking long without using irregular inflections, so also we cannot go on speaking naturally for any length of time without using irregular combinations of words – combinations which cannot be constructed à priori. (Sweet 1899, 71)

Cowie, in his survey of English learners’ dictionaries, quotes the Fowlers’ warning on the behaviour of common words:

entangled with other words in so many alliances and antipathies during their perpetual knocking about the world that the idiomatic use of them is far from easy. (Fowler and Fowler 1911, v, quoted in Cowie 1999, 52)

Palmer deals with the problem by providing systematic guidance on collocations, the result of a thorough exploration:

It is not enough to suggest in a haphazard way the inclusion or exclusion of any word, word-compound, phrase, proverbial expression, etc. that may occur to us. The work must start with collecting and classifying, and this must be done on a large scale and according to an organized plan – and we have been doing on a large scale and according to an organized plan this work of collecting and classifying those things that must be collected and classified. (Palmer 1933, 1)

He provides a ‘random but representative list’ to illustrate the kinds of items involved and their ‘extreme heterogenousness’ (pp. 2–4), and then considers their definition, which he gives in pedagogic terms:

…each one of them must or should be learnt, or is best or most conveniently learnt as an integral whole or independent entity, rather than by the process of piecing together their component parts. (Palmer 1933, 4)

After some discussion of possible terms for these items, he selects collocations (using the entries already described in the OED as part of his justification), and limits the scope of this term to ‘…successions of words which (for various reasons) are best learnt as integral wholes’ (p. 8). He goes on to position the collocations dealt with in the Report within an overall classification system of ‘general linguistic symbols’ within which they occupy group 3:

Collocations that are classifiable under such headings as Verb-collocations, Noun-collocations, Adverb-collocations, Preposition-collocations etc. (Palmer 1933, 18)

As an example of the treatment given to collocations in the Report, here is part of the list provided for category number 31211 – ‘all combinations of verbs with specific nouns’ (p. 50) – for the verb strike:

To strike a blow (× for × N3)
To strike a light
To strike one’s fancy
To strike the hour
To strike twelve [one, two, etc.]

(p. 58)

Although there is some general discussion at the start of each category of the nature of the collocations dealt with, no attempt is made to explain their usage. The emphasis placed on the word-class of the collocations fixes them firmly within the syntactic structure of language as units interchangeable with other similar items, ‘construction-patterns’ in Palmer’s terms (p. 19), an organisation chosen to enable ‘the student to deduce laws of analogy, thereby facilitating his task which would otherwise be one of sheer memorizing’ (p. 20).

Despite this reference to language learning, the Report was never intended as a teaching aid. Palmer describes it as ‘composed by technicians for technicians’ (p. 11). This has not prevented it from having a major impact on the production of teaching materials for learners of English, beginning with the works produced by Palmer such as A Grammar of English Words (Palmer 1938).

In A Grammar of English Words Palmer uses a similar approach to that used in the Second Interim Report to provide:

A manual of the usage of those English words that have been found by experience to constitute the bulk of learning-effort on the part of the student of English as a foreign language. (Palmer 1938, iii)

Entries are organised by ‘caption words’, the heads of ‘word-groups’ within which detailed information is provided for ‘working units’ covering – where appropriate – grammatical function, inflected forms, regular derivatives, definitions, semantic varieties, collocations, phrases and construction patterns. Phrases and collocations are distinguished in the Introduction to the Grammar:

While collocations are comparable in meaning and function to ordinary single ‘words’ (and indeed are often translated by single words in the student’s mother-tongue), phrases are more in the nature of conversational formulas, sayings, proverbs etc. (Palmer 1938, xi)

The entry for strike in the Grammar shows the difference between its approach and that of the Report. Here is an extract from the first part of the entry:

1. = hit, give a blow to sg. or sy., come in violent contact with sg. or sy. With direct object (or used transitively). See V.P. 4
He struck the ball.
Why did you strike her?
The ship struck a rock.
(Palmer 1938, 204)

The entry goes on to list five further senses of strike, exploring the intransitive use of sense 1 and senses roughly related to the collocations listed in the Report. For each the sense is explained, related to syntactic patterns (such as verb pattern 4) and exemplified. This puts useful pedagogic flesh on the technical bones of the Report, and allows the information on collocations collected for it to become properly useful to learners of the language. This information was soon incorporated into more conventional aids to learning.

1.10 Hornby and the Idiomatic and Syntactic English Dictionary

The Idiomatic and Syntactic English Dictionary (Hornby, Gatenby and Wakefield 1942), which developed into the Oxford Advanced Learners’ Dictionary of Current English, is a typical example of the use of collocation guidance within a learners’ dictionary.

Cowie (1999, 59–62) shows that most of the material dealt with in the Second Interim Report is incorporated into the Dictionary, but that many of the collocations dealt with in this dictionary come from other sources and are not found in the Report. This emphasises the effect of the Report on the compilation of this dictionary: the principle of including information on collocations had gone beyond the items actually identified in the Report, and any information for which good evidence could be found was now seen as valid for inclusion in general learners’ dictionaries.

1.11 Collocations dictionaries

The treatment of collocations as the subject of dictionaries in their own right developed as a natural sequence to their inclusion as additional information in learners’ dictionaries. Mackin and Cowie’s work on the identification of collocations, begun in the late 1950s, is described in Mackin (1978):

The compiler of a dictionary of collocations has three main sources open to him: first, other dictionaries; second, his own ‘competence’; and third, occurrences met with in the course of reading and listening to the spoken word on radio, on television, in conversation, at lectures, at the cinema, and so on. (Mackin 1978, 152)

The dictionary produced from this work, the Oxford Dictionary of Current Idiomatic English (ODCIE), was originally published in two volumes. In 1993 Volume 1, published in 1975 (Cowie and Mackin 1975), became the Oxford Dictionary of Phrasal Verbs, and Volume 2 (Cowie, Mackin and McCaig), published in 1983, became the Oxford Dictionary of English Idioms. This division into two distinct areas of combination suggests that the phrasal verbs represent a different phenomenon from the other forms of collocation.

The distinction made in the later editions of the dictionary was visible more than two hundred years earlier in Johnson’s consideration of phrasal verbs in the Preface to the Dictionary. He describes them as items which are inherently problematic for the lexicographer:

My labour has likewise been much increased by a class of verbs too frequent in the English language, of which the signification is so loose and general, the use so vague and indeterminate, and the senses detorted so widely from the first idea, that it is hard to trace them through the maze of variation, to catch them on the brink of utter inanity, to circumscribe them by any limitations, or interpret them by any words of distinct and settled meaning: such are bear, break, come, cast, full, get, give, do, put, set, go, run, make, take, turn, throw. If of these the whole power is not accurately delivered, it must be remembered, that while our language is yet living, and variable by the caprice of every one that speaks it, these words are hourly shifting their relations, and can no more be ascertained in a dictionary, than a grove, in the agitation of a storm, can be accurately delineated from its picture in the water. (Johnson 1755, Preface, 5)

The separation into two distinct sets of collocation phenomena by the ODCIE reflects developments in the concept of collocation and its pedagogic implications. This in turn influenced the construction of other specialised dictionaries dealing exclusively with collocation in the last decades of the twentieth century. Let us consider the two most notable examples, the Combinatory Dictionary of English (Benson, Benson and Ilson, 1986, revised edition 1997 – BBI) and the Oxford Collocations Dictionary (Crowther, Dignen and Lea, 2002 – OCD).

Both are aimed at non-native learners of English, and therefore belong in the tradition of learners’ dictionaries, and both deal (BBI primarily, OCD exclusively) with nouns, adjectives and verbs. Both also cover the full range of the collocations that they have identified as most relevant for learners, but claim to cover only those idioms that are on the borders of collocation:

This Dictionary does not normally include idioms, i.e. frozen expressions in which the meaning of the whole does not reflect the meanings of the component parts: to kill two birds with one stone ‘to achieve two aims with one action’; to be beside oneself ‘to be in a state of great emotional confusion’. Some phrases, especially those expressing a simile, are transitional between collocations and idioms, that is, the meanings of the component parts are reflected partially in the meaning of the whole. The Dictionary does include important phrases of this type. For example, under misc., the entry for bird has as free as a bird, the entry for feather has as light as a feather, the entry for sugar has as sweet as sugar, etc. (BBI, xxiv)

Totally free combinations are excluded and so, for the most part, are idioms. Exceptions to this rule are idioms that are only partly idiomatic: not see the wood for the trees may have nothing to do with wood or trees, but drive a hard bargain is very much about bargaining even if the expression as a whole can be considered an idiom. (OCD 2002, viii)

Both dictionaries clearly derive from the approach used by Cowie and Mackin for the original ODCIE, but take differing stances on the information that needs to be provided. As an example, BBI covers several phrasal verbs as collocations in their own right, while OCD only seems to cover them if they have collocations of their own. The two dictionaries are described in more detail below.

1.11.1 BBI

BBI (Benson, Benson & Ilson 1997) claims to cover 90,000 collocations for 18,000 entries in its 386 pages. It makes a clear distinction between grammatical and lexical collocations:

A grammatical collocation is a phrase consisting of a dominant word (noun, adjective, verb) and a preposition or grammatical structure such as an infinitive or clause. (p. xv)

Lexical collocations, in contrast to grammatical collocations, normally do not contain prepositions, infinitives or clauses. Typical lexical collocations consist of nouns, verbs, adjectives and adverbs. (p. xxx)

Entries in BBI show both lexical and grammatical collocations, where applicable, and in the case of verbs show which pattern each grammatical collocation illustrates. Grammatical collocations are divided into eight major types, of which G1 to G4 have a noun as the dominant word, G5 to G7 an adjective and G8 a verb. G8 is subdivided into 19 verb patterns, labelled A to S (pp. xvi–xxix). Lexical collocations are divided into seven major types (pp. xxx–xxxv). Within each entry lexical collocations come first, and type G8 grammatical collocations are labelled with their verb patterns.

As examples of the information given for collocations in BBI, here are the entries for the noun foresight (p. 102) and for curve as a verb (p. 87):

foresight n. the ~ to + inf. (he had the ~ to provide for the education of his children)

curve II v. 1. to ~ sharply 2. (D; intr.) to ~ to (to ~ to the right) 3. (P; intr.) the missile ~d through the air

In the foresight entry the only collocation provided is grammatical. In the entry for curve, collocation 1 is lexical, while 2 and 3 are grammatical and follow intransitive versions of verb patterns D and P. The ‘swung bar’ (~) is used to replace the entry word throughout the collocations and examples in both entries. The examples for each entry are given above in parentheses.

In keeping with its policy of showing phrasal verbs, BBI has entries for make away, make believe, make do, make off, make out, make over and make up in addition to make by itself, both as a verb and a noun (p. 205).

1.11.2 OCD

In its 892 pages of dictionary text OCD claims to deal with 150,000 collocations relating to 9000 nouns, verbs and adjectives. This suggests much more detailed coverage of each entry. The entries for the noun foresight (p. 323) and for curve as a verb (p. 185) corresponding to those cited for BBI are:

foresight noun

ADJ considerable, great
VERB + FORESIGHT have He had the foresight to bring in the washing before the rain started | show The plans showed great foresight. | lack
PHRASES a lack of foresight

curve verb

ADV. gently, slightly a gently curving stream | away, down, up The path curved down towards the village.
PREP. around/round, towards, etc. The road curved away round the back of the hill

Both entries in OCD contain several more collocates than those in BBI, and both occupy considerably more space because of the larger number of usage examples given and the fact that they are given in full. On the other hand, the grammatical collocations the foresight to and to curve to given in BBI are not covered, although the latter could be implied from the ‘etc.’ given under prepositional collocations for curve. Lexical and grammatical collocations are not dealt with as explicit categories in OCD, although they could be derived from the allocation of word-class to the groups of collocates. In general, as can be seen in these examples, OCD seems to place more emphasis on lexical collocations than BBI does.

The stated difference in policy on the inclusion of phrasal verbs between BBI and OCD is also clear from an examination of OCD. There is no entry in it for any sense of the word make, nor for any of the phrasal verbs containing it. The only entry containing make is that for make-up used as a noun.

1.12 Summary

In this general survey of the history of the word collocation and its use by linguists we have seen it move inexorably from non-linguistic origins to the sidelines of linguistics in general, and from there to the more particular study of lexis and semantics.

The compilers of the dictionaries described in the previous section recognised collocation as an element of language which needed clear and appropriate documentation. This inevitably raises the question of the role played by collocation within language theory. The development of this role is considered in Chapter 2.


Index of names

Akerlof, G. 153
Andreou, C. 152
Androutsopoulos, I. 137
Aretoulaki, M. 137
Arppe, A. 167

Bahumaid, S. 131–2
Bailey, R. W. 5
Baker, M. 130–1
Barfield, A. 122–3, 125
Barnbrook, G. 47, 50, 64, 100, 156
Barton, J. 8, 12
Bazell, C. E. 41
Benson, E. 28–9, 103–4
Benson, M. 28–9, 103–4
Benzineb, K. 141
Berglund, Y. 69
Berry-Rogghe, G. L. M. 139
Bullokar, J. 5

Cawdrey, R. 16, 22
Chang, J. S. 138
Chang, Y.-C. 138
Chen, H.-C. 138
Chomsky, N. 34, 37, 48, 122
Church, K. 139
Clear, J. 46–7, 109–10
Cockeram, H. 5
Coles, E. 5
Cowie, A. P. 23–4, 27–9, 104, 106–8
Crisp, R. J. 154–5
Crowther, J. 28
Cruden, A. 13–15, 94–9, 204, 207–10, 213
Cruse, A. 42

Daley, R. 44–5, 89
Danielsson, P. 165, 167–8
Dante 16, 174
Dignen, S. 28
Dunning, T. 69

Estes, Z. 155
Evert, S. 64, 69, 139

Falquet, G. 141
Firth, J. R. 11–12, 33–9, 41–2, 51, 78, 95, 147–8, 164
Forrester, R. 154
Fowler, H. W. 19, 24
Fox, G. 46, 110
Francis, G. 173, 175
Friedman, R. B. 155

Gatenby, E. V. 27, 104–6
Gimson, A. C. 106–7
Glosser, G. 155
Goldberg, A. 173
Granger, S. 117, 123, 125
Gries, S. 165–6
Groom, N. 123
Gross, M. 166–7, 173
Grossman, M. 155
Grugan, P. K. 155
Guyot, J. 141
Gyllstad, H. 122, 125

Halliday, M.A.K. 38–43, 147–52
Hanks, P. 47, 110
Harris, Z. S. 166
Hasan, R. 39–41, 147, 151–2
Hill, J. 122, 125–6
Hoey, M. 147, 154–5, 164
Hoffmann, S. 69
Homer, 16
Hornby, A. S. 23, 27, 33–4, 104–8
Hunston, S. 173
Hutchins, J. 138

Ilson, R. 28–9

J.K. [Kersey, J.] 16
Jackson, P. 17, 132–3, 135, 139, 141–2
Jacobs, P. S. 140


Johnson, S. 6, 9–11, 16, 22, 28, 35, 95–6, 99–100, 120, 198
Jones, S. 44–5, 89, 155
Jurafsky, D. 132–3, 135–7, 154

Kilgarriff, A. 89, 173
Kim, A. 155
Kitamura, M. 139
Klavans, J. L. 140
Krenn, B. 139
Krishnamurthy, R. K. 44, 46, 110, 150, 156
Kupiec, J. 139

Lea, D. 28
Lear, E. 37
Lee, J. H. 69, 155
Leech, G. 41–2
Lewis, M. 106, 120–3, 125–6, 128
Lin, D. 139, 141
Liou, H.-C. 138
Louw, B. 147, 157, 162
Lyons, J. 41–2

Mackin, R. 27–9, 32–3, 35
Manning, C. 64, 133, 135–6
Martin, J. H. 132–3, 135–7, 154
Mason, O. 88, 168–9, 227
Matsumoto, Y. 138–9
McCarthy, M. 120, 125, 127
McIntosh, A. 38–9
Melamed, I. D. 139
Miangah, T. M. 139
Milton 16, 21, 22
Mitkov, R. 138
Moon, R. 46, 109
Morley, C. 126
Moulinier, I. 132–3, 135, 139, 141–2
Muir, J. 152
Mulhern, B. 154
Murray, J. A. H. 10

na gCopaleen, M. 20
Nerima, L. 139
Novick, J. M. 155

O’Dell, F. 120, 127
O’Nolan, K. 20–1
Ogden, C. K. 36

Palmer, H. E. 23–6, 34–5, 96
Partridge, E. 20–1
Pawley, A. 123–4
Phillips, E. 5

Redfern, W. 21
Renouf, A. 46, 109
Richards, I. A. 36
Roget, 40
Rundell, M. 117

Saussure, F. de 34, 51
Schütze, H. 64, 133, 135–6
Scott, M. 84, 201
Seretan, V. 139
Shakespeare 16
Shannon, C. 67, 198
Sinclair, J. M. 35–6, 41–51, 80, 89, 109, 147, 161–2, 164–5, 167
Smadja, F. A. 139, 164
Smith, N. 69, 191
Somers, H. 138
Southey, R. 7–8, 13
Strzalkowski, T. 140
Stubbs, M. 147, 156–7, 159, 161
Summers, D. 111
Surowiecki, J. 152–3
Sweet, H. 23–4
Swinburne 41
Syder, F. H. 123–4
Sylvester, J. 7

Tennyson, A, Lord 16
Thornbury, S. 129
Thornwell, E. 17–18
Tognini-Bonelli, E. 51
Trager, G. L. 12
Trueswell, J. C. 155
Turner, R. N. 154
Tzoukerman, E. 140

Wakefield, A. H. 27, 104–6
Wehrli, E. 139
West, M. 23, 194
White, M. D. 152, 200
Wu, D. 139


Subject index

abstract categories 166–7
algorithm 71, 78, 165, 168–9, 172
ambiguity 135–6, 138, 221, 224
Arabic 130–2
ASCII 56
authoritarianism 16
authorship 45
automatic summarisation 119
awk 55, 57, 61–2, 65–6, 68, 79–84, 169, 226–7, 229–31, 233–8

Bank of English 46, 50, 93–4, 99, 110, 115, 149, 157, 161–3, 176–8, 182, 186, 189–90, 197, 204, 207–8, 210–14
Bible, the 4, 13–15, 94–9, 204, 210
bigram 172
black-tops 188
British National Corpus 32, 56–7, 59, 61, 111–12, 168–9

C++ 55, 79
Cambridge International Corpus 113–15
case conversion 86
Chambers International Corpus 115–16
cliché 19, 21
cluster 43, 89
COBUILD 16, 45–7, 50, 80, 96, 109–10, 118, 120
cohesion 40–1, 147, 151–3
colligation 87
collocability 12, 36–7, 39
combination 28, 34, 77, 93–4, 102, 107, 115, 120, 133, 176, 221, 224
competence 27, 32, 122
compositionality 42
computational analysis 119
computational linguistics 132
concordance 4, 12–16, 46–7, 51, 58, 61, 64, 83–4, 94–7, 135, 157–8, 160, 172, 178, 182, 190, 197, 199, 201, 208, 210, 213, 215, 229
construction grammar 165, 173
contextuality 36
contingency tables 68
co-occurrence 40, 48–9, 97, 172
cultural attitudes 94

decoding 42, 104
dictionaries 3–7, 9–12, 19–35, 46–50, 80, 96, 99–120, 128–9, 134, 159, 180, 189
  comprehensive 6, 13, 16–17, 21, 23, 42, 118, 131, 139
  didactic 22
  hard word 5, 22
  monolingual 5, 16, 22, 23, 104, 132, 139
digital signal processing 88, 169
disambiguation 4, 14, 16, 45, 94, 118, 135–7

EFL 23, 103, 138
empirical 88, 133, 135
encoding 42, 46, 104
environment 4, 14, 40, 47, 58–65, 69, 80, 90, 99, 123, 164, 166, 227
etiquette 4, 18

fields 21, 87, 142, 237
fixed combinations 93
fluency 17, 123, 128
formula 19, 64, 66, 69
frames 169
frequency 9, 40, 46, 56–70, 73, 76, 83, 88, 93, 114, 117, 122, 157, 168, 176, 183, 186, 197, 204, 210, 226–39

GENIA 134
genre 112


grammar 16, 24, 34, 39–40, 43, 114, 122, 128, 133, 136, 147, 154, 167, 172

hapax legomena 63, 171
Head-driven Phrase Structure Grammar 49
heavyweights 186, 188, 197
hermeneutics 13
holism 42
homographs 44–5, 87
human language technology 132

idiom principle 48–51, 147, 165
idioms 21, 28, 32, 105
index 4, 13, 15, 69, 128, 136, 142, 239
information extraction 133, 141
information retrieval 119, 133, 136, 140, 150
intuition 32, 176

Java 55, 79

knowledge discovery 133
KWIC 57

language acquisition 42, 120, 125, 129, 164
language analysis 4, 47, 133
language teaching 23, 38, 93, 120, 125, 128, 143, 150
langue 34, 51
Laplace smoothing 58
learner corpus 123
lemma 74, 87, 157, 159, 161
lemmatising 87
lexical priming 147, 154
lexical sets 38, 149
lexical units 165
lexicalized sentence stem 124
lexicography 11, 16, 46, 93–120, 128, 164, 172
lexis 3, 16, 31, 34, 38, 40–51, 122, 125, 129, 147, 154, 172
linguistic insecurity 16
linguistic theory 4, 34, 47, 119, 123, 147, 172
literary technique 8
local grammar 47
log-likelihood 64, 69, 76–7, 226, 235
Longman Corpus Network 111
Longman Dictionary of Contemporary English 110
Longman/Lancaster Corpus 111

machine translation 38, 132, 136
Macmillan English Dictionary 116–17
meaning 3–16, 26–9, 33–9, 41–51, 89, 94–110, 114–19, 123, 131, 137, 162–7, 172, 209–13
multi-word units 48, 129, 168–71, 226, 238
mutual information 47

National Centre for Text Mining 134, 142
natural language generation 137
natural language processing 119, 132, 142, 172, 186
n-grams 168–9
node 43, 57, 69, 80, 83, 87, 149, 162–8, 172, 182–7, 195, 205, 226, 234
noun phrase recognisers 133

open-choice principle 49, 123, 147
Optimality Theory 49
optional elements 166
OSTI Report 44, 46
Oxford Advanced Learner’s Dictionary 106, 112
Oxford English Dictionary 3, 6–7, 10–14, 19, 24, 35, 96, 100–3, 112, 154, 174, 189
Oxford Hachette French Dictionary 118

paradigmatic 34, 39, 164
parallel corpora 138
parameters 79, 88, 90, 166, 236
parole 34, 51
part-of-speech tagging 87
pattern grammar 11, 24, 27, 29, 38, 49, 51, 157, 173, 177
pedagogic 22–8, 35, 95, 103, 119
Philological Society 10
phrasal verbs 28


phrase 7, 18–21, 24–6, 29, 32–7, 42, 48–50, 62, 78, 83, 97–100, 103–9, 114, 117, 122, 126, 129, 133, 136, 162, 165, 174, 185, 189, 195, 197, 204
phraseology 17, 33, 80, 96, 104, 147, 164–67
phrasicon 123, 125
positional variation 166
precision 87, 140
prescriptivism 16, 19
program 55, 57, 61, 65, 84, 88, 226, 229
programming 55
punctuation 19, 56

quantitative analysis 147, 156
quotation 6–7, 10, 99, 175

rationalist 133
recall 140
red-tops 186
register 113, 117, 153, 177

sed 56, 84, 87, 227
semantic profiles 156
semantic prosody 87, 100, 147, 161
semantics 12, 31, 34–9, 41, 48, 87, 147, 162
significance 3–4, 10, 35, 42, 47, 61, 64–8, 71, 89, 109, 120, 155, 168, 172, 178, 207, 227, 229, 236
Sketch Engine 115, 173
slot and filler 35, 48
span 43, 57, 79–84, 88, 93, 148, 162, 168, 170, 182, 190, 215, 225–38
speech and language processing 132, 135
speech recognition 132, 137, 154
stemmers 133
stochastic 133, 136
style guides 4
substitutability 38
symbolic 133, 135
synchronic 34, 41
synonyms 15, 42, 115, 151
syntactic dependency 172
syntagmatic 34, 37, 39, 80
syntax 16, 19, 37, 48, 55, 104, 235
synthesis 132, 137

taxonomy 45
teacher training 120, 128
teaching 22, 26, 32, 34, 103, 119–21, 125–9
text categorisation 140
text mining 133, 141
text types 113, 186
textual theory 11, 13
The Advanced Learner’s Dictionary of Current English 104
thesaurus 38, 40
threshold 58, 62, 79, 229–35
tokenisation 56, 87
transformational grammar 48
translation 6, 8, 130–2
t-score 47, 64, 66, 71–8, 82, 93, 149, 159–63, 170, 179, 182–7, 195, 202–9, 226, 236

Unix 56, 86, 226
usage notes 100

vocabulary 4, 16, 19, 23, 43, 48, 105, 112, 114, 121, 128

word behaviour 4, 35
wordform 86, 149, 197, 203, 236

z-score 64, 66, 71–8, 84, 226, 236
