Collocation and Knowledge Production in an Academic Discourse


  • 8/3/2019 Collocation and Knowledge Production in an Academic Discourse


    Collocation and knowledge production in an academic discourse

    community

    Keith Stuart

    Ana Botella Trelis

    Universidad Politécnica de Valencia (Spain)

    Abstract

    This paper analyses the discourse of science and technology through the study of lexical and

    grammatical co-selection in research articles. The corpus comprises 1,376 articles from leading

    specialist journals (a total of 6,104,323 tokens, 71,516 types, and a type/token ratio of 1.17%). The main

    criterion for choosing these journal articles for our corpus is that they are written by the members

    of the academic discourse community of the Universidad Politécnica de Valencia.

    The research follows on from work by Gledhill (2000), who analysed the most frequent collocations in a

    corpus of pharmaceutical research articles and proposed possible functions for these collocations.

    However, this paper takes a more lexico-grammatical approach to collocations as a system of

    preferred expressions of knowledge in scientific research. The concept of collocation used here is

    in the Hallidayan tradition as an intermediate level between syntax and lexis, which focuses on

    recurrent word patterns (Hunston & Francis, 2000). The paper's ultimate objective is to show that

    collocations as a system of preferred expressions of knowledge in scientific research can help us to

    analyse knowledge produced in our academic discourse community.

    Key words: corpus linguistics, co-selection, collocation, colligation, research articles

    Introduction

    An important discovery of corpus linguistics has been that there is a level of

    syntagmatic phrasal organisation which had been largely ignored. Its units may be

    described as n-grams, meaning recurrent strings of uninterrupted word-forms, or, as in

    Scott (1997), as word clusters. They form part of what is known in the Firthian

    tradition as collocation. Evidence from corpora was needed because these syntagmatic

    structures did not fit into either lexis or grammar and because they involve facts about

    frequency whose calculation depends on computer technology. As Leech (1992: 106) envisaged,

    the computer was going to do more than

    just act as a research tool; it was going to open up new ways of thinking about language

    by providing more data and better counting. Clear (1993: 274) pointed out that the use

    of computational (algorithmic and statistical) methods has led to a difference of scale

    in the corpus data that can be analysed and this in turn has led to a qualitative difference

    in observations about language based on corpus evidence.

    This paper explores corpus evidence about collocations as a first step towards

    establishing a conceptual map through collocational networks of the knowledge being

    produced in our academic discourse community. This paper analyses the discourse of

    science and technology through the study of lexical and grammatical co-selection in

    research articles in a corpus comprising 1,376 articles (a total of 6,104,323 tokens).

    The main criteria for choosing these journals for our corpus were that they are

    cited in the Science Citation Index (SCI), that they are read by our university lecturers and

    students, and that they are where our lecturers and postgraduate students try to publish their

    research. All the articles have been written by our lecturers and, therefore, represent the


    work of a single academic discourse community, in this case, the Universidad

    Politécnica de Valencia.

    The research follows on from work by Gledhill (2000), who analyses the most frequent

    collocations in a corpus of pharmaceutical research articles and proposes possible

    functions for these collocations. However, this paper takes a more lexico-grammatical

    approach to collocations as a system of preferred expressions of knowledge in scientific

    research. We do not, though, restrict the analysis to a strict collocational approach but

    rather investigate the notion of co-selection as describing the general phenomenon of

    words that habitually keep company, to paraphrase Firth. A syntagmatic view of

    language takes account of the contribution of sense and syntax to meaning. The

    argument that sense and syntax (Sinclair, 1991), or meaning and pattern (Hunston

    & Francis, 2000), are associated is based on two pieces of evidence. Firstly, meanings

    tend to be distinguished by differing patterns, and secondly, words with the same

    pattern sometimes share aspects of meaning.

    Sinclair (1991: 170) refers to collocation as the occurrence of two or more words

    within a short space of each other in a text; this could logically refer to co-selection

    between lexical or grammatical items. Some authors (Firth, 1957; Hoey, 2005: 43) draw

    a distinction between collocation and colligation, using the former to refer to the co-

    occurrence of lexical items and the latter to the interrelationship of words and

    grammatical items (the grammatical company a word or word sequence keeps).

    Sinclair himself refers to colligation within a collocation context, in terms of

    collocational frameworks, which are units based on a grammatical, as opposed to a

    lexical, core (e.g., the/an...of) (Renouf & Sinclair, 1991: 128-143). Analysis of lexical

    and grammatical co-selection in our corpus of research articles proceeded by asking

    three questions:

    1.- What are the collocations of X word or words in the corpus?

    2.- What meanings do X word or words tend to associate with?

    3.- What grammatical constructions (colligation) do X word or words tend to enter into?

    The paper's ultimate objective is to analyse collocations as a system of preferred

    expressions of knowledge in scientific research that can help us to analyse knowledge

    produced in our academic discourse community.

    Method

    Once the corpus had been designed and implemented, we proceeded to analyse the data

    by creating wordlists of technical and semi-technical terms through frequency counts

    and keyword identification. This process involved initially comparing a general English

    wordlist (from the 100 million BNC corpus) with a wordlist from our corpus.

    Frequencies were compared and a keyword list was created from our corpus. To

    compute the "key-ness" of an item, the software used (WordSmith) computes the

    following and cross-tabulates them (Scott, 2004):

    1.- its frequency in the smaller wordlist (our corpus)

    2.- the number of running words in the smaller wordlist (our corpus)

    3.- its frequency in the reference corpus (BNC)

    4.- the number of running words in the reference corpus (BNC)
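
    The four quantities listed above are enough to compute a keyness score. The paper does not say which statistic was selected, so the following is only a minimal sketch, assuming the two-cell log-likelihood form commonly used for corpus comparison. The function name and the reference-corpus frequency in the usage line are hypothetical; only the corpus size (6,104,323 tokens), the frequency of results (9,730) and the approximate size of the BNC (100 million words) come from the text.

```python
import math

def log_likelihood_keyness(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood score for one word compared across two corpora.

    Assumption: two-cell form of the log-likelihood statistic; the paper
    does not state which keyness statistic was actually used.
    """
    total_freq = freq_study + freq_ref
    total_size = size_study + size_ref
    # Expected frequencies if the word were equally likely in both corpora.
    expected_study = size_study * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    score = 0.0
    for observed, expected in ((freq_study, expected_study),
                               (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2.0 * score

# Hypothetical usage: 20_000 is an invented reference frequency.
score = log_likelihood_keyness(9_730, 6_104_323, 20_000, 100_000_000)
```

    A score of zero means the word is equally frequent, relative to corpus size, in both wordlists; larger scores indicate a greater difference in relative frequency.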

    Once we had established the candidate terms to be analysed, our software

    extracted collocations for these terms and exported them to an Excel spreadsheet.

    Collocates of terms were extracted from the entire corpus within a span of 5 words on

    either side of the node term.
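
    The extraction step can be illustrated with a short sketch. It assumes a corpus that has already been lower-cased and split into tokens, and it counts left and right collocates separately within a span of 5 words. The function name and the example sentence are hypothetical and are not taken from the software used in the study.

```python
from collections import Counter

def collocates(tokens, node, span=5):
    """Count words occurring within `span` tokens left and right of `node`."""
    left, right = Counter(), Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        # Up to `span` tokens before the node, clipped at the text start.
        left.update(tokens[max(0, i - span):i])
        # Up to `span` tokens after the node, clipped at the text end.
        right.update(tokens[i + 1:i + 1 + span])
    return left, right

# Hypothetical example sentence, tokenised on whitespace.
example = "the results obtained with the model show that the results are good".split()
left, right = collocates(example, "results")
total = left + right  # combined count, left and right together
```

    Keeping the two counters apart preserves the positional information that a combined total would lose.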

    The candidate terms selected for this analysis were the following:

    1.- Semi-technical words which are very frequent in the corpus and constitute

    significant examples of both lexical and grammatical co-occurrences (collocations and

    colligations), for example, results, system, model, etc.

    2.- Semi-technical and technical words which tend to appear next to or near certain

    terms, producing relevant semantic content which represents knowledge generated at our

    institution.

    Results

    The first example we would like to present in this paper is the term results, as it is the

    most frequent semi-technical term in the UPV corpus (9,730 occurrences). Moreover, this term

    gives us clear examples of lexical associations not only for three-word recurrent patterns

    (clusters) but also for longer strings. Notably, the

    most frequent collocation found, the results obtained, is followed by different

    prepositions (depending on the noun group that follows the preposition).

    TABLE 1. Obtained as a collocate of results

    the results obtained          636
    the results obtained for       98
    the results obtained with      97
    the results obtained by        82
    the results obtained in        82
    the results obtained from      56

    Collocates for the term results fall into three categories: evaluative adjectives

    (experimental, similar, good, different, previous), past participle adjectives/passive

    structures (obtained, shown, presented, compared), and active verbs (show, indicate, present).

    The position of a collocate, before or after the node, is clearly fixed in some of

    the examples and is, consequently, relevant in those cases.

    TABLE 2. Most frequent collocates of results

    Word           With      Total   Total Left   Total Right
    obtained       results    1457          191          1266
    experimental   results     627          558            69
    show           results     591           90           501
    discussion     results     526           66           460
    shown          results     426           82           344
    presented      results     284           48           236
    similar        results     277          199            78
    good           results     251          170            81
    different      results     236          104           132
    simulation     results     217          174            43
    compared       results     201           68           133
    between        results     194          136            58
    indicate       results     186            8           178
    agreement      results     185          112            73
    analysis       results     176           96            80
    observed       results     156           96            60
    given          results     149           37           112
    present        results     149          101            48
    previous       results     149           88            61

    It is also worth noting that the 5-word clusters with results differ from

    the patterns shown above.

    TABLE 3. 5-word clusters with results

    in agreement with the results           20
    often leads to misleading results       12
    according to the results obtained       10
    basic notions and preliminary results    9
    the basis of the results                 9
    taking into account the results          9
    on the basis of the results              9
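
    Cluster counts of this kind can be approximated by collecting every contiguous n-word sequence that contains the node word. The following is a minimal sketch under that assumption; the function name and the example tokens are hypothetical, and cluster tools may apply additional settings (for example a minimum frequency) that are not modelled here.

```python
from collections import Counter

def clusters(tokens, node, n):
    """Count each contiguous n-word sequence that contains `node`."""
    counts = Counter()
    for i in range(len(tokens) - n + 1):
        gram = tokens[i:i + n]
        if node in gram:
            counts[" ".join(gram)] += 1
    return counts

# Hypothetical example, tokenised on whitespace.
example = "the results obtained for the results obtained with".split()
three_word = clusters(example, "results", 3)
# three_word.most_common() lists clusters from most to least frequent.
```

    Changing n from 3 to 5 yields the longer strings, which is why the most frequent clusters can differ from one table to the next.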

    The collocates and clusters found for the same word in the singular (result) differ

    substantially from those found in the plural form.

    TABLE 4. 3-word clusters for result

    as a result             329
    the following result    282
    the result of           250
    a result of             228

    System is the second most frequent semi-technical word in the corpus (8,205). The

    results obtained when analysing the term show two facts we would like to mention.


    Other noun groups, although less statistically frequent, are formed with this semi-

    technical word: the traditional pile salting method, the discrete analytical stiffness

    derivative method, proposed shape restricted snake method, the split step Fourier

    method, etc.

    Another term which shows a fixed pattern of use is performance; the most frequent

    cluster found is the performance of (494). This term usually collocates in

    our corpus with other words with a positive meaning: evaluative adjectives such as

    best, better than, high, good and with verbs that have positive connotations such as

    improve, boost, achieve.

    The semi-technical term temperature tends to colligate with the preposition at (the

    pattern at room temperature is the most frequent: 484 times). Clusters with semantic

    content are also found with this term: glass transition temperature, the annealing

    temperature, cooling water temperature, burnt products temperature.

    Other examples of semi-technical words in the corpus which show fixed lexical and

    grammatical patterns are samples and values. In the case of samples, we find a

    repeated use of passive structures with verbs indicating actions performed by scientists

    in this context.

    TABLE 8. Passive structures with sample

    samples were taken       50
    samples were prepared    31
    samples were analysed    23
    samples stored at        23
    samples treated with     20
    samples were dried       19
    samples were analyzed    18

    Both value and values are used in recurrent combinations with prepositions. We find

    patterns such as: of a/the X value of, for a/the value of, with the value/s of, with

    different values of, in the value of, from the values of.

    These examples with value are similar to Sinclair's collocational frameworks (Renouf

    & Sinclair, 1991). One of the most common collocational frameworks in our corpus is

    the string: preposition+the+x+of+y. The most frequent examples found for the

    preposition in are: in the case of (1,256), in the presence of (957), in the absence

    of (268), in the range of (141), etc. The results for on are: on the basis of (219),

    on the use of (124), on the surface of (95), etc. With at we have: at the beginning

    of (94), at the bottom of (92), at the centre of (90), at the end of (80). Another

    collocational framework which is very frequent in the scientific writing of our corpus is:

    under x conditions. Examples found in the corpus are: under these conditions, under

    the conditions, under certain conditions, under different conditions, under non-

    cavitating conditions, under super critical conditions, etc. This kind of analysis can

    constitute a useful resource for scientists writing in English as an L2 and for ESP teachers

    and their students.
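
    A collocational framework such as preposition+the+x+of can be searched for with a simple pattern that fixes the grammatical words and counts whatever fills the open slot. The sketch below assumes plain running text and a single-word slot; the function name and the example sentence are hypothetical.

```python
import re
from collections import Counter

def framework_fillers(text, preposition):
    """Count the words filling the slot in '<preposition> the X of'."""
    pattern = re.compile(
        r"\b" + re.escape(preposition) + r"\s+the\s+(\w+)\s+of\b",
        re.IGNORECASE,
    )
    # Lower-case the slot filler so 'Case' and 'case' are counted together.
    return Counter(m.group(1).lower() for m in pattern.finditer(text))

# Hypothetical example text.
example = ("In the case of failure, and in the presence of water, "
           "in the case of doubt.")
fillers = framework_fillers(example, "in")
```

    The word-boundary marker before the preposition prevents a string such as within the range of from being counted under in.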


    Semi-technical terms with a high degree of frequency in the corpus provide us with

    information about the knowledge in our community. The association of these terms with

    other technical terms and the relationships established between some of them will help

    the linguist to represent knowledge in terms of more or less complex semantic

    networks.

    FIGURE 1. First step towards a collocational network with acid

    ACID (4,083): Sites (416), Amino (379), Concentration (211), Citric (185), Groups (136),
    Strength (129), Solution (118), Coumaric (114), PH (104), Acetic (97), Catalysts (94),
    Membrane (88), Cinnamic (77)

    FIGURE 2. First step towards a collocational network with sites

    SITES (1,460): Acid (416), Active (174), Number (110), Binding (91), Strength (63),
    Surface (60), Strong (56), Zeolites (50), Frameworks (50), Concentration (49)
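
    One way to take the next step from two separate collocate lists towards a network is to store each node with its collocates and then look for words that connect the nodes. The sketch below is an illustration only: the frequencies are those given in Figures 1 and 2, the assignment of each collocate to its node follows our reading of the figures, and the dictionary layout and function names are hypothetical.

```python
# Collocate frequencies as given in Figures 1 and 2 (node -> collocate -> count).
network = {
    "acid": {
        "sites": 416, "amino": 379, "concentration": 211, "citric": 185,
        "groups": 136, "strength": 129, "solution": 118, "coumaric": 114,
        "ph": 104, "acetic": 97, "catalysts": 94, "membrane": 88,
        "cinnamic": 77,
    },
    "sites": {
        "acid": 416, "active": 174, "number": 110, "binding": 91,
        "strength": 63, "surface": 60, "strong": 56, "zeolites": 50,
        "frameworks": 50, "concentration": 49,
    },
}

def shared_collocates(net, node_a, node_b):
    """Collocates that both nodes have in common, in alphabetical order."""
    return sorted(set(net[node_a]) & set(net[node_b]))

def directly_linked(net, node_a, node_b):
    """True if each node appears among the other's collocates."""
    return node_b in net[node_a] and node_a in net[node_b]

links = shared_collocates(network, "acid", "sites")
linked = directly_linked(network, "acid", "sites")
```

    Shared collocates and direct links of this kind are the edges from which a larger semantic network could be assembled as more nodes are added.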
