8/3/2019 Collocation and Knowledge Production in an Academic Discourse
Collocation and knowledge production in an academic discourse
community
Keith Stuart
Ana Botella Trelis
Universidad Politécnica de Valencia (Spain)
Abstract
This paper analyses the discourse of science and technology through the study of lexical and
grammatical co-selection in research articles. The corpus comprises 1,376 articles, from specialist
leading journals (a total of 6,104,323 tokens, 71,516 types, and 1.17 type/token ratio). The main
criterion for choosing these journal articles for our corpus is that they are written by the members
of the academic discourse community of the Universidad Politécnica de Valencia.
The research follows on work by Gledhill (2000), who analysed the most frequent collocations in a
corpus of pharmaceutical research articles and proposed possible functions for these collocations.
However, this paper takes a more lexico-grammatical approach to collocations as a system of
preferred expressions of knowledge in scientific research. The concept of collocation used here is
in the Hallidayan tradition as an intermediate level between syntax and lexis, which focuses on
recurrent word patterns (Hunston & Francis, 2000). The paper's ultimate objective is to show that
collocations as a system of preferred expressions of knowledge in scientific research can help us to
analyse knowledge produced in our academic discourse community.
Key words: corpus linguistics, co-selection, collocation, colligation, research articles
Introduction
An important discovery of corpus linguistics has been that there is a level of
syntagmatic phrasal organisation which had been largely ignored. Units at this level may be
described as n-grams, meaning recurrent strings of uninterrupted word-forms, or, as in
Scott (1997), as word clusters. They form part of what has been
termed collocation in the Firthian tradition. Evidence from
corpora was needed because these syntagmatic structures did not fit into either lexis
or grammar and because they involve facts about frequency, which depend on computer
technology. As Leech (1992: 106) envisaged, the computer was going to do more than
just act as a research tool; it was going to open up new ways of thinking about language
by providing more data and better counting. Clear (1993: 274) pointed out that the use
of computational (algorithmic and statistical) methods has led to a difference of scale
in the corpus data that can be analysed, and this in turn has led to a qualitative difference
in observations about language based on corpus evidence.
This paper explores corpus evidence about collocations as a first step towards
establishing a conceptual map, through collocational networks, of the knowledge being
produced in our academic discourse community. This paper analyses the discourse of
science and technology through the study of lexical and grammatical co-selection in
research articles in a corpus comprising 1,376 articles (a total of 6,104,323 tokens).
The main criteria for choosing these journals for our corpus were that they are
cited in the Science Citation Index (SCI), that they are read by our university lecturers and
students, and that they are where our lecturers and postgraduate students try to publish their
research. All the articles have been written by our lecturers and, therefore, represent the
work of a single academic discourse community, in this case, the Universidad
Politécnica de Valencia.
The research follows on work by Gledhill (2000), where he analyses the most frequent
collocations in a corpus of pharmaceutical research articles and proposes possible
functions for these collocations. However, this paper takes a more lexico-grammatical
approach to collocations as a system of preferred expressions of knowledge in scientific
research. We do not, though, restrict the analysis to a strict collocational approach but
rather investigate the notion of co-selection as describing the general phenomenon of
words that habitually keep company, to paraphrase Firth. A syntagmatic view of
language takes account of the contribution of sense and syntax to meaning. The
argument that sense and syntax (Sinclair, 1991), or meaning and pattern (Hunston
& Francis, 2000), are associated is based on two pieces of evidence. Firstly, meanings
tend to be distinguished by differing patterns, and secondly, words with the same
pattern sometimes share aspects of meaning.
Sinclair (1991: 170) refers to collocation as the occurrence of two or more words
within a short space of each other in a text; this could logically refer to co-selection
between lexical or grammatical items. Some authors (Firth, 1957; Hoey, 2005: 43) draw
a distinction between collocation and colligation, using the former to refer to the co-
occurrence of lexical items and the latter to the interrelationship of words and
grammatical items (the grammatical company a word or word sequence keeps).
Sinclair himself refers to colligation within a collocation context, in terms of
collocational frameworks, which are units based on a grammatical, as opposed to a
lexical, core (e.g., the/an...of) (Renouf & Sinclair, 1991: 128-143). Analysis of lexical
and grammatical co-selection in our corpus of research articles proceeded by asking
three questions:
1.- What are the collocations of X word or words in the corpus?
2.- What meanings do X word or words tend to associate with?
3.- What grammatical constructions (colligation) do X word or words tend to enter into?
The paper's ultimate objective is to analyse collocations as a system of preferred
expressions of knowledge in scientific research that can help us to analyse knowledge
produced in our academic discourse community.
Method
Once the corpus had been designed and implemented, we proceeded to analyse the data
by creating wordlists of technical and semi-technical terms through frequency counts
and keyword identification. This process involved initially comparing a general English
wordlist (from the 100-million-word BNC) with a wordlist from our corpus.
Frequencies were compared and a keyword list was created from our corpus. To
compute the "key-ness" of an item, the software used (WordSmith) computes the
following and cross-tabulates them (Scott, 2004):
1.- its frequency in the smaller wordlist (our corpus)
2.- the number of running words in the smaller wordlist (our corpus)
3.- its frequency in the reference corpus (BNC)
4.- the number of running words in the reference corpus (BNC)
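The cross-tabulation of these four quantities can be illustrated with a short sketch. The paper does not state which statistic WordSmith was set to use, so the code below assumes the log-likelihood (G2) statistic over a 2x2 table (word vs. all other words, study corpus vs. reference corpus). The function name and the toy frequencies in the usage note are hypothetical and are not figures from our corpus or from the BNC.

```python
import math


def log_likelihood_keyness(freq_study, size_study, freq_ref, size_ref):
    """Return G2 for the 2x2 table built from the four quantities.

    Assumption: log-likelihood is used as the key-ness statistic; the
    paper itself does not say which test was applied.
    """
    # Observed counts: row 0 = the word, row 1 = all other running words.
    observed = [
        [freq_study, freq_ref],
        [size_study - freq_study, size_ref - freq_ref],
    ]
    total = size_study + size_ref
    row_totals = [sum(row) for row in observed]
    col_totals = [size_study, size_ref]

    g2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            if observed[i][j] > 0:  # a zero cell contributes nothing
                g2 += observed[i][j] * math.log(observed[i][j] / expected)
    return 2.0 * g2
```

With toy numbers, a word that has the same relative frequency in both wordlists, e.g. `log_likelihood_keyness(10, 1000, 100, 10000)`, scores zero, and the score grows as the word becomes relatively more frequent in the smaller wordlist.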
Once we had established the candidate terms to be analysed, our software
extracted collocations for these terms and exported them to an Excel spreadsheet.
Collocates of terms are extracted from the entire corpus within a span of 5 words on both
sides of the node term.
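The extraction step can be sketched as follows. This is a minimal illustration of span-based collocate counting, not the software actually used; the function name is hypothetical, and the corpus is assumed to be already tokenised into a flat list of lower-cased word-forms.

```python
from collections import Counter


def collocates(tokens, node, span=5):
    """Count collocates of `node` within `span` words on each side.

    Returns two Counters: words found to the left of the node and
    words found to the right of the node.
    """
    left, right = Counter(), Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        # Up to `span` words before the node.
        for word in tokens[max(0, i - span):i]:
            left[word] += 1
        # Up to `span` words after the node.
        for word in tokens[i + 1:i + 1 + span]:
            right[word] += 1
    return left, right
```

The total for a collocate is the sum of its left and right counts, which is how totals are broken down by position relative to the node.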
The candidate terms selected for this analysis were the following:
1.- Semi-technical words which are very frequent in the corpus and constitute
significant examples of both lexical and grammatical co-occurrences (collocations and
colligations), for example, results, system, model, etc.
2.- Semi-technical and technical words which tend to appear next to or near certain
terms producing relevant semantic content which represents knowledge generated at our
Institution.
Results
The first example we would like to present in this paper is the term results, as it is the
most frequent semi-technical term in the UPV corpus (9,730 occurrences). Moreover, this term
gives us clear examples of lexical associations, not only in three-word recurrent patterns
(clusters) but also in longer strings. It is worth mentioning that the
most frequent collocation found, the results obtained, is followed by different
prepositions (depending on the noun group that follows the preposition).
TABLE 1. Obtained as a collocate of results
the results obtained          636
the results obtained for       98
the results obtained with      97
the results obtained by        82
the results obtained in        82
the results obtained from      56
Collocates for the term results fall into three categories: evaluative adjectives
(experimental, similar, good, different, previous), past participle adjectives/passive
structures (obtained, shown, presented, compared), and active verbs (show, indicate, present).
The position of a collocate relative to the node (before or after it) is clearly fixed in some of
the examples and is, consequently, relevant in those cases.
TABLE 2. Most frequent collocates of results
Word           With      Total   Total Left   Total Right
obtained       results    1457          191          1266
experimental   results     627          558            69
show           results     591           90           501
discussion     results     526           66           460
shown          results     426           82           344
presented      results     284           48           236
similar        results     277          199            78
good           results     251          170            81
different      results     236          104           132
simulation     results     217          174            43
compared       results     201           68           133
between        results     194          136            58
indicate       results     186            8           178
agreement      results     185          112            73
analysis       results     176           96            80
observed       results     156           96            60
given          results     149           37           112
present        results     149          101            48
previous       results     149           88            61
It is also worth mentioning that 5-word clusters with results are
different from those shown above.
TABLE 3. 5-word clusters with results
in agreement with the results            20
often leads to misleading results        12
according to the results obtained        10
basic notions and preliminary results     9
the basis of the results                  9
taking into account the results           9
on the basis of the results               9
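Cluster counts of this kind can be reproduced in outline with a simple n-gram counter. The sketch below is an assumption about how such counting can be done, not a description of WordSmith's internals; the function name is hypothetical and the input is assumed to be a flat list of lower-cased word-forms.

```python
from collections import Counter


def clusters(tokens, node, n=3, min_freq=1):
    """Count uninterrupted n-word strings that contain the node word."""
    counts = Counter()
    for i in range(len(tokens) - n + 1):
        gram = tuple(tokens[i:i + n])
        if node in gram:  # exact word-form match, so result != results
            counts[" ".join(gram)] += 1
    # Most frequent clusters first, dropping those below the threshold.
    return [(g, c) for g, c in counts.most_common() if c >= min_freq]
```

Because the match is on the exact word-form, singular and plural forms are counted separately; changing `n` from 3 to 5 yields longer clusters from the same text.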
The collocates and clusters found for the same word in the singular (result) differ
substantially from those found in the plural form.
TABLE 4. 3-word clusters for result
as a result             329
the following result    282
the result of           250
a result of             228
System is the second most frequent semi-technical word in the corpus (8,205 occurrences). The
results obtained when analysing the term show two facts we would like to mention.
Other noun groups, although less statistically frequent, are formed with the
semi-technical word method: the traditional pile salting method, the discrete analytical stiffness
derivative method, proposed shape restricted snake method, the split step Fourier
method, etc.
Another term which shows a fixed pattern of use is performance: the
performance of is the most frequent cluster found (494). This term usually collocates in
our corpus with other words with a positive meaning: evaluative adjectives such as
best, better than, high, good and verbs that have positive connotations such as
improve, boost, achieve.
The semi-technical term temperature tends to colligate with the preposition at (the
pattern at room temperature is the most frequent: 484 times). Clusters with semantic
content are also found with this term: glass transition temperature, the annealing
temperature, cooling water temperature, burnt products temperature.
Other examples of semi-technical words in the corpus which show fixed lexical and
grammatical patterns are samples and values. In the case of samples, we find a
repeated use of passive structures with verbs indicating actions performed by scientists
in this context.
TABLE 8. Passive structures with sample
samples were taken 50
samples were prepared 31
samples were analysed 23
samples stored at 23
samples treated with 20
samples were dried 19
samples were analyzed 18
Both value and values are used in recurrent combinations with prepositions. We find
patterns such as: of a/the X value of, for a/the value of, with the value/s of, with
different values of, in the value of, from the values of.
These examples with value are similar to Sinclair's collocational frameworks (Renouf
& Sinclair, 1991). One of the most common collocational frameworks in our corpus is
the string preposition + the + x + of + y. The most frequent examples found for the
preposition in are: in the case of (1,256), in the presence of (957), in the absence
of (268), in the range of (141), etc. The results for on are: on the basis of (219),
on the use of (124), on the surface of (95), etc. With at we have: at the beginning
of (94), at the bottom of (92), at the centre of (90), at the end of (80). Another
collocational framework which is very frequent in the scientific writing of our corpus is
under x conditions. Examples found in the corpus are: under these conditions, under
the conditions, under certain conditions, under different conditions, under
non-cavitating conditions, under super critical conditions, etc. This kind of analysis can
constitute a useful resource for scientists writing in English as L2 and for ESP teachers
and their students.
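A framework with a grammatical core and a variable lexical slot can be searched for by fixing the grammatical words and counting whatever fills the slot. The sketch below does this for the preposition + the + x + of framework; the function name is hypothetical, the preposition set is limited to the three prepositions discussed above, and the input is assumed to be a flat list of lower-cased word-forms.

```python
from collections import Counter

# Assumption: only the three prepositions discussed in the text are searched.
PREPOSITIONS = {"in", "on", "at"}


def framework_fillers(tokens, prepositions=PREPOSITIONS):
    """Count the words filling x in the framework: preposition + the + x + of."""
    counts = Counter()
    for i in range(len(tokens) - 3):
        prep, det, filler, of_word = tokens[i:i + 4]
        if prep in prepositions and det == "the" and of_word == "of":
            counts[(prep, filler)] += 1
    return counts
```

Calling `framework_fillers(tokens).most_common()` then lists the fillers of each preposition in descending order of frequency.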
Semi-technical terms with a high degree of frequency in the corpus provide us with
information about the knowledge in our community. The association of these terms with
other technical terms and the relationships established between some of them will help
the linguist to represent knowledge in terms of more or less complex semantic
networks.
FIGURE 1. First step towards a collocational network with acid
ACID (4,083): Sites (416), Amino (379), Concentration (211), Citric (185), Groups (136), Strength (129), Solution (118), Coumaric (114), PH (104), Acetic (97), Catalysts (94), Membrane (88), Cinnamic (77)
FIGURE 2. First step towards a collocational network with sites
SITES (1,460): Acid (416), Active (174), Number (110), Binding (91), Strength (63), Surface (60), Strong (56), Frameworks (50), Zeolites (50), Concentration (49)
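One possible way of linking the two node terms into a single network is sketched below. It is only an illustration of the idea, under the assumption that a network can be stored as a mapping from each node to its collocates and their frequencies. The frequencies are a subset of those given in Figures 1 and 2; the function names are hypothetical.

```python
# Subset of the collocate frequencies given in Figures 1 and 2.
network = {
    "acid": {"sites": 416, "amino": 379, "concentration": 211,
             "citric": 185, "strength": 129},
    "sites": {"acid": 416, "active": 174, "number": 110,
              "strength": 63, "concentration": 49},
}


def shared_collocates(network, node_a, node_b):
    """Return the collocates that two nodes have in common, sorted by name."""
    return sorted(network[node_a].keys() & network[node_b].keys())


def linked(network, node_a, node_b):
    """True if each node appears among the collocates of the other."""
    return node_b in network.get(node_a, {}) and node_a in network.get(node_b, {})
```

On this subset, acid and sites are linked directly to each other and also indirectly through the collocates they share, concentration and strength.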