
Jointly published by Akadémiai Kiadó, Budapest and Springer, Dordrecht
Scientometrics, Vol. 63, No. 3 (2005) 567–581

Received December 13, 2004
Address for correspondence:
JONATHAN ADAMS
Evidence Ltd., 103 Clarendon Road, Leeds LS2 9DF, UK
E-mail: [email protected]

0138–9130/US $ 20.00
Copyright © 2005 Akadémiai Kiadó, Budapest
All rights reserved

Early citation counts correlate with accumulated impact

JONATHAN ADAMS

Evidence Ltd., Leeds (UK)

The present paper addresses the objective of developing forward indicators of research performance using bibliometric information on the UK science base.

Most research indicators rely primarily on historical time series relating to inputs to, activity within and outputs from the research system. Policy makers wish to be able to monitor changing research profiles in a more timely fashion, the better to determine where new investment is having the greatest effect. Initial (e.g. 12 months from publication) citation counts might be useful as a forward indicator of the long-term (e.g. 10 years from publication) quality of research publications, but – although there is literature on citation-time functions – no study to evaluate this specifically has been carried out by Thomson ISI or any other analysts.

Here, I describe the outcomes of a preliminary study to explore these citation relationships, drawing on the UK National Citation Report held by Evidence Ltd under licence from Thomson ISI for OST policy use. Annual citation counts typically peak at around the third year after publication. I show that there is a statistically highly significant correlation between initial (years 1–2) and later (years 3–10) citations in six research categories across the life and physical sciences. The relationship holds over a wide range of initial citation counts. Papers that attract more than a definable but field-dependent threshold of citations in the initial period after publication are usually among the top 1% (the most highly cited papers) for their field and year. Some papers may take off slowly but can later join the high impact group.

It is important to recognise that the statistical relationship is applicable to groups of publications. The citation profiles of individual articles may be quite different. Nonetheless, it seems reasonable to conclude that leading indicators of research excellence could be developed. This initial study should now be extended across a wider range of fields to test the initial outcomes: earlier papers suggest the model holds in economics. Additional statistical tests should be applied to explore and model the relationship between initial, later and total citation counts and thus to create a general tool for policy application.

Introduction

How quickly can managers tell whether new funding, new structures and new initiatives have led to improvements in research? There are many ways of analysing research performance retrospectively but there are few early indicators that allow any prospective inferences to be drawn about emerging research quality.

This is a management challenge, because it is widely and often asserted that it is not possible until after, perhaps well after, investment has been completed to determine whether that investment has been of value in stimulating or supporting research excellence. By then any management reaction is pointless. This problem arises because most research indicators rely primarily on historical time series relating to inputs to, activity within and outputs from the research system. Information is drawn from national and international sources that produce collated data only after thorough checking and reconciliation. Such data are updated only on an annual cycle and derived, secondary indicators therefore lag actual activity.

Policy makers wish to be able to monitor changing research profiles in a more timely fashion. The only data that are updated sufficiently frequently for such a purpose are Thomson ISI data on publications and citations. Fortunately, such data are precisely those that are most likely to give information on research quality because they measure impact.

No conclusive study has yet been carried out by ISI or any other analysts to determine whether initial (e.g. 12 months from publication) citation counts are sufficiently useful as a forward indicator of the long-term (e.g. 10 years from publication) quality of research publications. This has, however, been an issue of interest over a long period (see e.g. DE SOLLA PRICE, 1965) and a number of studies now emerging demonstrate the significance of papers that have high initial citation rates (sometimes described as hot-topic or fast-breaking papers, see SMALL, 2004). There is also a significant body of relevant literature on citation-time distributions (OROMANER, 1983; GLÄNZEL & SCHOEPFLIN, 1995; EGGHE & RAO, 2001) and functions (GLÄNZEL, 1997; VLACHY, 1985 and references therein).

OROMANER (1983) suggested that economics papers with the ‘greatest repute’ received the most citations during early and late periods after publication. GLÄNZEL & SCHOEPFLIN (1995) noted some significant differences in aging and reception between natural and social science fields. They concluded that impact factors based on observations of the first two years after publication may be a dubious guide to performance.

Here, we describe the outcomes of a preliminary study to explore these citation relationships, drawing on the UK National Citation Report held by Evidence Ltd under licence from Thomson ISI for policy use by the UK Office of Science and Technology (OST, 2004; KING, 2004). Even partial indicators would be of value in gaining some information on whether a particular pulse of investment is producing a reaction. In fact, other work on programmes funded by UK public sector agencies has shown that management intervention – such as a targeted programme of hypothecated grant aid for a stated area of research – can produce a change. A positive reaction profile might follow these steps:

• New funding draws researchers into the area, leading to an increase in activity;
• This leads to an increase in training at postgraduate and postdoctoral level, leading to a legacy of increased capacity through more specialist staff;
• There is an increase in outputs because of the expanded research base;
• Citation averages tend to rise, possibly because objectives are better defined, there is an increase in awareness and interest, better quality research is selectively supported and better quality people are working on the topic;
• There is a longer term sustained level of activity after the funding pulse has finished.

These results and this ideal profile are all based on data gathered and analysed typically 3–5 years after programme completion and up to 10 years after programme initiation. They do suggest, however, that there should be some changes that could potentially be detectable at an early stage when the first steps are taken on the path to longer-term change in successful areas of intervention.

There is pressure to identify and evaluate rapid response metrics, because policy makers would like to be able to determine the trajectory of change in response to funding investment at an early stage. While it seems unlikely that such metrics could conceivably encompass the full value range of outcomes from any programme, they might indicate where there is particular promise that would merit additional support.

This preliminary study was performed to determine whether or not the citation count accumulated in the first period after publication (e.g. 12–24 months) is a good indicator of long-term citation impact compared to other papers in the same field published at the same time. The study has been carried out on a limited set of publications in selected fields and any outcomes will require more extensive evaluation – at the level of sub-fields, individual journals and across countries – to challenge their general validity.

Methods

For an initial evaluation, it would be counter-productive to monitor either the total global output in a discipline or a total country output. Reasons for avoiding such an analysis are not only the scale of the task but also the likely diversity in citation accumulation patterns between the different national and disciplinary cultures. These would obscure any underlying patterns.

It was therefore decided to tackle only select areas, defined by the standard ISI fields, and only to analyse data published with a UK correspondence address. This was most likely to produce homogeneous datasets which would allow citation profiles to be discernible. Testable hypotheses could then be developed from these initial, simple models, leading to the development of useful tools for generic policy application.

The most suitable data for developing leading or forward indicators are bibliometrics – journal articles and their citation information. These are collated rapidly and released frequently, thus providing the basis for timely analysis. The data source used for this exploration is the UK National Citation Report 2002 (the NCR). This was a dataset licensed from Thomson ISI by the UK Office of Science and Technology (OST) as part of the development of its Public Service Agreement (PSA) target indicators for the UK science and engineering base.

The NCR is a database of the UK’s journal articles in the sciences, social sciences, and humanities, drawn from data indexed by ISI. The UK dataset available under the OST contract covers 1993–2002 and contains all standard bibliographic information for each paper indexed plus year-by-year and total citation counts for each paper. There is a linked file of expected citation rates for each paper, based on the journal title, publication year and document type (article, review, editorial, etc.).

We initially sampled one life science area that we believed would be particularly appropriate for analysis, having high citation rates and a large UK community. These samples confirmed that our initial expectations were sensible. We then identified a set of additional research areas, defined by journal sets, and added two other life science subjects and three physical science subjects. The final set of test groups was therefore:

• Biochemistry and biophysics (BIL)
• Molecular biology and genetics (MBG)
• Pharmacology and toxicology (PHM)
• Optics and acoustics (O/A)
• Physical chemistry (PHC)
• Space science (SP)

We identified and extracted data records for all UK papers published within these subject categories during 1993. These are papers for which we have the longest run of annual citation counts.

It was not feasible to get an exactly identical citation time period for each paper, because citations were grouped in the NCR by year of citation. Within any one publication year there were therefore some papers published late in the year that only got a short exposure within the same year while others published early in the year would have a longer window of initial exposure.

There are several possible citation counts for each paper that are useful for the purposes of this analysis. First, there is a question of the best definition of Initial Citations. Because there are papers published early and late in a calendar year, we decided to explore two windows for “initial” citations: (i) the year of publication alone and (ii) the year of publication plus a following period. Second, there are Later Citations, outside the initial window, but not including initial citations. Obviously there will be various alternative counts of later citations for any paper, each depending on the count of initial citations that is used. Third, there are Total Citations including both initial and later citations.

For the six subject areas, a graphical check on the annual citation profile for papers published in 1993 revealed a typical peak at around the third year (1995). This is shown in detail in the Results section. This suggested that if ‘initial’ counts of 1993 papers covered 1993+1994 then this would dilute the problem of early/late 1993 without incorporating too great a part of a paper’s shelf-life.

We created citation profiles for each paper within the six subject categories. For each paper we

Calculated the citation count by year between 1993 and 2002
(1a) Identified the short initial citation count for 1993 (the first year, but this will only be the first few months for papers published late in the year)
(1b) Aggregated the long initial citation counts for 1993–1994 (to get a minimum of twelve months’ citations for every paper)
Aggregated the citation counts by year for subsequent periods
• 1994–2002 (later citations for 1993 initial counts) = 2a
• 1995–2002 (later citations for 1993–1994 initial citation counts) = 2b
• 1993–2002 (the full ten year run) = 3

For these papers published in the UK during 1993, we then created the following statistics:

Total count of papers within subject category
Correlation 1993 vs 1994–2002 (short initial vs later, 1a vs 2a)
Correlation 1993–1994 vs 1995–2002 (long initial vs later, 1b vs 2b)
Correlation 1993 vs 1993–2002 (short initial vs total, 1a vs 3)
Correlation 1993–1994 vs 1993–2002 (long initial vs total, 1b vs 3)

In this study, the (product-moment) correlation coefficient of initial vs. later and initial vs. total citations has been calculated purely to provide a preliminary indicator of the underlying patterns. The initial and total data are not independent and therefore it would strictly be necessary to apply non-parametric analyses, but the number of data points means that a non-parametric analysis (which would be onerous to apply) is unlikely to produce a significantly different result.
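As an illustration of the window definitions and statistics just listed, the following Python sketch (not part of the original study) shows how the aggregates and product-moment correlations could be computed. It assumes a hypothetical per-paper extract of the NCR with one row per 1993 UK paper and columns cites_1993 … cites_2002 holding annual citation counts; the file name and column names are assumptions for illustration only.

```python
# Minimal sketch: compute the initial/later/total windows and their correlations
# from an assumed per-paper table of annual citation counts (not the real NCR layout).
import pandas as pd

papers = pd.read_csv("ncr_1993_field.csv")  # hypothetical extract for one ISI field

year_cols = [f"cites_{y}" for y in range(1993, 2003)]

short_initial = papers["cites_1993"]                                        # (1a) 1993 only
long_initial = papers[["cites_1993", "cites_1994"]].sum(axis=1)             # (1b) 1993-1994
later_short = papers[[f"cites_{y}" for y in range(1994, 2003)]].sum(axis=1) # (2a) 1994-2002
later_long = papers[[f"cites_{y}" for y in range(1995, 2003)]].sum(axis=1)  # (2b) 1995-2002
total = papers[year_cols].sum(axis=1)                                       # (3) full ten-year run

stats = {
    "number of papers": len(papers),
    "r short initial vs later (1a vs 2a)": short_initial.corr(later_short),
    "r long initial vs later (1b vs 2b)": long_initial.corr(later_long),
    "r short initial vs total (1a vs 3)": short_initial.corr(total),
    "r long initial vs total (1b vs 3)": long_initial.corr(total),
}
for name, value in stats.items():
    print(name, value)
```

Series.corr defaults to the Pearson (product-moment) coefficient, matching the statistic used here; a rank-based alternative could be obtained with corr(method="spearman").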

Thus, for a preliminary study, the parametric correlation is an acceptable dipstick indicator although not sufficient as a basis for longer term policy decisions. For future developments, it would be desirable to carry out some kind of autoregression on the year by year profile of citation accumulation. There are also other approaches to analysis where we are using the “baseline” (i.e. 0–24 months cites) to predict “outcome” (120 months cites). This is a non-simple problem – it is known as “mathematical coupling” – and examples in the biological literature show that it can lead to spurious results if incorrectly applied. In simple terms, one wishes to define a relationship: outcome = function (outcome – baseline). The key question is whether it is possible to quantify the ‘function’ and whether that function can be used to predict ‘outcome’ with minimal residual variance. Citation data exhibit classic mathematical coupling. There are solutions, however, and advice from professional sources suggests that they are not too technical to be incorporated into later developments.
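The short simulation below (my addition, not from the study) illustrates why the coupling matters: even when the initial and later windows are statistically independent, the initial count reappears inside the total, so the initial-versus-total correlation is inflated relative to the clean initial-versus-later comparison. All distributions and parameters are arbitrary illustrative choices.

```python
# Numerical illustration of mathematical coupling with simulated, independent counts.
import numpy as np

rng = np.random.default_rng(0)
n = 2000
initial = rng.poisson(3.0, n)    # hypothetical initial-window counts (0-24 months)
later = rng.poisson(20.0, n)     # hypothetical later-window counts, independent of initial
total = initial + later          # the "outcome" contains the "baseline" as a component

print("corr(initial, later) =", np.corrcoef(initial, later)[0, 1])  # ~0 by construction
print("corr(initial, total) =", np.corrcoef(initial, total)[0, 1])  # clearly positive
```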

Finally, in this study, analyses of the relationship between early and late citation counts were made by ranking papers on either initial or on later citations and then determining what their ranked position would be for the contrasting period. For ease of understanding, these data are also shown graphically.
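One possible form of that rank comparison is sketched below (again an illustrative addition, reusing the hypothetical per-paper table and column names assumed in the earlier snippet): rank every paper by long initial and by later citations and count how many of the top 1% by later citations were already in the initial top 1%.

```python
# Sketch of a rank-position comparison between initial and later citation windows.
import pandas as pd

papers = pd.read_csv("ncr_1993_field.csv")  # hypothetical extract, as before
papers["initial"] = papers[["cites_1993", "cites_1994"]].sum(axis=1)
papers["later"] = papers[[f"cites_{y}" for y in range(1995, 2003)]].sum(axis=1)

papers["rank_initial"] = papers["initial"].rank(ascending=False, method="min")
papers["rank_later"] = papers["later"].rank(ascending=False, method="min")

top_n = max(1, round(0.01 * len(papers)))            # top 1% of the field-year set
top_by_later = papers.nsmallest(top_n, "rank_later")
overlap = int((top_by_later["rank_initial"] <= top_n).sum())
print(f"{overlap} of the top {top_n} papers by later citations were also in the initial top 1%")
```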

Results

Citation profiles

Before developing detailed statistical analyses it is valuable to consider the actual scatter of data and see whether this suggests any other considerations or interpretation. It is also necessary to see whether there is any indication that outcomes would be susceptible to misinterpretation using early citation counts as an indicator. In addition, it is desirable to determine whether the data profile is sufficiently convincing for this approach to be acceptable for a wider and probably sceptical audience.

Figures 1 and 2 show the citation profiles (i.e. the annual citation count) of the 1993 papers published in the UK within the six subject categories. Within all subject categories there is a large initial growth period of citations during the first 2 years after publication (1993–1995). The annual citation count tends to peak in 1995. After this year, there is a steady decline in annual citations to 2002. Similar patterns are reported for other fields by GLÄNZEL et al. (2004).

The life science categories (BIL, MBG and PHM: Figure 1) all share a similar yearly citation profile. They initially have a 65–75% increase between 1993 and their peak in 1995, followed by a progressive decrease in annual citations between 1995 and 2002. The 2002 citation figures for these subject categories are between 50% and 60% of the peak 1995 citation figure.

Figure 1. Annual citation counts 1993–2002 for UK papers published in 1993 in three life science fields.

The rate of decline year on year for the physical science categories (O/A, PHC and SP: Figure 2) is significantly less after the initial peak in citations in 1995 than for the life sciences. The decrease in citations from 1995 to 2002 is between 15% and 25%, leaving the actual citation figure in 2002 at around 80% of the 1995 peak figure.

Figure 2. Annual citation counts 1993–2002 for UK papers published in 1993 in three physical science fields.

There are other quirks in the data, none of which affect the overall outcome. Space science sees a late resurgence, from 4401 citations in 1998 to 5057 citations in 1999. Optics and acoustics only has an initial increase in citations of 26% between 1993 and 1995 but the citation profile is then sustained through to 2002 (85% of the 1995 level of citations achieved in 2002).

Correlation

Table 1 and Figure 3 show the product moment correlation coefficients between early period citation rates (long initial, 1993–1994) and the more long term period (199x–2002) citation counts, thereby providing an indication of the relationship between initial citations and citations achieved during later years after publication.

All these coefficients are very highly significant (P << 0.001). Figure 3 shows how the coefficients change according to the field and to the time periods compared.

Table 1. Product moment correlation coefficients in correlation analyses for initial and later citation counts of UK journal publications in select science fields

Field                             Number of    1993–1994 &    1993–1994 &    1993–1994 &
                                  articles     1993–2002      1994–2002      1995–2002
Biochemistry and biophysics       2077         0.957          0.953          0.940
Molecular biology and genetics    1348         0.839          0.828          0.780
Pharmacology and toxicology       1700         0.705          0.698          0.632
Optics and acoustics               443         0.681          0.673          0.617
Physical chemistry                1767         0.653          0.650          0.634
Space science                      923         0.708          0.693          0.650

Figure 3. Product moment correlation coefficients in correlation analyses for initial and later citation counts of UK journal publications in select science fields

There is relatively little difference between the comparisons. The coefficient is least for the comparison of initial vs. later where later does not include any part of or overlap with initial. This is to be expected. Even so, the relative change is marginal.

Distributions of citations

The strong correlations suggest that papers with many initial citations are those with high total citation counts. This general relationship provides a good basis for using the initial citation count as an indicator, but it is desirable also to understand how much variation there is between early and late performance.

Table 2 summarises the initial and total citation counts for the different groups of papers. A citation growth function can be calculated from this, expressing the total citations (1993–2002) relative to the initial citation count (1993–1994). The citation growth function in the life sciences was typically around 6–8, but was rather greater in the physical sciences as a consequence of the sustained citation accumulation beyond the peak in the third year. The citation growth function was just over 8 on average for the physical sciences, but was much greater (14.3 to 21.7) for a group of highly ranked papers.

Citation counts were created for three groups: the total papers in each category; the top 1% of papers ranked by initial citation count; and the top 1% of papers ranked by total citation count. The average number of citations in the initial period and in the total period was calculated for all papers and for the two ranked groups (e.g. for the highest ranked 1% for 1993–1994, the citation count for both 1993–1994 and 1993–2002 was calculated).
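A compact way to reproduce this kind of summary is sketched below (an illustration under the same assumed data layout as earlier, not the study's actual code; the exact averaging used for Table 2 may differ in detail). It reports, for each group, the mean initial and total counts and the citation growth function taken as the ratio of the group's mean total to its mean initial count.

```python
# Sketch of the Table 2 quantities: group averages and citation growth functions.
import pandas as pd

papers = pd.read_csv("ncr_1993_field.csv")  # hypothetical extract, as before
papers["initial"] = papers[["cites_1993", "cites_1994"]].sum(axis=1)
papers["total"] = papers[[f"cites_{y}" for y in range(1993, 2003)]].sum(axis=1)

def summarise(group, label):
    mean_initial = group["initial"].mean()
    mean_total = group["total"].mean()
    growth = mean_total / mean_initial   # citation growth function for the group
    print(f"{label}: initial {mean_initial:.1f}, total {mean_total:.1f}, growth {growth:.1f}")

top_n = max(1, round(0.01 * len(papers)))
summarise(papers, "all papers")
summarise(papers.nlargest(top_n, "initial"), "top 1% by initial cites 1993-1994")
summarise(papers.nlargest(top_n, "total"), "top 1% by total cites 1993-2002")
```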

For the life sciences, the highest ranked papers typically acquired 12.5 to 17.3 times as many citations as the average for the group as a whole. The initial top 1% were not identical to the overall top 1% but their citation count relative to the rest was similar. For the physical sciences, there was greater variation in the relative position of the leading papers. The range of ratios between the top 1% and the group as a whole was quite wide (4.6 to 20.5). Furthermore, while the top 1% among initial citation counts broadly maintained their position, the overall top 1% made much greater gains, suggesting that some took a significant period to gain recognition. This fits with the higher citation growth function of these papers.

In summary, the analyses in Table 2 confirm that, for both life and physical sciences, the most highly ranked papers initially will remain amongst the higher ranked papers on average, but that in the physical sciences there is a ‘slow burn’ factor that means that some papers with high total impact may initially gain less recognition. These are important cultural differences between these communities and would affect policy use of these indicators.

Table 2. Average initial and total citation counts for UK papers published in 1993 in 6 ISI categories (in table: top 1% = most highly cited 1% of papers ranked by citation count in period; citation growth function = ratio between initial and total citations)

ISI category code                         BIL                       MBG                       PHM
Number UK articles 1993–2002              2077                      1348                      1700
Period                        1993–94  1993–2002  Growth  1993–94  1993–2002  Growth  1993–94  1993–2002  Growth
Average cites
- for all papers                 4.8      37.3      7.8      6.0      41.8      6.9      2.3      19.8      8.6
- for top 1% 1993–1994          82.6     544.2      6.6    100.3     603.0      6.0     37.8     251.6      6.6
- for top 1% 1993–2002          75.5     610.7      8.1     94.4     683.4      7.2     28.7     343.6     12.0
Ratio of citations
- top at 2002 vs all papers     15.8      16.4             15.7      16.3             12.5      17.3
- top at 1993–1994 vs all       17.2      14.6             16.6      14.4             16.4      12.7

ISI category code                         O/A                       PHC                       SP
Number UK articles 1993–2002               443                      1767                       923
Period                        1993–94  1993–2002  Growth  1993–94  1993–2002  Growth  1993–94  1993–2002  Growth
Average cites
- for all papers                 1.4      12.3      8.5      2.1      18.3      8.7      3.4      27.9      8.2
- for top 1% 1993–1994          11.0      87.4      7.9     21.2     329.6     15.5     29.7     222.9      7.5
- for top 1% 1993–2002           6.6     123.0     18.6     17.3     374.2     21.7     25.2     360.3     14.3
Ratio of citations
- top at 2002 vs all papers      4.6      10.0              8.3      20.5              7.5      12.9
- top at 1993–1994 vs all        7.6       7.1             10.2      18.1              8.8       8.0

Figures 4 and 5 are scatter-plots where each point is the data for an individual paper, with the number of initial citations for the period 1993–1994 on the horizontal axis and the number of later citations 1995–2002 on the vertical axis. This is important: the data on the two axes are independent.

These scatters show that some papers have a slow start but later do well, as noted for the physical sciences in Table 2. However, the general trend (reflected in the highly significant correlation coefficients in Table 1) is that papers that do relatively well in attracting citations initially continue to do very well later. It is possible to start to put some figures on the category-specific function that predicts total citation count from initial citation count.

Figure 4a-c. Comparisons between initial (1993–1994) and later (1995–2002) citation counts for UK papers published in 1993 in three life science fields

Figure 5a-c. Comparisons between initial (1993–1994) and later (1995–2002) citation counts for UK papers published in 1993 in three physical science fields

The scatter for the life science categories shows the following (a sketch of how such thresholds might be estimated from the data follows these lists):
• Biochemistry – papers that got more than 25 citations initially got more than 100 citations later and there is a general trend that covers a high outlier; 17 of the top 1% of papers (21 papers) ranked by total cites in 2002 received more than 30 citations by end 1994
• Molecular biology – papers that got more than 50 citations initially got more than 170 citations later and 12 of the top 14 papers (1%) are in this group
• Pharmacology – most papers that received more than 20 citations initially received more than 100 citations later

The scatter for the physical science categories shows the following:
• Optics – papers that received more than 8 citations initially received more than 25 citations later
• Physical chemistry – papers that received more than 20 citations initially received more than 50 citations later
• Space science – most papers that received more than 20 citations initially received more than 80 citations later
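A rough recipe for deriving field-specific thresholds of this kind is sketched below; it is my illustrative addition rather than the procedure used above, and it again assumes the hypothetical per-paper table and column names from the earlier snippets. It tests candidate initial-citation cut-offs and reports what share of the papers above each cut-off end up in the field's top 1% by total citations.

```python
# Sketch: screen candidate initial-citation thresholds against top-1% membership by total cites.
import pandas as pd

papers = pd.read_csv("ncr_1993_field.csv")  # hypothetical extract, as before
papers["initial"] = papers[["cites_1993", "cites_1994"]].sum(axis=1)
papers["total"] = papers[[f"cites_{y}" for y in range(1993, 2003)]].sum(axis=1)

top_n = max(1, round(0.01 * len(papers)))
top_total_cutoff = papers["total"].nlargest(top_n).min()   # lowest total count inside the top 1%

for threshold in (10, 20, 25, 50):                         # arbitrary candidate thresholds
    above = papers[papers["initial"] > threshold]
    if len(above):
        share = (above["total"] >= top_total_cutoff).mean()
        print(f"initial > {threshold}: {len(above)} papers, {share:.0%} reach the top 1% by total cites")
```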

Discussion

The relationships between initial, later and total citation counts analysed in this paper indicate that across reasonably large samples of research publications (not individual papers) it is possible to use initial citation counts predictively to index emerging quality relative to the field. Furthermore, for the six different life and physical science fields analysed here, those papers that scored above a definable threshold in the first period will subsequently be among the most highly cited (top 1%) papers for their field and year.

The results in Table 1 confirm the hypothesis that early citations are statistically a good guide to later and thus to total citations. That is to say, a high initial citation count for a paper relative to the average for other papers in the same field is a reasonable guide to the long term significance (impact) of that paper. The analysis is sensitive to the comparison made, however, and is less highly significant for initial vs later citations than for initial vs total citations. This should, incidentally, throw some further light on the issue of ageing of information. Because these analyses suggest that initially more highly cited papers are on average more likely to be cited later, they also suggest that these papers will continue to attract citations at a significant rate over a long period, so the ageing rate of any group of papers may actually be modal according to initial citation performance. High initial ‘take-off’ produces a longer potential trajectory. For all these relationships, it must be strongly emphasised that the described relationships are statistical, not definitive; individual papers will have a diversity of performance.

This relationship holds for six fields in the life and physical sciences, although it is evidently very strong in specific life science fields and the correlations are generally better in life sciences than physical sciences. In biochemistry, the correlation is exceptional given the volume of data. Physical chemistry is also a large grouping (Table 1) and although the correlation is not as outstanding as in biochemistry it is still very high. The general pattern is still maintained in the smaller optics/acoustics category. In due course it would be desirable, however, to evaluate the scale relationship and to determine whether it continues to hold for sub-fields and for individual journals.

Biochemistry is a fast moving field in which work is read and used very rapidly and citation rates are high. It is therefore particularly susceptible and sensitive to this type of analysis. Optics is rather different. As a source of research innovation, it is important to a number of areas of application and rapid technology development, but it is a smaller field with characteristically lower citation growth rates.

The pattern of citation accumulation varies between the life and physical sciences. Although in both areas there is a rapid growth in the first two to three years, counts then drop off rapidly in the life sciences (Figure 1) whereas they are more stable in the physical sciences (Figure 2). This affects the comparison between early and late citation tallies (Table 2, Figures 4 and 5), because the bulk of the citation life of many life science papers will be in that initial period, so there will be less data for the later period. Initial and later counts for the physical sciences may represent different phases of a paper’s life. The fact that the correlations are valid across all fields despite this cultural difference adds further support to the overall model.

The applicability of initial citation counts as a useful tool for policy purposes is likely to be field dependent. It seems to work very well for the life sciences, but in the physical sciences there is evidence that while high initial citation counts are maintained, some high total citation counts only ‘take off’ slowly. The narrower the field of study, the more risk there would be that an early analysis might fail to pick up emerging excellence. However, OROMANER (1983) also suggests that the pattern is observable in economics, which must add support to the hypothesis that there is a general model at the coarse level.

There would rightly be some disagreement about the application of this approach to single papers, and it should not be recommended for that purpose. The later citation history of papers that initially have high citation scores is that they continue to be high impact papers in their category, even if their initial tally is set aside. The later citation history of papers that initially have low citation scores is more variable, with some papers only showing their promise after some delay. In other words, evidence of high impact based on initial citation counts (‘hot topics’ in Thomson ISI parlance) is likely to be soundly based and subsequently validated, whereas lack of that evidence should not condemn a field, a programme or a researcher to obscurity.

In summary: this study shows that for UK papers at a field level, evidence of increased impact relative to the rest of the field soon after publication is likely to be amenable for use as an indicator of improved performance. Since the metrics have proved to be useful, the longer-term objectives that might be tackled would include broadening the scope of the work to test:

• Whether the pattern is common to most research fields (as OROMANER (1983) suggests) or only holds in core science areas, where citation indices are generally agreed to be a particularly good measure of quality.

• What the detailed statistical characteristics are of citation accumulation patterns in different fields and how these might influence interpretation.

• What specific thresholds might be set for exceptional early citation counts in different fields.

• Whether the UK pattern is replicated in comparator countries.

*

This paper has been much improved by the suggestions of two anonymous referees, who highlighted important cross-references and caveats. I am also grateful to Stuart Marshall and Tom Letcher (Evidence Ltd) for data development, Henry Small, Nancy Bayers and David Pendlebury (Thomson ISI) for discussions about the relatedness of early and late citations, and David Humphry and Rob Stone (OST) for asking difficult questions about the functionality of bibliometric indicators. The work reported here was carried out as part of the development of annual performance indicators for the UK science and engineering base, funded by the UK Office of Science and Technology.

References

GLÄNZEL, W. (1997), On the possibility and reliability of predictions based on stochastic citation processes. Scientometrics, 40 : 481–492.
GLÄNZEL, W., SCHOEPFLIN, U. (1995), A bibliometric study on aging and reception processes of scientific literature. Journal of Information Science, 21 : 37–53.
GLÄNZEL, W., THIJS, B., SCHLEMMER, B. (2004), A bibliometric approach to the role of author self-citations in scientific communication. Scientometrics, 59 : 63–77.
KING, D. A. (2004), The scientific impact of nations. Nature, 430 : 311–316.
OROMANER, M. (1983), Professional standing and the reception of contributions to economics. Research in Higher Education, 19 : 351–362.
OST (2004), PSA Target Metrics for the UK Research Base, a report commissioned from Evidence Ltd., http://www.ost.gov.uk/research/psa_target_metrics.htm
PRICE, D. J. DE SOLLA (1965), Networks of scientific papers. Science, 149 : 510–515.
SMALL, H. (2004), Why authors think their papers are highly cited. Scientometrics, 60 : 305–316.
VLACHY, J. (1985), Citation histories of scientific publications. The data sources. Scientometrics, 7 : 505–528.