
Scientometrics, Vol. 75, No. 1 (2008) 81–95
Jointly published by Akadémiai Kiadó, Budapest and Springer, Dordrecht
DOI: 10.1007/s11192-007-1832-7

Received April 11, 2007

Address for correspondence: JONATHAN ADAMS, Evidence Ltd., 103 Clarendon Road, Leeds LS2 9DF, UK. E-mail: [email protected]

0138–9130/US $ 20.00 Copyright © 2008 Akadémiai Kiadó, Budapest. All rights reserved.

Calibrating the zoom – a test of Zitt’s hypothesis

JONATHAN ADAMS, KAREN GURNEY, LOUISE JACKSON

Evidence Ltd., Leeds (UK)

Bibliometric indicators are widely used to compare performance between units operating in different fields of science.

For cross-field comparisons, article citation rates have to be normalised to baseline values because citation practices vary between fields, in respect of timing and volume. Baseline citation values vary according to the level at which articles are aggregated (journal, sub-field, field). Consequently, the normalised citation performance of each research unit will depend on the level of aggregation, or ‘zoom’, that was used when the baselines were calculated.

Here, we calculate the citation performance of UK research units for each of three levels of article-aggregation. We then compare this with the grade awarded to that unit by external peer review. We find that the correlation between average normalised citation impact and peer-reviewed grade does indeed vary according to the selected level of zoom.

The possibility that the level of ‘zoom’ will affect our assessment of relative impact is an important insight. The fact that more than one view and hence more than one interpretation of performance might exist would need to be taken into account in any evaluation methodology. This is likely to be a serious challenge unless a reference indicator is available and will generally require any evaluation to be carried out at multiple levels for a reflective review.

Introduction

The relative value of any item depends on how the comparison is made, but is one valuation better or more ‘true’ than another? In this paper, we compare the bibliometric impact of articles submitted by UK academics to the 2001 Research Assessment Exercise (RAE2001) with the Grade (of relative national or international quality) their units were awarded by peer review panels. We compare the different associations created by normalising the impact data relative to journal, relative to journal category and relative to research field.

Michel Zitt [ZITT & AL., 2005] has drawn attention to the possibility that the relative impact of a journal article will change according to the level at which any normalisation occurs. The possible effects of changing the reference point at which normalisation is made have also been noted by HIRST [1978] in relation to Discipline Impact Factors, and methods for comparing bibliometric indicators across fields were reviewed by SCHUBERT & BRAUN [1993, 1996]. GLÄNZEL & MOED [2002] also commented on the effect of different levels of aggregation. KOSTOFF [2002] recognised the importance of selecting the appropriate group of papers against which to normalise citation rates, and suggested a (labour-intensive) method in which multiple technical experts select the set of baseline papers.

If citation rates vary when normalised against different, pre-defined groupings of papers (e.g. journal, sub-field, field), then this calls into question the validity of normalisation methods that rely on data aggregated at an arbitrarily defined level. For example, PODLUBNY [2005] normalises paper citation counts relative to the total broad-field citation count rather than to the journal or narrow field. ZITT [OP. CIT.] invokes the image of a bibliometric landscape across which the observer’s view may zoom in and out. At a high level of definition, an article may be significant for the journal in which it is published. Zooming out, the same article is of only modest relative impact within its specialism (where the specialism includes a number of journals with diverse citation rates). Such differences are evident in the analyses presented by VAN RAAN [2006], where citation rates for Chemistry research groups are normalised by two different methods: once at the journal level and once at the field level. He goes on to calculate correlations between field-level impact and the H-index, and between these two and peer review, but does not compare field level with journal level.

Seen from an even more general viewpoint, the value of a published article may change again when placed in a broader field covering several other disciplines. ZITT [OP. CIT.] analysed the correlation between the normalised impact at different foci for the same sets of papers and found that the relationship was indeed very weak.

Following Zitt’s conclusions, we predict that at some levels of aggregation there will be a match – by which we mean a significant correlation – between average relative impact and another reference measure of research quality that does not depend solely on citation counts. Ideally this reference measure would be one that is also of broad acceptability to researchers, such as peer review. By varying the zoom we would find the level, or levels, where normalised citation indices matched peer review judgments and other levels where the correlation was absent, perhaps even negative. The levels of best match would provide an optimal scale of normalisation for research evaluation in that field, subject to any inherent biases in the reference indicator.

The UK’s RAE system provides us with a powerful reference indicator that can be used to explore and test Zitt’s insight. The RAE is based on a peer review process, structured by subject-based Units of Assessment (UoA). For each UoA, an expert panel meets several times to analyse a diverse mass of data on research funding, research student numbers and research outputs. They also review strategy statements. The panel eventually determines a grade for each university department that has made a submission in its stated UoA (subject). Because they rely on both a rich body of information and the informed judgment of an expert group, these grades provide a set of soundly based benchmarks to compare with any other metric. We can thus see how the relative impact for the publications of a university department at any grade changes according to the ‘zoom’ setting we use for normalising the citation data.

If we accept that the RAE grading is the most informative benchmark then the journal aggregation level at which the correlation between bibliometrics and peer review grade is greatest would be the ideal ‘zoom setting’ for capturing an informative picture. If such a relationship can be found – and work by MOED & AL. [1985] would suggest that it is far from certain – then this would suggest that research evaluation must take appropriate levels of data aggregation into account to be valid and effective. If we can establish such a relationship for one sample, we must also assess whether it is consistent between disciplines, and the extent to which it is equally applicable to sub-fields within a discipline.

Methodology

For the RAE in 2001, there were 68 Units of Assessment (UoAs, i.e. subject categories). We chose to analyse the data for three of these: UoA13 Psychology, UoA14 Biological Sciences and UoA19 Physics, selected so that subject differences could be evaluated. These units have the advantage of being data-rich: there are many institutional submissions, and most research outputs submitted for assessment were articles published in journals covered by Thomson Scientific databases. The three subject areas also differ in research culture, providing a challenge mechanism for evaluating our results.

The RAE protocol requires each academic to submit up to four research outputs for assessment. It is assumed that they will normally submit examples of their highest quality work. The items may be journal articles, books and chapters, conference proceedings or other forms of original output. Since we are analysing citation impact, however, we are interested only in journal articles (rather than other forms of output) and we are interested only in those journals that are captured on Thomson Scientific databases. This reduces the volume of data available for analysis in some subject areas.


We focussed our analysis on the material submitted by units subsequently awarded grade 4, 5 or 5* (on a scale from 1 to 5*). The formal description of each grade in the RAE2001 documentation is available at https://admin.hero.ac.uk/sites/hero/rae/Pubs/4_01/section3.htm and can be summarised thus:

5* = International excellence in more than half of research activity submitted; national excellence in remainder.

5 = International excellence in up to half (11–50%) of research activity submitted; national excellence in virtually all (>90%) of remainder.

4 = National excellence in virtually all (>90%) of research activity submitted; some evidence of international excellence (c. 10%).

The reason for focussing on these top grades was that these were the ones most readily interpretable in terms of an external reference benchmark for citation impact (e.g. world average in Thomson data). Below grade 4, the more ‘national’ quality of the research activity was likely to influence the proportion of submitted material that was actually measurable in terms of citation impact, and outliers might have had a greater influence.

Source data on RAE research outputs – which must all have been published within the specified census period (1996–2000) for RAE2001 – are available online at http://www.hero.ac.uk/rae/. The summary statistics for the material that could be covered are shown in Table 1.

Table 1. Summary statistics describing the output data available for units graded 4, 5 and 5* at RAE2001

                                                          UoA13        UoA14                 UoA19
                                                          Psychology   Biological Sciences   Physics
Number of units awarded grade 4, 5 or 5*                  46           58                    43
  (as a percentage of units submitting)                   (63%)        (76%)                 (86%)
Total full-time equivalent 4, 5, 5* academic staff        991          2,159                 1,602
Total outputs submitted for assessment                    3,982        8,878                 6,663
Total journal articles submitted for assessment           3,759        8,723                 6,409
Count of submitted articles indexed by Thomson            3,532        7,905                 5,492
  (as a percentage of outputs submitted)                  (89%)        (89%)                 (82%)


For each UoA we followed the same procedures:

1. The outputs from those institutions awarded RAE grade 4, 5 or 5* were collated and sorted by output type. The records for journal articles (which are categorised by the RAE Manager as Output Type D) were then abstracted.

2. We compared the RAE article records to article records in the UK National Citation Report (UK-NCR), 2005 edition, from Thomson Scientific. We sought to match each item in the RAE database (some of which were duplicates) to a unique article record in the Thomson database. Matches were obtained using a combination of year, journal title, volume number and pagination.

3. The probable match was then checked via the article title. Some RAE items could not be matched, either because the journal was not in the UK-NCR or because the RAE record was invalid or had been incorrectly entered.

4. For those articles that were matched, we extracted the actual citation count and the expected citation rate (journal-normalisation factor) for the same journal and year.

5. From the Thomson Scientific National Science Indicators (NSI) database for 2005, we extracted the expected citation rate (Category-normalisation factor) for the subject Category and year to which the item was assigned in the UK-NCR database.

6. For each UoA, for each year, we calculated the expected citation rate (UoA-normalisation factor) by combining NSI data for the appropriate Thomson Current Contents journal categories. Elsewhere, we mapped Thomson data to UoA in order to create subject-level benchmarks [ADAMS, 1998; ADAMS & AL., 1998]. Our UoA-to-Thomson Category mappings are based on the frequency of journal submissions to the 1996 UK RAE, and the mappings for UoA13, UoA14 and UoA19 are reproduced in Table 2. Lists of the journals assigned to each Thomson category are available at http://scientific.thomson.com/mjl/. (A code sketch of the matching and extraction steps follows this list.)
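A minimal sketch of steps 2–4 (matching and extraction) is given below, assuming the two databases have been exported to flat files. The file names, column names (year, journal, volume, first_page, title, cites, journal_expected) and the crude title check are all hypothetical; they are not the actual field names of the RAE or Thomson databases.

```python
# Sketch of steps 2-4: matching RAE output records (Type D) to Thomson
# UK-NCR article records. All file and column names are hypothetical.
import pandas as pd

rae = pd.read_csv("rae2001_type_d_outputs.csv")   # assumed extract of RAE records
ncr = pd.read_csv("uk_ncr_2005_articles.csv")     # assumed extract of UK-NCR records

# Step 2: candidate matches on year, journal title, volume and pagination.
matched = rae.merge(
    ncr,
    on=["year", "journal", "volume", "first_page"],
    how="inner",
    suffixes=("_rae", "_ncr"),
)

# Step 3: confirm each probable match via the article title. The study
# checked titles; this string comparison is only a stand-in for that.
def titles_agree(a: str, b: str) -> bool:
    return a.strip().lower()[:40] == b.strip().lower()[:40]

matched = matched[
    matched.apply(lambda r: titles_agree(r["title_rae"], r["title_ncr"]), axis=1)
]

# Step 4: matched records carry the actual citation count and the
# journal-normalisation factor (expected cites for that journal and year).
print(matched[["year", "journal", "cites", "journal_expected"]].head())
```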

Note that some RAE records are duplicates, arising where an article with several authors was submitted more than once. These duplicate records were not removed, since they represent evidence of recorded impact for each submitted researcher.

The citation count for each RAE article matched to Thomson Scientific UK-NCR data can be normalised by year at (a) journal level, (b) Category level and (c) UoA level. These normalised impact factors can then be aggregated and averaged within each UoA by HE institution (college or university). The aggregated values enable impact (the distribution of normalised impact at journal level, Category level and UoA level) to be directly compared to peer evaluation (the distribution of RAE Grades awarded after peer review by the expert panel).
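The following sketch illustrates this three-level normalisation and institutional aggregation. All values and column names are invented for illustration; the Category and UoA baselines shown happen to be 1999 UoA13 values from Table 2.

```python
# Sketch: normalise each article's citation count at three levels, then
# average the normalised values within each institution (unit).
import pandas as pd

articles = pd.DataFrame({
    "institution":       ["A", "A", "B", "B"],
    "cites":             [12, 30, 5, 22],
    "journal_expected":  [8.0, 25.0, 6.0, 15.0],    # hypothetical journal rates
    "category_expected": [9.75, 23.3, 9.75, 17.53], # 1999 Category values (Table 2)
    "uoa_expected":      [18.19, 18.19, 18.19, 18.19],  # 1999 UoA13 baseline
})

# Normalised citation impact (nci) at each 'zoom' level.
for level in ("journal", "category", "uoa"):
    articles[f"nci_{level}"] = articles["cites"] / articles[f"{level}_expected"]

# Average normalised impact per institution, one value per level.
unit_impact = articles.groupby("institution")[
    ["nci_journal", "nci_category", "nci_uoa"]
].mean()
print(unit_impact)
```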

Table 2. Mapping between Thomson Scientific categories and RAE Units of Assessment. Citation baselines are sample values for 1999, calculated as world average citations per paper taken from National Science Indicators 2005

Unit of Assessment (UoA) / Mapped categories (Thomson Scientific® Current Contents)     Citation baseline

UoA13 Psychology                                                                        18.19
    REHABILITATION                                                                       6.01
    PSYCHOLOGY                                                                           9.75
    PSYCHIATRY                                                                          17.53
    NEUROSCIENCES & BEHAVIOR                                                            23.3

UoA14 Biological Sciences                                                               18.60
    ENTOMOLOGY/PEST CONTROL                                                              6.12
    ANIMAL SCIENCES                                                                      6.76
    AQUATIC SCIENCES                                                                     9.56
    ENVIRONMENT/ECOLOGY                                                                 11.23
    BIOTECHNOLOGY & APPLIED MICROBIOLOGY                                                11.85
    PLANT SCIENCES                                                                      12.1
    BIOLOGY                                                                             15.11
    ANIMAL & PLANT SCIENCES                                                             17.03
    EXPERIMENTAL BIOLOGY                                                                18.14
    MICROBIOLOGY                                                                        19.13
    BIOCHEMISTRY & BIOPHYSICS                                                           24.74
    MOLECULAR BIOLOGY & GENETICS                                                        29.89
    CELL & DEVELOPMENTAL BIOLOGY                                                        40.64

UoA19 Physics                                                                           10.12
    NUCLEAR ENGINEERING                                                                  3.59
    INSTRUMENTATION & MEASUREMENT                                                        6.09
    OPTICS & ACOUSTICS                                                                   6.33
    APPLIED PHYSICS/CONDENSED MATTER/MATERIALS SCIENCE                                   8.76
    SPECTROSCOPY/INSTRUMENTATION/ANALYTICAL SCIENCES                                     9.02
    PHYSICAL CHEMISTRY/CHEMICAL PHYSICS                                                 11.27
    PHYSICS                                                                             11.78
    SPACE SCIENCE                                                                       18.67

Within a UoA, each institution will have an RAE Grade and three impact factors normalised at different levels. The Grades are determined in relation to peer perceptions of international quality. The impact factors are normalised in relation to world average for the category, but the averages differ according to the level of categorisation. As hypothesised in the Introduction, Grade and impact should be related.
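A hypothetical worked example shows why the three values can diverge. Consider a 1999 article in a journal assigned to the PSYCHOLOGY Category, cited 12 times, and suppose the expected citation rate for its journal is 8.0 (an assumed value). Its journal-normalised impact is 12/8.0 = 1.50; its Category-normalised impact is 12/9.75 ≈ 1.23 (using the Table 2 baseline); and its UoA-normalised impact is 12/18.19 ≈ 0.66. The same article thus sits well above, modestly above, or below ‘world average’ depending on the zoom setting.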


The units were ranked on Grade and on impact (both for impact normalised to Category and for impact normalised to UoA average). Spearman rank correlation was used to determine the significance level for association between normalised impact and RAE grade.
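A minimal sketch of this test is given below, using SciPy and invented unit-level values; coding grades 4, 5 and 5* as the ordinals 4, 5 and 6 is our assumption for illustration, not part of the study's method.

```python
# Sketch: Spearman rank correlation between RAE grade and a unit's
# average normalised citation impact. All values are invented.
from scipy.stats import spearmanr

grades  = [4, 4, 4, 5, 5, 5, 5, 6, 6]            # 5* coded as 6 (assumption)
nci_uoa = [0.8, 1.1, 0.9, 1.6, 1.4, 2.0, 1.8, 2.9, 3.1]

# spearmanr assigns mid-ranks to ties, so repeated grades are handled.
rho, p_value = spearmanr(grades, nci_uoa)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.4f}")
```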

Results

The results of the analysis are clear. The level at which normalisation occurs profoundly affects our interpretation of relative performance (Table 3).

Table 3. Grade-related bibliometric performance of material submitted to RAE2001

                      UoA13 Psychology                 UoA14 Biological Sciences       UoA19 Physics
Grade at RAE2001      Units  Journal  Category  UoA    Units  Journal  Category  UoA   Units  Journal  Category  UoA
Grade 4                17    1.22     1.40      0.80    17    1.29     2.35      1.89   15    1.28     1.84      1.98
Grade 5                17    1.18     1.80      1.05    30    1.11     2.33      2.33   23    1.47     2.51      2.96
Grade 5*               12    1.32     2.38      1.63    11    1.18     2.53      2.93    5    1.82     3.32      3.75

‘Units’ is the number of units awarded each grade; ‘Journal’, ‘Category’ and ‘UoA’ are the bibliometric averages when citation counts are normalised at that level. The values given in the table are the institutional averages for the citation impact of journal articles submitted to the RAE that were also found in the UK National Citation Report. Data are normalised by article at the level of the journals in which the articles were published, at the level of the Thomson categories to which the articles were assigned, and at the level of the aggregate of journals in the broader research field (the UoA).

Psychology

For UoA13 Psychology we were able to analyse 3,982 items across 46 units. The average impact of articles is similar across grades (and slightly better than world average) when it is compared to the expected citation rate of the journals in which they are published. In other words, the relative impact of articles from grade 4 units appears to be similar to, or covers a similar range to, that of articles from the more highly graded 5 and 5* units. The reason why these articles may be consistently above average for their journals is that each article is selected by the author for submission for assessment because it is representative of their best work: it is not a random sample of their typical output. On this basis, their selection appears to be logical and well founded.

The picture changes when the impact of the same set of articles is normalised relative to the Thomson Category to which the article is assigned. Now it becomes apparent that the grade 5* units spread across a higher range and have a higher average impact than the grade 4 units. In fact, compared to the global Category average, the grade 5* units almost all lie above twice that average while the grade 4 units almost all lie below that value, but are still above the world average for the Category.

When the impact of the articles is normalised relative to UoA, the performance of the units varies again according to grade, and the performance relative to world average for each unit is considerably lower than when the impact is normalised relative to Category. The performance is lower because the UoA baseline is calculated using world totals for the Categories that map best to this UoA, and that world set contains relatively more articles in higher-impact categories than does the set of RAE articles submitted to the UoA.
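One plausible reading of how such a UoA baseline is combined from Category data (step 6 of the Methodology) is a world-volume-weighted average, sketched below. The citation rates are the 1999 UoA13 values from Table 2, but the world paper counts and the weighting scheme itself are our assumptions, not a confirmed formula.

```python
# Sketch: combine Category baselines into a UoA baseline, weighting each
# Category by its (hypothetical) world paper count.
category_baselines = {          # world average citations per paper, 1999 (Table 2)
    "REHABILITATION": 6.01,
    "PSYCHOLOGY": 9.75,
    "PSYCHIATRY": 17.53,
    "NEUROSCIENCES & BEHAVIOR": 23.3,
}
world_papers = {                # hypothetical world paper counts per Category
    "REHABILITATION": 2_000,
    "PSYCHOLOGY": 18_000,
    "PSYCHIATRY": 20_000,
    "NEUROSCIENCES & BEHAVIOR": 30_000,
}

total_cites = sum(category_baselines[c] * world_papers[c] for c in category_baselines)
total_papers = sum(world_papers.values())
print(f"UoA baseline = {total_cites / total_papers:.2f} citations per paper")
# Categories with high citation rates and large world volumes dominate,
# which is why the UoA baseline can sit above the rates typical of the
# articles actually submitted to the UoA.
```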

So, when performance is rebased at the Thomson Category-level, a distinction between 4, 5 and 5* graded units does emerge. At the same time, virtually all the grade 4 units perform above world average. In other words, while there is a bibliometric differentiation between grades (unlike the journal-level normalisation) it puts all units above the peer estimation. However, when performance is rebased at the UoA-level, the outcome of the analysis confirms the grades awarded by peer review: now the 5* units are seen to be of predominantly international excellence while the grade 4 units show more limited evidence of international excellence.

Figure 1. Variation in citation impact for Unit of Assessment 13 Psychology in the 2001 UK Research Assessment Exercise. Each point is the average citation count to the end of 2005 for the set of journal articles submitted by a stated institution within this UoA, with impact for each article normalised against a world average. The data are grouped according to the RAE grade awarded to the institution.


Biological sciences

For UoA14 Biological Sciences we were able to analyse 8,878 items across 58 units. The average impact of articles is similar across grades when it is compared to the expected citation rate of the journals in which they are published. At this level of normalisation, grade 4s show the highest average (and contain one significant high outlier) and grade 5s the lowest. As with UoA13 Psychology, most of the units perform within the same range, regardless of grade, and the majority perform above world average when impact is normalised to journal average.

When impact is normalised to journal Category, grade 4s still perform slightly higher, on average, than Grade 5s, and there are now two high outliers amongst the grade 4 units. At this level of normalisation, units graded 5* have the highest average citation impact.

Figure 2. Variation in citation impact for Unit of Assessment 14 Biological Sciences in the 2001 UK Research Assessment Exercise. Each point is the average citation count to the end of 2005 for the set of journal articles submitted by a stated institution within this UoA, with impact for each article normalised against a world average. The data are grouped according to the RAE grade awarded to the institution.


The picture changes once more when impact is normalised to UoA, so that the average performance of the units conforms more closely to the grades allocated by peer review: the range of values for the majority of grade 4 units is lower than for the other grades. Grade 5 and grade 5* units perform within the same (higher) range, but the average for the grade 5*s is higher than for grade 5s.

The ‘outlying’ unit that ranked second when article performance was normalised to Category average drops to twenty-first place when the UoA baseline is used. The unit in question is a specialised institute, active in a narrow disciplinary range.

Physics

For UoA19 Physics we were able to analyse 6,663 items across 43 units. Again, it is apparent that the average impact of articles is similar across grades when it is compared to the expected citation rate of the journals in which they are published. The average quality is nearly always above journal average, although there is an evident trend in the data towards higher relative article quality with higher grade. On the whole, however, the relative impact of articles from grade 4 units covers a similar range (between 1.0 and 2.0 of world average) compared to that of articles from the more highly graded 5 and 5* units.

This picture changes when the impact of these articles is normalised relative to journal Category. Now it becomes apparent that the grade 5* units spread across a higher range and have a higher average impact than the grade 4 units. In fact, the grade 5* units lie above all but two of the grade 4 units.

There is, however, a slightly more complex picture for the grade 5 units, which have a major cluster between the grade 4 and grade 5* ranges but also have an outlier above the grade 5* range, and another outlier below the grade 4 range. We have not explored this exception in detail, and made no correction for it in the statistical analysis, but a visual inspection suggests that all three outliers are units which have a balance towards astrophysics activity, which has relatively high citation rates. An even simpler picture of impact relative to journal average might therefore emerge if this were taken into account.

When impact is normalised relative to UoA, the overall picture is very similar, but now two of the grade 5 units are above the grade 5* range, and there is no longer an exceptionally high outlier at grade 4. This illustrates, once again, the fact that the position of the units when ranked on normalised citation performance will vary according to the level at which article performance is normalised.

The average normalised citation performance is higher (for all grades) when values are normalised against UoA average rather than Category average.

Overall, while the picture for these three UoAs is broadly the same, there are some important differences between the disciplines. The extent to which the most highly graded Physics is ahead of world average is very marked, and even grade 4 Physics units exhibit an average performance mostly ahead of the global field average. This contrast is a detail, however, compared to the underlying pattern common to all fields. Journal-level normalisation produces a shallow variation against RAE grade whereas field-level normalisation (by Category or by UoA) produces a steep and statistically highly significant variation with grade.

Figure 3. Variation in citation impact for Unit of Assessment 19 Physics in the 2001 UK Research Assessment Exercise. Each point is the average citation count to the end of 2005 for the set of journal articles submitted by a stated institution within this UoA, with impact for each article normalised against a world average. The data are grouped according to the RAE grade awarded to the institution.

Spearman rank correlation

When we carried out a Spearman rank correlation analysis, using the average relative impact for each unit as a data point, we found a statistically significant positive correlation for the field-normalised data. This applies whether the data are normalised to journal Category average or to UoA average: relative impact is statistically significantly correlated with the grade awarded by the peer review panel. The results are summarised in Table 4.


Table 4. Spearman rank correlation coefficients – RAE grade and normalised citation impact

Unit of Assessment (UoA)       Number of units   RAE grade vs citation impact   RAE grade vs citation impact   Normalised impact:
as used in the UK RAE          at RAE2001        normalised at Category level   normalised at UoA level        UoA vs Category
UoA13 Psychology               46                0.70***                        0.72***                        0.92***
UoA14 Biological Sciences      58                0.27*                          0.52***                        0.69***
UoA19 Physics                  43                0.62***                        0.59***                        0.90***

* significance level < 0.05; *** significance level < 0.001

The normalised citation impact for each article was calculated, and these values were aggregated to find the average normalised citation impact for each unit (institution). Units were then ranked according to normalised citation impact and according to RAE grade. The Spearman rank correlation coefficient was calculated for (i) RAE grade against impact normalised to Category average, (ii) RAE grade against impact normalised to UoA average and (iii) impact normalised to Category average against impact normalised to UoA average.

For UoA13 Psychology and UoA14 Biological Sciences, the correlation is greater at UoA-level than at Category-level. For UoA19 Physics, the reverse is true.

We also found, for each of the three subject areas, that the citation impact when citations were normalised at UoA-level was statistically significantly correlated with the citation impact when citations were normalised at Category-level. These results are in the right hand column of Table 4.

For Biological Sciences (UoA14), the subject where correlation was least significant, we plotted Rank C (where citations were normalised at Category level) against Rank U (where citations were normalised at UoA level). The results are shown in Figure 4. The Figure shows that most (70%) of the units are in the upper-right or lower-left quadrants, indicating that they remain in the upper (or lower) half of the distribution regardless of whether the impact is normalised to UoA or to Category. Points on the diagonal line represent units whose ranking is the same under each method of impact-normalisation, so the distance of each unit from the diagonal line shows how its rank changes according to the level of ‘zoom’.

The scatter plot in Figure 4 also reveals that more of the grade 4 units lie above the diagonal line than below, whereas the reverse is true for grade 5 and 5* units. This shows that the lower-graded units are ranked more highly under UoA-level normalisation than they are under Category-level normalisation; while higher-graded units appear to perform better when viewed with a closer level of zoom, for this subject area.
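A minimal sketch of this rank comparison is given below, with invented data and column names; it reproduces the logic of Figure 4 (quadrant membership and position relative to the diagonal) rather than the actual results.

```python
# Sketch: compare each unit's rank under Category-level and UoA-level
# normalisation. All data are illustrative.
import pandas as pd

units = pd.DataFrame({
    "grade":        ["4", "4", "5", "5", "5*", "5*"],
    "nci_category": [1.9, 2.6, 2.2, 2.5, 2.8, 3.1],
    "nci_uoa":      [2.1, 2.4, 2.3, 2.6, 2.9, 3.3],
})

units["rank_c"] = units["nci_category"].rank(ascending=False)  # 1 = highest impact
units["rank_u"] = units["nci_uoa"].rank(ascending=False)

# A unit off the diagonal changes rank with the level of 'zoom'; a smaller
# rank_u than rank_c means it gains ground under UoA-level normalisation.
units["gains_at_uoa_level"] = units["rank_u"] < units["rank_c"]

# Quadrant check: does the unit stay in the same half of the distribution?
median = len(units) / 2
same_half = (units["rank_c"] <= median) == (units["rank_u"] <= median)
print(units)
print(f"{same_half.mean():.0%} of units stay in the same half under both rankings")
```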


Figure 4. Rank of UoA14 Biological Sciences units when citation impact is normalised to Category average compared to rank when citation impact is normalised to UoA average. Each point represents one of the 58 Biological Sciences units assessed for the 2001 UK RAE. Squares are the most highly graded units (5*), stars are grade 5 units and triangles are grade 4s. The diagonal line represents units whose rank is the same under each method of normalisation.

Discussion

We have analysed the relative impact of articles submitted for assessment in the UK RAE2001. We have compared impact at three different levels of normalisation for university departments at the three highest grades awarded in three Units of Assessment (UoA13 Psychology, UoA14 Biological Sciences and UoA19 Physics). There is a significant positive correlation between peer judgements and citation impact at some, but not all, levels of data aggregation (zoom).

When citation counts were normalised at the level of the journals in which they were published, there was little evident difference between performance at any grade, so no link could be made between peer review outcomes and a citation index. But when the normalisation was relative to the Category (the Thomson journal set) or the UoA (i.e. the journal sets mapping to the UoA), then on average the higher graded units had a statistically significantly higher relative impact. These data test and support the analysis put forward by ZITT & AL. [2005].


Why does relative impact change at journal, Category and UoA levels? The implication is that the material submitted by grade 4 units is actually sourced from journals of lower average impact than the material submitted by the grade 5 units. Thus, when the level of analysis is relative to journal these items appear to be of similar impact relative to the medium in which they are published. When the viewpoint is zoomed out to the broader, Category, level then the higher absolute citation count for the articles produced by the more highly graded units becomes apparent, and even more apparent at the UoA-level. This would suggest that higher-graded units are publishing in higher-impact categories.

Another possibility is that a finer-scale zoom would also separate clusters of applied but lower-cited research from frequently-cited fundamental research of topical interest (Michel Zitt, pers. comm.). Thus, the relative value of the applied work is lifted at the journal scale but swamped at the field level. This could account for the somewhat counter-intuitive result in Biological Sciences, where there is a lower density of applied research. We will test this hypothesis elsewhere.

The fact that the correlation between RAE grade and citation impact is strongest at UoA-level for two of our selected subjects but strongest at Category-level for the third may also suggest that the most appropriate level of ‘zoom’ varies between disciplines. Again, more work would be needed to test this.

The possibility that the level of ‘zoom’ will affect our assessment of relative impact is an important insight. Clearly, the fact that more than one view and hence more than one interpretation of performance might exist would need to be taken into account in any evaluation methodology. Ideally, the appropriate level of ‘zoom’ would need to be independently considered, explored and reported before confidence in the outcome of assessment could be validated. This is likely to be a serious challenge unless a reference indicator is available and will generally require any evaluation to be carried out at multiple levels for a reflective review.

*

We are grateful to two anonymous reviewers for their suggestions regarding this paper. We would also like to thank Michel Zitt for his helpful suggestions regarding the methodological development of this study and for his comments on an early draft. We are grateful to Henry Small and Nancy Bayers (Thomson Scientific, Philadelphia) for their advice. Citation data used in this study are derived from the UK National Citation Report 2005 and National Science Indicators 2005 prepared by Thomson Scientific Inc, Philadelphia, PA, USA: Copyright Thomson Scientific Inc 2005. All rights reserved. This paper summarises the outcomes of projects funded by the UK Research Councils and the Office of Science and Innovation and the work has benefited particularly from comments from John Kirk (OSI) and Ian Diamond and Iain Jones (ESRC).


References

ADAMS, J. (1998), Benchmarking international research. Nature, 396 : 615–618.
ADAMS, J., BAILEY, T., JACKSON, L., SCOTT, P., SMALL, H., PENDLEBURY, D. (1998), Benchmarking of the International Standing of Research in England – a Consultancy Study for the Higher Education Funding Council for England. CPSE, University of Leeds. 108 pp. ISBN 1 901981 04 5.
GLÄNZEL, W., MOED, H. F. (2002), Journal impact measures in bibliometric research. Scientometrics, 53 : 171–193.
HIRST, G. (1978), Discipline impact factor: a method for determining core journal lists. Journal of the American Society for Information Science, 29 : 171–172.
KOSTOFF, R. (2002), Citation analysis of research performer quality. Scientometrics, 53 : 49–71.
MOED, H. F., BURGER, W. J. M., FRANKFORT, J. G., VAN RAAN, A. F. J. (1985), A comparative study of bibliometric past performance analysis and peer judgement. Scientometrics, 8 : 149–159.
PODLUBNY, I. (2005), Comparison of scientific impact expressed by the number of citations in different fields of science. Scientometrics, 64 : 95–99.
SCHUBERT, A., BRAUN, T. (1993), Reference standards for citation based assessments. Scientometrics, 26 : 21–35.
SCHUBERT, A., BRAUN, T. (1996), Cross-field normalization of scientometric indicators. Scientometrics, 36 : 311–324.
VAN RAAN, A. F. J. (2006), Comparison of the Hirsch-index with standard bibliometric indicators and with peer judgment. Scientometrics, 67 : 491–502.
ZITT, M., RAMANANA-RAHARY, S., BASSECOULARD, E. (2005), Relativity of citation performance and excellence measures: From cross-field to cross-scale effects of field-normalization. Scientometrics, 63 : 373–401.