Upload
l-egghe
View
212
Download
0
Embed Size (px)
Citation preview
Price Index and Its Relation to the Mean and MedianReference Age
L. EggheLimburgs Universitaire Centrum, Universitaire Campus, B-3590 Diepenbeek, Belgium.* E-mail: [email protected];and Universitaire Instelling Antwerpen, Universiteitsplein 1, B-2610 Wilrijk, Belgium
This article consists of two parts. In the first part, we In Glanzel and Schoepflin (1995), a graph of PI2
assume the simple decreasing exponential model for (1100) versus the mean reference age is produced. Withaging. In this case, we prove that the Price Index (the permission of the authors, and publisher, this graph isfraction of the references that are not older than a cer-
presented again (Fig. 1) .tain age) is a function of the mean reference age andThis is a remarkable graph. First of all, the overallalso a function of the median reference age. Both func-
tions are convexly decreasing, are 1 in 0 and tend to zero tendency is convexly decreasing, although the relation isfor the argument tending to infinity. not a pure function: Indeed, the middle part contains an
In the second part, the more realistic lognormal aging apparently thick cloud of points which seems to be struc-model is used. We now show that the Price Index is not
tural, and it smoothly disappears in the beginning and thea pure function of the mean or median reference age,end of the graph. Furthermore, it seems to be clear thatbut a well-defined relation in the form of a typical cloud
of points. This cloud (as, e.g., discussed in a 1995 article PI2 É 1 for the mean reference age going to zero (theof W. Glanzel & U. Schoepflin) is explained using results graph says 100%, being a fraction 1) and that PI2 goesfrom probability theory and statistics. New data (about to zero for the mean reference age going to infinity. Howreference ages in Journal of the American Society for
can this be explained?Information Science ) are produced that confirm the the-A good scientific method consists of starting withoretical findings.
known results on reference age distribution and then try-ing to explain the above graph. The simplest basic law
I. Introduction on reference age distribution is the exponential law c( t)Å cat with c( t) Å the distribution of the number ofPrice, in his classic article (1970), defines the so-called
‘‘Price Index’’ as ‘‘the proportion of the references that references that are t years old and where 0 õ a õ 1.Assuming this law gives a first explanation of the graphare to the last five years of literature.’’ It is hereby unim-in Figure 1, we prove that PId (any d √ N) is a convexlyportant whether or not we use the references of one article,decreasing function of the mean reference age (MA)the references of all the articles in a journal, or the refer-which is 1 in 0 and goes to 0 if the mean reference ageences of all the articles in all journals in a certain disci-goes to ` . The same conclusion can be drawn when thepline. Of course, the result may be different whether werelation of PId with the median reference age (the so-consider a discipline to be a set of articles or to be a setcalled half-life) is studied. This explains the shape of theof journals (for this, see a recent article by Egghe &graph but certainly not the thick cloud in the middle part.Rousseau, 1995), but in this article, we assume that we
In the next section, we use the more realistic form ofhave a fixed set of references.c( t) , namely a lognormal distribution. It has been provedGlanzel and Schoepflin (1995) deal with the Pricein Egghe and Rao (1992a) that c( t) indeed can be bestIndex as the proportion of references that are not oldermodeled via a lognormal distribution and there was alsothan 2 years. Therefore, we can define more generally thea theoretical explanation for this. The theory was con-Price Index (PId) as the fraction of references that are 0,firmed by Matricciani (1991) on the age distribution of1, 2, . . . , d years old.references in engineering articles.
Now it can be shown that PId is no longer a function* Permanent address. of MA, but a mathematical relation (a cloud). The shape
of this cloud is explained to be convexly decreasing (asReceived March 11, 1996; revised July 1, 1996; accepted July 15, 1996.is the case with the corresponding function, if we use theexponential model) .q 1997 John Wiley & Sons, Inc.
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE. 48(6) :564–573, 1997 CCC 0002-8231/97/060564-10
947/ 8N13$$0947 03-24-97 13:38:53 jasa W: JASIS
FIG. 1. Plot of ‘‘Price Index’’ versus mean reference age.
We have executed the same study for the median refer- in informetrics: It is the basic function for aging (0 õ aõ 1) and for growth (a ú 1), from which the moreence age (MD), again using the lognormal distribution
for c( t) . Again we found that PId is no longer a function realistic models are a deviation (lognormal distributionin the case of aging, an S-shaped like function such asof the median reference age, but that the cloud behaves
as in the MA case. the logistic curve or Gompertz distribution in the case ofgrowth—see Egghe & Rao, 1992a, 1992b). In additionThe ‘‘median’’ analogue of Figure 1 was not readily
available. Therefore, we collected some new data that are to this, this article will prove that from Equation 1, basicresults on other parameters (such as the Price Index) canpresented in the last section: We studied the reference
lists in the Journal of the American Society for Informa- be derived.For d Å 1, 2, 3, rrr we define the Price Indextion Science (JASIS) from 1986 until Volume 46 in 1995
(Issue 7): Each article then gives a mean and medianreference age, and several possible Price Indexes. We
PId Å# references that are 0, 1, 2, . . . , d years old
total # references.present PI2 , PI4 , and PI5 versus the mean and the median
reference age. These graphs confirms the theoretical find-ings of the previous section. In Price (1970), PI4 or PI5 is used (this is not clear from
Price’s text, although Wouters & Leydesdorff, 1994, usePI5 in reference to Price, 1970). In Glanzel and SchoepflinII. The Case that the Reference Age Distribution(1995), PI2 is used. The specific index d is not so im-Is Exponentialportant: Our results will be valid for every d √ N.
In this section, we suppose that we have a set of refer- We will study PId as a possible function of MA, theences with age distribution given by mean reference age, and of MD, the median reference
age. In both cases, we will use the following result.c( t) Å crat (1)
t Å 0, 1, 2, 3, rrr and 0 õ a õ 1. Since c( t) is a Theorem II.1distribution, one has
In case Equation 1 is valid, we have
∑`
tÅ0
cat Å 1. PId Å 1 0 ad/1 . (3)
Proof. By definitionHence,
PId Å (1 0 a)[1 / a /rrr/ ad]c Å a 0 1. (2)
This is a basic function, not only in mathematics but also (since (`tÅ0 cat Å 1)
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE—June 1997 565
947/ 8N13$$0947 03-24-97 13:38:53 jasa W: JASIS
FIG. 2. Decomposition of Figure 1 into trajectories s Å C .
and hence Equations 3 and 5 yield the following theorem.
PId Å 1 0 ad/1 . hTheorem II.2
We now start with the MA-dependency. PId Å 1 0 S MA1 / MAD
d/1
. (6)
II.1. PId as function of MA. By definition,
This function is convexly (∗) decreasing,MA Å ∑
`
tÅ0
t(1 0 a)at
limMAr
ú0
PId Å 1, limMAr`
PId Å 0Å (1 0 a)(a / 2a 2/ 3a 3 / rrr)
Å (1 0 a)(a / a 2 / a 3/ rrr / a 2 / a 3
(∗) Convexity is guaranteed if MA ú d /2 which is al-/ rrr / a 3 / rrr / rrr) . ways the case for reasonable d .
Å (1 0 a)[a(1 / a / a 2 / rrr)Proof. Formula 6 follows readily from Equations 3
and 5. Furthermore, taking derivatives with respect to,/ a 2(1 / a / a 2 / rrr)MA yields/ a 3(1 / a / a 2 / rrr) / rrr]
Å (1 0 a)(1 / a / a 2 / rrr)(a / a 2 / a 3 / rrr)PI*d Å 0
(d / 1)(MA) d
(1 / MA) d/2 õ 0
Å (1 0 a)a
(1 0 a)2
(always) and
hence
PI9d Å (d / 1)(MA) d01 2 MA 0 d
(1 / MA) d/3 ú 0
MA Å a
1 0 a. (4)
if MA ú d /2. We can assume that this is the case: In theGlanzel and Schoepflin case, MA is required to be larger
Consequently, than 1, in the Price case, MA must be larger than 2 or2.4, which will be true in almost all cases. If MA õ d/2,then we lose the convexity property. In any case, it isa Å MA
1 / MA. (5)
trivial that
566 JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE—June 1997
947/ 8N13$$0947 03-24-97 13:38:53 jasa W: JASIS
FIG. 3. Plot of PI2 versus MA. The values of PI2 are multiplied by 100.
limMAr
ú0
PId Å 1 (7) C(MD) Å 12 C(0) .
limMAr`
PId Å 0. h (8) This gives
aMD Å 12Basically, this explains the shape of the Glanzel-
Shoepflin graph. The same results follow if we replaceMA with MD, the median reference age (also the half- hencelife) .
MD Å ln(0.5)ln a
. (9)II.2. PId as function of MD. Let us denote, by C( t) ,
the cumulative function.This formula also appears in Egghe and Rousseau (1990),p. 270. From this
C( t) Å ∑`
t =Åt
c( t *)a Å (0.5)1/MD (10)
C( t) Å ∑`
t =Åt
(1 0 a)at =Equations 3 and 10 yield the following theorem.
C( t) Å (1 0 a)at(1 / a / a 2 / rrr)Theorem II.3
C( t) Å at .PId Å 1 0 (0.5) (d/1) /MD. (11)
By definition, MD is this value of t for whichThis function is convexly (∗) decreasing,
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE—June 1997 567
947/ 8N13$$0947 03-24-97 13:38:53 jasa W: JASIS
FIG. 4. Plot of PI2 versus MD. The values of PI2 are multiplied by 100.
cases. If this condition is not satisfied, we only lose thelimMDr
ú0
PId Å 1, limMDr`
PId Å 0.convex shape of the curve. Further, it is clear that
(∗) Convexity is guaranteed if MD ú ((ln 2)/2)(d / 1) limMDr
ú0
PId Å 1 (12)which is always the case for reasonable d .
andProof. Formula 11 follows readily. Deriving w.r.t.MD gives
limMDr`
PId Å 0. h (13)
PI*d Å (0.5) (d/1) /MDln(0.5)d / 1
(MD)2 õ 0So, this section fully explained the general shape of
the functions MA r PId and MD r PId if the referencePI 9d Å 0ln(0.5)(0.5) (d/1) /MD d / 1
(MD)3 age distribution is given by Equation 1. The next sectiondeals with the lognormal distribution, the natural distribu-tion describing reference ages (Egghe & Rao, 1992a and1 S2 / d / 1
MDln(0.5)D ú 0 Matricciani, 1991).
ifIII. The General Case: The Case That theReference Age Distribution Is Lognormal
MD ú ln 22
(d / 1) É 0.3466(d / 1).
III.1. Introduction
It is well-known that the reference age follows a log-This is clearly satisfied in most cases: for d Å 2 is thecondition MD ú 1.04, for d Å 4 we have MD ú 1.73, normal distribution. This means that ln t is normally dis-
tributed. We hence have (cf. Goldberg, 1984, p. 404):and for d Å 5, MD ú 2.08, what is true in almost all
568 JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE—June 1997
947/ 8N13$$0947 03-24-97 13:38:53 jasa W: JASIS
FIG. 5. Plot of PI4 versus MA. The values of PI4 are multiplied by 100.
where F denotes the cumulative standard normal distribu-c( t) Å 1
t√2ps
expF0 12 S ln t 0 m
s D2G (14) tion function.Next, we express MA and MD. In Goldberg (1984, p.
405) one finds:where m and s is the mean respectively the standarddeviation of ln t . MA Å em/s2/2 . (18)
Since PId is the fraction of references that are not olderthan d years, we have
For MD, we have, by definition
PId Å *d
0
1
t√2ps
expF0 12 S ln t 0 m
s D2Gdt . (15) P(0 ° t ° MD) Å 0.5.
Hence,We can simplify this expression as follows: Substitute
x Å ln t and after this Z Å (x 0 m)/s.P(0` ° ln t ° ln MD) Å 0.5We then have
andPId Å *
( ln d0m ) /s
0`
1√2p
e01/2Z2dZ . (16)
PS0` ° ln t 0 m
s° ln MD 0 m
s D Å 0.5.Consequently,
PId Å FS ln d 0 m
s D (17) Since (ln t 0 m) /s is standard normally distributed, wehence have
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE—June 1997 569
947/ 8N13$$0947 03-24-97 13:38:53 jasa W: JASIS
FIG. 6. Plot of PI4 versus MD. The values of PI4 are multiplied by 100.
ln MD 0 m
sÅ 0.
PId Å FS ln d 0 ln MDs D (21)
Consequently,
yielding the same conclusion as above.MD Å em . (19) So PId is neither a function of MA nor of MD. Still,
we want to explain the cloud of points in Figure 1 (w.r.t.It is trivial to see that MD õ MA. the variable MA) and the analogous graph with respect
Formulae 18 and 19 show that both functions (m, s2) r to the variable MD.MA and (m, s 2) r MD are not injective. This is clearer In the next section, we will explain the general shapewhen we express PId in terms of MA respectively MD. of the graph in Figure 1 (and the same for the vari-Formulae 15 and 18 imply able MD).
PIdÅ*d
0
1
t√2ps
expF0 12 S ln t0 ln MA/ s 2 /2
s D2Gdt III.2. Shape of Cloud of Points MA r PId
Since the relation MA r PId is not a function, wecannot study the MA-dependency in formula 20 just likePIdÅ FS ln d0 ln MA/ s 2 /2
s D . (20)that. What we can do is study the set of functions (thatconstitute the whole graph of MA r PId) as in Equation20, but with
Here s still can vary, yielding different values of PId forone value of MA. For MD, we have:
s Å constant. (22)
PId Å *d
0
1
t√2ps
expF0 12 S ln t 0 ln MD
s D2Gdt This gives trajectories in this cloud of points that can bestudied as follows.
570 JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE—June 1997
947/ 8N13$$0947 03-24-97 13:38:53 jasa W: JASIS
FIG. 7. Plot of PI5 versus MA. The values of PI5 are multiplied by 100.
Using Formula 20 and using the fact thatMA ú d
es2/2 (24)
F *(x) Å 1√2p
e0x2 /2 (23)which is always satisfied if MA ¢ d . This is clearly truein practical cases and for the common values of d (e.g.,
we have, taking the derivative w.r.t. MA and using Equa- d ° 5). It is certainly so for d Å 2 in the Glanzel andtion 22 Schoepflin (1995) article.
In fact, one could also argue that, by the very nature ofthe Price Index (knowing the fraction of the ‘‘younger’’PI*dreferences) , it does not make much sense to study PId ford ú MA.Furthermore,Å 1√
2pexp 0
Sln d0 ln MA/ s 2 /2s D2
2 S0 1srMAD
limMAr
ú0
PId Å limmr`
PId Å F(/`) Å 1
õ0, alwayssince (by Equation 22) MA r 0, if and only if, m r 0` .Also, MA r /` , if and only if, m r /` . Now,
limMAr`
PId Å F(0`) Å 0.PI9dÅ1√2p
1s(MA)2 exp 0
S ln d0 ln MA/ s 2 /2s D2
2
So, on the trajectories s Å C in the cloud of points as inFigure 1, we found convexly decreasing functions which1 1 0 ln d 0 ln(MA) / s 2 /2
s 2 are 1 in 0 and tend to 0 for MA r /`: The same conclu-sions as in section I. This could be expected since thelognormal distribution is only a ‘‘perturbation for smallú0 if and only if
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE—June 1997 571
947/ 8N13$$0947 03-24-97 13:38:53 jasa W: JASIS
FIG. 8. Plot of PI5 versus MD. The values of PI5 are multiplied by 100.
t’’ of the exponential distribution. Yet, of course, theMD ú d
es2 . (25)results of section I are not immediately valid, withoutextra proof (as given here) .
Note that this condition is the same as in Equation 24.For this reason, again MA ¢ d suffices (and see theIII.3. Shape of Cloud of Points MD r PIdremarks made below Formula 24).
It is now easy to do the same for the variable MD. Again MD r 0, if and only if, m r 0` and MD rFormula 21 yields, taking the derivative w.r.t. MD (using /` , if and only if, m r /` , yieldingEquation 22):
limMDr0ú
PId Å F(/`) Å 1
PI *d Å1√2p
exp 0S ln d 0 ln MD
s D2
20 1
srMD and
limMDr`
PId Å F(0`) Å 0.õ0, always
So the graph of MD r PId has on its trajectories s Å C ,the same shape as the graph of MA r PId . These findings
PI 9d Å1
(MD)2
1s
1√2p
exp 0S ln d 0 ln MD
s D2
2 will be confirmed by the practical data of section IV.
Note. We hope that the results of section I and the
1 1 0 ln d 0 ln MDs 2
ones given above are enough explanation for the cloudof points as in Figure 1. Of course, theoretically, it canbe that the studied trajectories do not have the same shapeas the overall cloud, but Figure 1 rejects this possibility.ú0 if and only if
572 JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE—June 1997
947/ 8N13$$0947 03-24-97 13:38:53 jasa W: JASIS
It is, hence, clear that the graph of Figure 1 is ‘‘com- graph versus the MA- and MD-values. We hence obtainedsix graphs—see Figures 3–8.posed’’ of different trajectories s Å C , where C varies.
This is depicted in Figure 2. The same can be said about All six graphs show a convexly decreasing cloud ofpoints. The time period used to calculate PI2 apparentlythe relation MD r PId .is too small to yield values above 0.8. These values (upto 1) are obtained in the PI4- and PI5-case.
IV. New Experimental Data on Mean and Median The thickness in the middle part of the graphs (withReference Age and on the Price Index MA as abscissa) is apparent. It is also clear that the MD-
graphs show more variability of PI-values for small MD-We have examined the reference lists of every articlevalues than is the case for small MA-values. For highin the journal JASIS in the period from 1986 until VolumeMD-values, this cannot be concluded due to the scarce-46, Issue 7, of 1995. We excluded letters to the editor,ness of data.book reviews, and also the review articles in the ‘‘Per-
These graphs, consequently, confirm our theoreticalspectives’’ issues. The reason is that we wanted to collectfindings. The graphs were obtained using the statisticalhomogeneous material, so that the age distribution of ev-package STATGRAPHICS 6.0.ery reference list can be viewed as a sample of a general
lognormal distribution. We took JASIS since most of theAcknowledgmentarticles in it have substantive reference lists, contrary to,
The author is indebted to Prof. Dr. R. Rousseau fore.g., Scientometrics where, regularly, articles with verystimulating discussions on this topic.short reference lists appear.
For each article we calculated:References
the total number of references; Egghe, L., & Rao, I. K. R. (1992a). Citation age data and the obsoles-the number of references to the years j , j 0 1, j 0 2, cence function: Fits and explanations. Information Processing and
Management, 28(2) , 201–217.j 0 3, j 0 4, and j 0 5, where j denotes the year ofEgghe, L., & Rao, I. K. R. (1992b). Classification of growth modelspublication of the article;
based on growth rates and its applications. Scientometrics, 25(1) , 5–the mean reference age; and46.the median reference age.
Egghe, L., & Rousseau, R. (1990). Introduction to informetrics. Quanti-tative methods in library, documentation and information science.Amsterdam: Elsevier.In total we studied 367 articles (we also left out two
Egghe, L., & Rousseau, R. (1996a). Averaging and globalizing quo-historical articles: They could cause scaling problems ontients of informetric and scientometric data. Journal of Informationthe MA 0 PId and MD 0 PId graphs; furthermore, weScience, 22(3) , 165–170.
wanted articles as homogeneous as possible, as explained Egghe, L., & Rousseau, R. (1996b). Average and global impact of aabove). set of journals. Scientometrics, 36(1) , 97–107.
Glanzel, W., & Schoepflin, U. (1995). A bibliometric ageing studyThe mean reference age (say m) was transformed intobased on serial and non-serial reference literature in the sciences.Proceedings of the Fifth Biennial Conference of the International
MA Å j 0 m Society for Scientometrics and Informetrics, June 7–10, 1995, RiverForest, IL, (pp. 177–185). Medford, NJ: Learned Information.
Goldberg, M. A. (1984). An introduction to probability theory withso that time has a fixed origin, and so that reference agesstatistical applications. New York: Plenum Press.
could be compared and put in the same graph. The same Matricciani, E. (1991). The probability distribution of the age of refer-was done for MD. ences in engineering papers. IEEE Transactions of Professional Com-
munication, 34, 7–12.By taking the number of references to the years j ,Price, D. De Solla (1970). Citation measures of hard science, softj 0 1, and j 0 2, adding these three numbers and dividing
science, technology and nonscience. In C. E. Nelson, D. K. Pollackthe result by the total number of references, we obtain,(Eds.) , Communication among scientists and engineers (pp. 3–22).
for each article, a PI2 value. Going 2 years further, we Lexington, MA: Heath.obtain PI4 and adding one more year ( j 0 5) yields PI5 Wouters, P., & Leydesdorff, L. (1994). Has Price’s dream come true:
Is scientometrics a hard science? Scientometrics, 31(2) , 193–222.in the same way. These PI-values were then put into a
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE—June 1997 573
947/ 8N13$$0947 03-24-97 13:38:53 jasa W: JASIS