"Sitation" distributions and Bradford's law in a closed Web space

Cristina Faba-Pérez and V.P. Guerrero-Bote
Facultad de Biblioteconomía y Documentación, Universidad de Extremadura, Alcazaba, Badajoz, Spain

Félix Moya-Anegón
Facultad de Biblioteconomía y Documentación, Universidad de Granada, Granada, Spain

Journal of Documentation, Vol. 59 No. 5, 2003, pp. 558-580. © MCB UP Limited, 0022-0418. DOI 10.1108/00220410310499582
Received 19 February 2003. Revised 2 July 2003. Accepted 8 July 2003.

This work was financed by the Junta de Extremadura (Consejería de Educación, Ciencia y Tecnología) and the European Social Fund, as part of the "Programa de Ayudas para la Realización de Proyectos de Aplicación de las Tecnologías de la Información y la Comunicación" (published in the DOE of Extremadura, No. 120, 17 October 2000).

Keywords: Worldwide web, Modelling, Case studies, Spain

Abstract

The study looks at how well the distribution of "sitations" (inlinks received by Web spaces) fits either a power law (of the Lotka type) or a bibliometric distribution for printed publications (of the Bradford type). The experimental sample examines the sitations found in a closed generic environment of thematically related Web sites: the case of Extremadura (Spain). Two sets of data, varying several parameters, were used. The sitation distributions found were coherent with those described in previous experiments of this type, including in the exponent. The plots of accumulated clusters of sitations and targets, however, did not fit the typical Bradford distribution.

Introduction

Since "informetrics" studies the quantitative aspects of information processes in general, incorporating, applying and surpassing the frontiers of bibliometrics and "scientometrics" (Tague-Sutcliffe, 1992), and since the World Wide Web (henceforth the Web) has become today's principal electronic information source (Bar-Ilan, 2001), it is not surprising that, beginning in the mid-1990s, informetric models and methods have been applied to the environment of the Internet and, more specifically, to the Web. This has led to the rise of two new disciplines, which have been named "cybermetrics" (Cybermetrics: International Journal of Scientometrics, Informetrics and Bibliometrics, 2002) and "Webometrics" (Almind and Ingwersen, 1997). Electronic resources continue to grow rapidly: in the summer of 2000 there were more than 2,000 million Web pages (Aguillo, 2000), and in January 2002 there were more than 140 million hosts on the Internet (Internet Software Consortium, 2002).


At this rate of growth, the end users of this information are finding themselves overwhelmed by the immense quantity and diversity of the content, with its apparently chaotic organization, its volatility, the formal heterogeneity of the resources and the lack of any guarantee of the ultimate quality of certain documents.

To confront these problems, it has become indispensable to develop and apply indicators of the quality of the information contained in "Web spaces", an expression introduced by Smith to refer to top-level domains (geographical, such as .nz, or sectorial, such as .com), low-level domains (such as vuw.ac.nz) and groups of directories (such as http://www.vuw.ac.nz/scim) (Smith, 1999). Some of the proposals for such indicators opt for the qualitative analysis of the electronic information available on the Internet (Smith, 1996, 1997, 2001; Correa-Uribe, 1999; Codina, 2000; Tillman, 2000; Zhang and von Dran, 2000; Jiménez-Piano, 2001). Clearly, however, there is a major subjective component in the analysis of qualitative characteristics, especially with respect to the content.

Other studies have proposed the use of citation analysis techniques similar to those applied to printed scientific publications to reveal the structure of relationships on the Web and to evaluate the quality of the content. An example is Google's PageRank algorithm (Brin and Page, 1998). Although another search engine, AltaVista, has been described as the "citation index" of the Web (Rodríguez i Gairín, 1997), and a parallel has been drawn between the fields of the ISI databases and the HTML tags used on the Web (Almind and Ingwersen, 1997), there really exists no analogue of the ISI databases, and in particular of the SCI, with which to analyse Web "sitations" (inlinks received by Web spaces). It is nevertheless feasible to analyse Web spaces by studying their links. That is, assuming that the outlinks and inlinks of Web spaces are equivalent to the references and citations-to (Price, 1970) of traditional scientific publications, one can perform the process of "websiting" (Rousseau, 1997): following the links received by a given Web space and analysing the sitations. This analogy has met with opposing opinions in the literature, with examples in favour (McKiernan, 1996; Larson, 1996; Almind and Ingwersen, 1997; Vreeland, 2000; Björneborn and Ingwersen, 2001; Cronin, 2001) and against (Egghe, 2000; Harter and Ford, 2000; Kim, 2000; van Raan, 2001). Some workers, such as Rousseau (1997), take an intermediate standpoint, considering that the study of websiting, although conceptually equivalent to that of traditional citations, has to take a slightly different direction because the motivation behind links differs from that behind citations.

In this regard, it has been assumed that the distribution of citations in printed publications satisfies, in most cases, Bradford's law of bibliometrics (Ferreiro-Aláez, 1981; Houston, 1983; Gupta, 1991; Mubeen, 1996; Lal and Panda, 1999; Reyes-Barragán et al., 2000). This law, which is reviewed in depth by Gorbea-Portal (1996), was originally devised for scientific journals (Bradford, 1934):


If scientific journals are arranged in order of decreasing productivity of articles on a given subject, they may be divided into a nucleus of periodicals more particularly devoted to the subject and several groups or zones containing the same number of articles as the nucleus, when the numbers of periodicals in the nucleus and succeeding zones will be as $1 : n : n^2$.

The paradigm of its (semi-logarithmic) plot consists of two curved sections separated by an intermediate straight-line section.
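The verbal statement can be checked mechanically. The following Python sketch is our own illustration, not a procedure from the study: it ranks sources by decreasing number of citations (or sitations), splits the ranking into zones that each hold about the same share of the total, and returns the number of sources in each zone, which Bradford's law expects to be in the proportion $1 : n : n^2$. The rule that assigns each source to the zone in which its cumulative count starts is an assumption; other boundary conventions are possible.

```python
from typing import List, Sequence


def bradford_zones(counts: Sequence[float], n_zones: int = 3) -> List[int]:
    """Number of sources in each of `n_zones` zones of (roughly) equal citations.

    Sources are ranked by decreasing count; a source is assigned to the zone in
    which its cumulative count starts (an illustrative convention).
    """
    ranked = sorted(counts, reverse=True)
    share = sum(ranked) / n_zones          # citations each zone should hold
    sizes = [0] * n_zones
    cumulative = 0.0
    for count in ranked:
        zone = min(int(cumulative // share), n_zones - 1)
        sizes[zone] += 1
        cumulative += count
    return sizes


# Toy data: 1 source with 40 citations, 2 with 20, 4 with 10 (40 citations per zone).
print(bradford_zones([40, 20, 20, 10, 10, 10, 10]))  # [1, 2, 4]
```

In the toy example each zone holds 40 citations and the zones contain 1, 2 and 4 sources, so the statement holds with $n = 2$.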

There have been some cybermetric studies of how well Bradford's law fits the electronic environment of the Internet. Bar-Ilan (1997) applies the law to identify the core of newsgroups on "mad cow disease" (BSE), comparing the groups and their postings with scientific journals and their articles, respectively. None of the plots obtained for queries in the postings satisfactorily fitted the traditional Bradford distribution. Cui (1999) uses the Bradford distribution to examine outlinks from the Web sites of the principal US medical schools in order to determine the most cited core set. In this case, the results are coherent with Bradford's original formulation of the law.

While the distributions found in traditional citation analysis are indeed found to satisfy Bradford's law, in the case of websiting, i.e. of the inlinks received by Web spaces ("sitations"), it has for all practical purposes been demonstrated that they follow a power law distribution. Rousseau (1997), in a fairly restricted study of sitations (343 URLs), finds a power law distribution similar to Lotka's law of bibliometrics (originally designed for scientific authors). Barabási and Albert (1999) and Kumar et al. (2001) also find a power law distribution. Perhaps the definitive study is that of Broder et al. (2000), who study 200 million pages and 1,500 million links and practically establish that the probability of a document receiving $i$ sitations is proportional to $1/i^x$, with $x > 1$. The last three studies mentioned all find $x = 2.1$.
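Written out explicitly, and assuming for illustration that only documents with at least one sitation are counted (so that the probabilities can be normalized over $i \ge 1$), the power law reads as follows. The normalizing constant is the Riemann zeta function, which is finite precisely because $x > 1$. Taking logarithms shows why such data fall on a straight line of slope $-x$ when plotted on logarithmic axes, the form in which such exponents are usually displayed:

```latex
\[
P(i) = \frac{i^{-x}}{\zeta(x)}, \qquad \zeta(x) = \sum_{k=1}^{\infty} k^{-x}, \qquad x > 1,
\]
\[
\log P(i) = -x \log i - \log \zeta(x).
\]
```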

While it may appear that the Bradford and power laws are mutually exclusive, this is not necessarily so, since for certain values of the exponent they may practically coincide. For values of the power-law exponent around 2, for example, the verbal statement of Bradford's law is satisfied perfectly, and for exponents less than 2 it is satisfied in its graphic form, although with variations with respect to the original verbal statement.
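A quick numerical check of the exponent-2 case, offered as an illustration under stated assumptions rather than as part of the original study: in the usual continuous approximation, a Lotka-type law with exponent 2 corresponds to a rank-frequency law in which the target at rank r receives a number of sitations proportional to 1/r. The Python sketch below builds such idealized (non-integer) counts for a hypothetical collection of 100,000 targets, splits the ranking into three zones of equal total sitations using a simple zoning rule (each target goes to the zone in which its cumulative count starts), and prints the zone sizes and the two successive ratios between them. If the zones follow $1 : n : n^2$, the two ratios should be close to each other.

```python
from typing import List, Sequence


def bradford_zones(counts: Sequence[float], n_zones: int = 3) -> List[int]:
    """Sizes of n_zones zones holding (roughly) equal shares of the total count."""
    ranked = sorted(counts, reverse=True)
    share = sum(ranked) / n_zones
    sizes = [0] * n_zones
    cumulative = 0.0
    for count in ranked:
        # Each target goes to the zone in which its cumulative count starts.
        sizes[min(int(cumulative // share), n_zones - 1)] += 1
        cumulative += count
    return sizes


# Hypothetical collection: the target at rank r receives 1/r sitations (Lotka exponent 2).
N_TARGETS = 100_000
counts = [1.0 / r for r in range(1, N_TARGETS + 1)]

s1, s2, s3 = bradford_zones(counts)
print("zone sizes:", s1, s2, s3)
print("successive ratios:", round(s2 / s1, 1), round(s3 / s2, 1))
```

Because the zones are cut at discrete ranks, the agreement with $1 : n : n^2$ is approximate rather than exact.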

The present work focuses on the distribution of sitations in closed generic thematic environments, such as the case of Extremadura (a Spanish region), considering only the sitations made within this environment. We therefore discarded internal sitations, which are mainly structural. Our goal was to determine whether the resulting distributions fit the bibliometric functions for printed publications (Bradford) or power-law functions similar to that of Lotka, as in the case of the Rousseau (1997) study. It differs from previous work in that the framework is a closed thematic environment that is not strictly scientific and is heterogeneous in character. It also differs from the aforementioned work of Rousseau (1997), firstly because, as just mentioned, it does not deal exclusively with information on scientific topics, and secondly because it focuses on the sitations made in this closed environment instead of on the sitations that the rest of the Internet makes to these Web spaces.

The starting hypothesis of the work is that Bradford's law holds for small and well-defined areas of research, but not for non-scientific environments that are heterogeneous in character. It was principally to check this latter idea that we carried out a study of the websiting of the Web spaces of Extremadura.

Material and methods

Naturally, any study with these characteristics requires a set of data on which to perform the corresponding calculations. In this section we describe the two data sets that we used, as well as some methodological aspects of how they were generated.

Unlike previous experiments, the present study was not centred on scientific documents but on all types of documents in a portion of the Web. The only characteristic they had to share was their relationship with our region. We were therefore looking for a source that would provide us with starting Web spaces specialized in the topic of our investigation (origins, or "sitors", to continue the cybermetric "sitation" portmanteau). From the links in this source (destinations, or "sitees") we would be able to retrieve our Web spaces on Extremadura. With this aim, and taking as the main evaluation criterion the "authority" of the source, we selected "Extremadura in Internet", the Web server of the Junta de Extremadura (the supreme official body in the Autonomous Community), which compiles URLs of and about the region of Extremadura (http://www.juntaex.es/todoweb).

The "Extremadura in Internet" compilation consists of three categories of URL: Web sites; Web pages, including personal pages; and sets of Web pages lodged on other servers. We considered the three categories together, at least for their retrieval, under the generic term of "Web spaces" from/about Extremadura.

The first set of sitor Web spaces

This first group consisted of spaces extracted from the selected source. There were 1,850 different spaces in the source, on which we performed the following process of condensing, thematic identification and elimination of synonyms.

Condensing. In this phase we grouped together, or unified, spaces when there existed another space that included the directory in which the first was located (except in the case of personal pages), so that the two formed part of the same structure of Web directories. This reduced the 1,850 original URLs to 1,047.
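The condensing rule can be sketched in a few lines of Python. This is a hypothetical reconstruction, not the authors' code: it reduces each URL to its host plus directory, keeps only directories that are not contained in another listed directory, and, as an assumption about how personal pages were recognized, exempts URLs whose path contains a `~user` segment. The example URLs are invented.

```python
from typing import Iterable, List
from urllib.parse import urlsplit


def directory_of(url: str) -> str:
    """Host plus directory path of a URL, dropping any final file name."""
    parts = urlsplit(url)
    path = parts.path
    if not path.endswith("/"):
        path = path[: path.rfind("/") + 1] or "/"
    return parts.netloc.lower() + path


def is_personal(url: str) -> bool:
    # Assumption: personal pages are recognised by a "/~user" path segment.
    return "/~" in urlsplit(url).path


def condense(urls: Iterable[str]) -> List[str]:
    """Merge Web spaces whose directory lies inside another listed directory."""
    urls = list(urls)
    dirs = {directory_of(u) for u in urls if not is_personal(u)}
    # Directories end in "/", so a prefix test never confuses /a/ with /ab/.
    roots = sorted(d for d in dirs if not any(o != d and d.startswith(o) for o in dirs))
    personal = sorted({directory_of(u) for u in urls if is_personal(u)})
    return roots + personal


urls = [
    "http://www.example.es/turismo/",
    "http://www.example.es/turismo/rutas/merida.html",
    "http://www.example.es/cultura/index.html",
    "http://www.example.es/~juan/fotos/",
]
print(condense(urls))
# ['www.example.es/cultura/', 'www.example.es/turismo/', 'www.example.es/~juan/fotos/']
```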

Identification of Extremeñan Web spaces. In this phase we checked whether each Web space, despite belonging to a source that specialized in Extremadura, was actually about the region. To this end we used a crawler, written for Linux in C and shell script, that scans the URLs of Web spaces and retrieves both the number of associated pages and the number of characters. In order to identify Extremeñan Web space URLs, we added to the scanning routine 45 patterns representative of Extremadura, filtering out URLs that did not include any of them in their content. A close analysis of the results showed that we were also retrieving some URLs of Web spaces corresponding to national institutions that mention Extremadura on their Webs (ministries, universities, publishing houses, etc.) but that are not specific to the topic. After eliminating these from the set, the 1,047 URLs were reduced to 755.
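The pattern-based identification step amounts to a simple content filter. In the hypothetical Python sketch below, the five patterns are illustrative stand-ins (the 45 patterns actually used are not listed in the text), and the input is assumed to be a mapping from each Web-space URL to the text retrieved by the crawler. The subsequent manual review that removed national institutions is not automated here.

```python
from typing import Dict, Iterable, List

# Illustrative stand-ins for the 45 region-specific patterns used in the study.
PATTERNS = ["extremadura", "extremeñ", "badajoz", "cáceres", "mérida"]


def mentions_region(text: str, patterns: Iterable[str] = PATTERNS) -> bool:
    """True if the lower-cased text contains at least one pattern."""
    lowered = text.lower()
    return any(p in lowered for p in patterns)


def filter_spaces(contents: Dict[str, str]) -> List[str]:
    """Keep the Web-space URLs whose retrieved content mentions the region."""
    return sorted(url for url, text in contents.items() if mentions_region(text))


contents = {
    "http://a.example.es/": "Turismo rural en Cáceres y Badajoz",
    "http://b.example.com/": "Recetas de cocina mediterránea",
}
print(filter_spaces(contents))  # ['http://a.example.es/']
```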

Elimination of synonyms. Synonyms are downloaded pages that are identical but whose URLs differ. They could arise in one of two ways: from automatic server redirection or from mirroring. Their identification was far from straightforward. While redirections could be resolved using the URL of the final file, and the problem of different names for the same machine by using the IP address, there still remained the problem of mirrors of Web sites on different machines (with different IPs) and of Web sites lodged on more than one machine, with one or the other being accessed according to demand. We therefore decided to use literal comparison of the content. We used a mixed strategy for the final elimination of synonyms: automatic comparison backed up by human comparison. After eliminating the synonyms, we had a final set of 749 sitor Web space URLs.
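The automatic half of this mixed strategy can be sketched as grouping URLs by a digest of their downloaded content. Hashing is our substitution for direct literal comparison: for identical bytes the two are equivalent, barring hash collisions, and neither catches near-duplicates, which is one reason human review remains necessary. The URLs and page contents below are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def content_key(content: bytes) -> str:
    """Digest standing in for a literal byte-by-byte comparison of page content."""
    return hashlib.sha256(content).hexdigest()


def synonym_groups(pages: Dict[str, bytes]) -> List[List[str]]:
    """Groups of two or more URLs whose downloaded content is identical."""
    groups = defaultdict(list)
    for url, content in pages.items():
        groups[content_key(content)].append(url)
    return sorted(sorted(urls) for urls in groups.values() if len(urls) > 1)


def deduplicate(pages: Dict[str, bytes]) -> List[str]:
    """Keep one URL per distinct content (the alphabetically first one)."""
    kept = {}
    for url in sorted(pages):
        kept.setdefault(content_key(pages[url]), url)
    return sorted(kept.values())


pages = {
    "http://www.a.example.es/": b"<html>Junta</html>",
    "http://mirror.a.example.es/": b"<html>Junta</html>",
    "http://b.example.es/": b"<html>Otra</html>",
}
print(synonym_groups(pages))  # [['http://mirror.a.example.es/', 'http://www.a.example.es/']]
print(deduplicate(pages))     # ['http://b.example.es/', 'http://mirror.a.example.es/']
```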

The second set of sitor Web spaces

In parallel with the process of filtering the 1,850 sitor Web spaces, we decided to follow the http-protocol-prefixed outlinks found in each sitor Web space, as a way of mining for potential URLs of Web spaces about Extremadura to add to the filtered subset of sitor URLs described above, and thus obtain our final set of Extremadura-specific sitor Web spaces. For this purpose we used a crawler that visits not only the initial URL of the Web space but also all the pages belonging to the same space, and that identifies as links those URLs of the types "A", "Area" and "Frame" that are external in nature. This yielded an initial result of 466,000 links, which we subjected to the following procedure.

Filtering. We first eliminated links that represented advertising banners and links to the commonest default files (index, default, menu, etc.), so as to facilitate the matching of similar URLs. This reduced the set of cited Web space URLs to 50,500. We then tested the links for validity and eliminated duplicates. The result was 38,900 valid URLs.
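One reading of this filtering step, sketched below as a hypothetical Python illustration, is that default file names were stripped so that, for example, a link to a directory and a link to that directory's index page match the same target, after which duplicates collapse. The list of default names is an assumption based on the three examples given, and the validity test (retrieving each URL) is omitted.

```python
from typing import Iterable, List
from urllib.parse import urlsplit, urlunsplit

# Assumed list, extrapolated from the examples in the text (index, default, menu).
DEFAULT_STEMS = {"index", "default", "menu"}


def normalize(url: str) -> str:
    """Lower-case scheme and host, drop the fragment and any default file name."""
    parts = urlsplit(url)
    path = parts.path or "/"
    last = path.rsplit("/", 1)[-1]
    if last.split(".", 1)[0].lower() in DEFAULT_STEMS:
        path = path[: len(path) - len(last)]
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def unique_targets(urls: Iterable[str]) -> List[str]:
    """Distinct normalized target URLs."""
    return sorted({normalize(u) for u in urls})


links = [
    "http://WWW.Example.es/turismo/index.html",
    "http://www.example.es/turismo/",
    "http://www.example.es/turismo/#mapa",
    "http://www.example.es/turismo/rutas.html",
]
print(unique_targets(links))
# ['http://www.example.es/turismo/', 'http://www.example.es/turismo/rutas.html']
```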

Condensing. This process was similar to that carried out for the first set of sitor Web spaces, and reduced the target URLs to 19,903.


Identification of Extremeñan links. The first part of this stage was also similar to that carried out for the first subset of sitor Web spaces. The scanning routine reduced the total of potentially relevant links (from/about Extremadura) to 6,913. A detailed analysis of these links then yielded a final set of 1,232 real URLs about Extremadura. (The other 5,681 corresponded to what we came to call "stop-words", i.e. URLs of Web spaces of a general character that mention Extremadura among their numerous resources or that cause homonym problems in identification.)

Elimination of synonyms. This phase was again similar to that carried out for the first set of sitor Web spaces, and left 1,214 URLs.

With these data, the overall set of Web spaces to use as origins for the sitation analysis consisted of 1,963 URLs (749 from the first "sitor" group plus 1,214 from the second "sitee" group). After eliminating duplicated URLs and making a last pass to eliminate errors (pages under construction, missing files, changes of URL, etc.) and frames-generated synonyms, we were finally left with a database of 1,180 sitor Web space URLs.

Retrieval of the sitations

Up to now we have described how the sets of sitor URLs were formed. For each URL, however, we needed to obtain the outlinks. We used the crawler to retrieve the pages and then extracted the links of the types "A", "Area" and "Frame". In the case of the first set (of 749 URLs), this process had of course already been carried out in obtaining the second set (giving as an intermediate result the 1,214 URLs of what we termed above the "sitee" group).

It should be recalled at this point that the sitors are treated as Web spaces: all of their pages are retrieved at once, although the calculations could subsequently be made either with Web spaces or with individual pages, since each link's record contained both the origin URL and the origin Web space as well as the target URL. A page was included if the final file that was retrieved was in a subdirectory of the main address of the Web space.
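The per-link record and the membership rule can be sketched as follows. The field names and the `in_space` helper are hypothetical; the rule simply checks that a page's final URL (after any redirection) has the same host as the Web space and a path under the space's directory.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class LinkRecord:
    origin_url: str    # page on which the link was found
    origin_space: str  # sitor Web space that the page belongs to
    target_url: str    # URL that the link points to


def in_space(final_page_url: str, space_url: str) -> bool:
    """True if the page's final URL lies in the space's directory or below it."""
    page, space = urlsplit(final_page_url), urlsplit(space_url)
    if page.netloc.lower() != space.netloc.lower():
        return False
    base = space.path if space.path.endswith("/") else space.path.rsplit("/", 1)[0] + "/"
    return page.path.startswith(base)


space = "http://www.example.es/turismo/"
print(in_space("http://www.example.es/turismo/rutas/merida.html", space))  # True
print(in_space("http://www.example.es/cultura/", space))                   # False

# Each crawled link would be stored as one record, keeping page and space together.
record = LinkRecord("http://www.example.es/turismo/rutas/merida.html", space,
                    "http://www.otro.example.es/")
```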

Results of the distributions and discussion

As was described in detail in the previous section, we had two slightly different sets of data, which we shall call the first and the second sets. The second was more thematically complete than the first.

The design of these two databases allowed the data to be treated with the following variations:

(1) Errors. This characteristic allowed us to form two subsets:
- all the links that were present;
- only those that corresponded to URLs that could be retrieved without error.

(2) Type of link. We again formed two outlink subsets:
- only those of the types "A", "Area" or "Frame", which mainly denote some thematic relationship (links of other types, such as bgsound, embed, iframe, img, input, meta, script, etc., were thereby eliminated);
- all the links, independently of type.

(3) Origin. This indicator allowed three variations in the way of computing the number of sitations to a given target:
- Web spaces that linked to the target;
- Web pages that linked to the target;
- links that had the target.

(4) Destination. There were two variations:
- all the targets;
- target Web spaces, located by grouping together the link URLs according to whether or not their path was included in some other URL.

(5) Extremeñan target. Again there were two variations in the way of computing the targets:
- all the targets;
- only those that corresponded to Extremadura.

(6) Targets with high sitation percentages from the pages of some Web spaces. Another two variations in the way of computing the links:
- all the links;
- only those whose targets did not have a high percentage of their sitations originating from the pages of certain very large Web spaces (there were some sitor Web spaces with a great number of pages, a high proportion of which had a link to the same target, thereby enormously increasing the number of sitations of that target).
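The counts given in the next paragraph can be reproduced if, as we assume (the text does not spell out the arithmetic), the six variations are crossed with each other and with the two data sets: 2 × 2 × 3 × 2 × 2 × 2 = 96 combinations per data set, 192 in total, and 384 plots once each distribution is drawn in both the power-law and the Bradford style. A short Python enumeration:

```python
from itertools import product

# The six variations listed above, with two or three options each.
VARIATIONS = {
    "errors": ["all links", "retrievable URLs only"],
    "link type": ["A/Area/Frame only", "all types"],
    "origin": ["Web spaces", "pages", "links"],
    "destination": ["all targets", "target Web spaces"],
    "Extremeñan target": ["all targets", "Extremadura only"],
    "large-space links": ["all links", "excluding heavily repeated targets"],
}
DATA_SETS = ["first", "second"]
REPRESENTATIONS = ["power law", "Bradford"]

distributions = list(product(DATA_SETS, *VARIATIONS.values()))
print(len(distributions), len(distributions) * len(REPRESENTATIONS))  # 192 384
```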

These variations generated 192 different distributions to use in the calculations. Since power-law distributions are traditionally represented differently (number of pages vs. the logarithm of the sitations received) from those based on Bradford's law (accumulated sitations received vs. the logarithm of accumulated targets), this would mean a total of 384 representations. We therefore considered that presenting all the work performed exhaustively would exceed the limits of a communication such as the present one. Furthermore, since it was possible to extract common characteristics from many of these distributions, we shall present just a small but representative sample.

We shall first consider how well the data fit the power law, by showing a series of plots of their general behaviour with respect to a power law distribution.


Since this type of distribution has been shown to hold for Web pages, we shall begin with them. Figure 1 shows the general fit to an exponent of 2.1. We represent the number of pages that receive a given number of links using a logarithmic scale, following Broder et al. (2000), although instead of the raw number of pages we use the fraction of pages. The plotted data correspond to the first set, both including and excluding those that have unrecoverable errors (broken links). One observes the general goodness of the fit to the exponent 2.1, with a tail similar to that obtained by Broder et al. (2000) (corresponding to groups of URLs that are cited very frequently), although we do not find those workers' slight initial fall in the data. A close observation of Figure 1 shows that, when there are few inlinks, the points corresponding to all the links lie at slightly higher ratios than the points corresponding to unbroken links only. This tendency is reversed as the number of inlinks rises. This was to be expected, since it indicates that broken links (the only difference between the pairs of points) are cited less often. There are no other apparent differences between the data with and without errors. The behaviour was similar for the second set.
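Figure 1 compares the data with an ideal line of exponent 2.1 rather than reporting a fitted value. As an illustration only, the Python sketch below shows one simple way to estimate such an exponent: compute the fraction of targets receiving each number of inlinks, then take minus the least-squares slope of log(fraction) against log(inlinks). Targets with zero inlinks are dropped because their logarithm is undefined. Least squares on log-log data is convenient but known to be biased for heavy-tailed data, so maximum-likelihood estimators are generally preferred when a precise exponent matters.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def inlink_fractions(inlinks: Iterable[int]) -> Dict[int, float]:
    """Fraction of targets receiving exactly i inlinks, for every observed i >= 1."""
    hist = Counter(c for c in inlinks if c > 0)
    total = sum(hist.values())
    return {i: n / total for i, n in sorted(hist.items())}


def fit_exponent(fractions: Dict[int, float]) -> float:
    """Minus the least-squares slope of log(fraction) versus log(i)."""
    xs = [math.log(i) for i in fractions]
    ys = [math.log(f) for f in fractions.values()]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx


# Sanity check on an exact power law with exponent 2.1.
ideal = {i: i ** -2.1 for i in range(1, 51)}
print(round(fit_exponent(ideal), 3))  # 2.1
```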

In Figure 2 we use the second set, plotting all the links and the origin pages separately. The difference is that in the second case, if there are several links with the same target on the same page, they count as only one. This will be the case henceforth unless explicitly stated to the contrary. There do not appear to be any major differences between the two sets of points, apart from the effect of small reductions in the number of inlinks recorded in some cases. In particular, the two variants are similar in shape. The behaviour was similar for the first set.

In Figure 3 we have grouped the origins by Web space, so that several links within the same space that have the same target are counted only once. One result is that the number of links recorded for each destination is far smaller than in Figures 1 and 2. The data are from the first set. The tail is smaller and the slope is steeper, corresponding to an exponent of 2.8, but the fit to a power law distribution is just as good. The behaviour was similar for the second set.
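The three ways of counting origins used in Figures 1-3 (every link, each origin page once per target, each origin Web space once per target) differ only in what is deduplicated. A hypothetical Python sketch, with records given as (origin page, origin Web space, target) tuples:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Record = Tuple[str, str, str]  # (origin page, origin Web space, target)


def sitations(records: Iterable[Record], mode: str) -> Dict[str, int]:
    """Sitations per target: "links" counts every link, "pages" each origin page
    once per target, "spaces" each origin Web space once per target."""
    if mode == "links":
        counts = defaultdict(int)
        for _, _, target in records:
            counts[target] += 1
        return dict(counts)
    position = {"pages": 0, "spaces": 1}[mode]
    origins = defaultdict(set)  # target -> distinct origins linking to it
    for record in records:
        origins[record[2]].add(record[position])
    return {target: len(found) for target, found in origins.items()}


records = [
    ("s1/a.html", "s1", "T"),
    ("s1/a.html", "s1", "T"),  # repeated link on the same page
    ("s1/b.html", "s1", "T"),
    ("s2/c.html", "s2", "T"),
    ("s2/c.html", "s2", "U"),
]
for mode in ("links", "pages", "spaces"):
    print(mode, sitations(records, mode))
# links {'T': 4, 'U': 1}
# pages {'T': 3, 'U': 1}
# spaces {'T': 2, 'U': 1}
```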

In Figure 4 we have grouped the targets, just as in Figure 3 we grouped the origins, according to the path of the different URLs. The data are from the second set. The fit to the ideal distribution is similar to the previous cases. The difference between the two variants in the plot is that the second excludes some cases in which all of the pages of a large Web space link to some other Web space (which usually corresponds to the company that designed the first Web space). In this second variant there is a logical decrease in the tail, since we eliminated the targets that were very frequently linked to by very large Web spaces. The behaviour was similar for the first set.
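The exclusion used in the second variant can be expressed as a filter over link records. In the hypothetical Python sketch below, a target is dropped when at least `min_share` of the pages of some Web space with at least `min_pages` pages link to it; both thresholds are assumptions, since the text does not give the values used, and records are (origin page, origin Web space, target) tuples.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Record = Tuple[str, str, str]  # (origin page, origin Web space, target)


def heavy_targets(records: List[Record], pages_per_space: Dict[str, int],
                  min_pages: int = 100, min_share: float = 0.5) -> Set[str]:
    """Targets linked from a large share of the pages of some very large Web space."""
    linking_pages = defaultdict(set)  # (space, target) -> pages of that space linking to it
    for page, space, target in records:
        linking_pages[(space, target)].add(page)
    heavy = set()
    for (space, target), pages in linking_pages.items():
        size = pages_per_space[space]
        if size >= min_pages and len(pages) / size >= min_share:
            heavy.add(target)
    return heavy


def drop_heavy(records: List[Record], heavy: Set[str]) -> List[Record]:
    """Remove every link whose target is in the heavy set."""
    return [r for r in records if r[2] not in heavy]


# A 200-page space whose every page links to its designer's site, plus two ordinary links.
records = [(f"big/p{i}.html", "big", "designer.example") for i in range(200)]
records += [("small/a.html", "small", "designer.example"), ("small/a.html", "small", "T")]
pages_per_space = {"big": 200, "small": 3}

heavy = heavy_targets(records, pages_per_space)
print(heavy, len(drop_heavy(records, heavy)))  # {'designer.example'} 1
```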

To conclude with the power law representations, Figure 5 shows the Web spaces as in the previous case, again eliminating the targets that are very frequently linked to in very large Web spaces, but now with the restriction of being Web spaces which deal with Extremadura. We compare the data from the first set with the data from the second. One observes the logical decrease in the number of elements, but also a decrease in the slope, which best fits an exponent of 1.5.

Figure 1. Fraction of pages of the first set that receive a given number of inlinks, both including and excluding erroneous URLs, compared with an ideal power law distribution.

Figure 2. Fraction of pages of the second set that receive a given number of inlinks, counting on the one hand all the links and on the other only the origin pages for each link (in both cases after having eliminated the erroneous links).

Figure 3. Fraction of pages of the first set that receive a given number of inlinks, counting only the origin Web spaces of each link, both including and excluding the erroneous URLs, compared with the ideal power law distribution.

Figure 4. Fraction of Web spaces of the second set that receive a given number of inlinks, both including and excluding the group of targets that were linked to in a large percentage of the pages of some very large Web spaces.

Figure 5. Fraction of Web spaces that receive a given number of inlinks, comparing the first set with the second, including only targets dealing with Extremadura and excluding targets that were very frequently linked to in very large Web spaces.

As we mentioned above, Bradford's law distributions are traditionally represented differently from power law distributions, so the presentation of Figures 6-10 will differ slightly from that of the previous five: accumulated sitations are plotted against accumulated targets, using a logarithmic scale for the latter, since their number increases exponentially with the accumulated sitations. Bradford's law is of course normally applied to scientific journals, counting the citations received by each journal. These citations point from the citing articles (which in turn belong to journals) towards the journal's articles. A certain ambiguity arises in the Web analogue, however. When a certain work is cited several times in an article, it is counted only once; hence, in our case too, when a certain target is linked to several times it should be counted only once. The problem, though, is that it is not clear whether the Web spaces and pages of sitation analysis should be taken as the analogues of journals and articles, respectively, or whether an entire Web space should rather be taken as the analogue of an article. We shall look at both possibilities.
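Both readings of the analogy can be expressed by choosing what counts as the citing unit. The hypothetical Python sketch below counts, for each target, the distinct citing units (pages or Web spaces) that link to it, so that repeated links from the same unit count once, as with a work cited several times in one article. It then ranks targets by decreasing sitations and returns the points of a Bradford-style plot (logarithm of accumulated targets, accumulated sitations). The (origin page, origin Web space, target) record layout is an assumption.

```python
import math
from collections import defaultdict
from typing import Iterable, List, Tuple

Record = Tuple[str, str, str]  # (origin page, origin Web space, target)


def bradford_curve(records: Iterable[Record], citing_unit: str = "page") -> List[Tuple[float, int]]:
    """Points (log of accumulated targets, accumulated sitations) for a Bradford plot."""
    index = {"page": 0, "space": 1}[citing_unit]
    citing = defaultdict(set)  # target -> distinct citing units linking to it
    for record in records:
        citing[record[2]].add(record[index])
    ranked = sorted((len(units) for units in citing.values()), reverse=True)
    points, accumulated = [], 0
    for rank, count in enumerate(ranked, start=1):
        accumulated += count
        points.append((math.log(rank), accumulated))
    return points


records = [
    ("p1", "s1", "A"), ("p1", "s1", "A"),  # the same page links to A twice
    ("p2", "s1", "A"), ("p3", "s2", "A"), ("p3", "s2", "B"),
]
print(bradford_curve(records, "page"))   # [(0.0, 3), (0.6931471805599453, 4)]
print(bradford_curve(records, "space"))  # [(0.0, 2), (0.6931471805599453, 3)]
```

Plotting these points with a linear vertical axis gives the semi-logarithmic plot in which a Bradford-type curve would show its intermediate straight-line section.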

Figure 6 shows the results for the second set when we count only one sitation per Web space for each target. We present the results for target pages independently, after grouping the pages by Web space, and after also eliminating the frequent links as we did in the power law case. This therefore takes Web spaces as the analogues of the citing articles, and pages or entire Web spaces (depending on the case) as the analogues of the cited journals. The data do not fit Bradford's law, since the curve is concave upward. Hence the number of sitations appears to grow exponentially, and indeed, if a logarithmic scale is also used for the vertical axis, the result is practically a straight line. Likewise, the verbal statement of the law is not satisfied, since the proportion $1 : n : n^2$ does not hold.

Figure 7 shows the same three distributions for the first set, but now counting sitor pages. This means that we are now considering Web pages to be the analogues of citing articles, which seems more natural. The first distribution, corresponding to pages, lies in the upper part of the plot, at first running coincident with the Web space distribution and then rising above it.

The second distribution, grouped by Web space, includes points of inflection, but in summary it differs from a Bradford law curve in the following respects: it begins at a point clearly above the origin, the linear portion does not begin at the first point of inflection but later, and it ends with a strongly increasing section.

The third distribution occupies the lower part of the plot. This means that eliminating the links found in a large percentage of the pages of very large Web spaces removes many sitations but not many Web spaces or linked pages.

Figure 6. Accumulated sitations vs. accumulated Web spaces. The data are from the second set. The results are presented for target pages independently, grouped by Web space, and excluding targets that were very frequently linked to in very large Web spaces. Only sitor Web spaces are counted.

Figure 7. Accumulated sitations vs. accumulated Web spaces. The data are from the first set. The results are presented for target pages independently, grouped by Web space, and excluding targets that were very frequently linked to in very large Web spaces. Sitor pages are counted.

Figure 8. Accumulated sitations vs. accumulated Web spaces. The data are from the first set. The results are presented for target pages independently, grouped by Web space, and excluding targets that were very frequently linked to in very large Web spaces. Sitor pages are counted, but only targets about Extremadura.

Figure 9. Accumulated sitations vs. accumulated Web spaces. The data are from the second set. The results are presented for target pages independently, grouped by Web space, and excluding targets that were very frequently linked to in very large Web spaces. Sitor pages are counted.

Figure 10. Accumulated sitations vs. accumulated Web spaces. The data are from the second set. The results are presented for target pages independently, grouped by Web space, and excluding targets that were very frequently linked to in very large Web spaces. Sitor pages are counted, but only targets about Extremadura.

Although as this distribution is plotted together with the other distributionsits slope is less clearly perceptible one nevertheless observes that the slopeincreases continuously implying an exponential growth of the accumulatedsitations also Indeed if a logarithmic scale is also used on the vertical axis theresult is a straight line with no point of inmacrection and not regtting Bradfordrsquoslaw It is interesting that with the elimination of this set of links the dependenceof the sitations becomes exponential Again the verbal statement of the law isnot satisreged since the proportion 1nn2 does not hold

Some authors, Brookes (1969) for instance, consider that Bradford's law is mainly applicable in small and well-defined areas of research, in which there must exist a strong thematic relationship between the documents. In Figure 8 we have therefore represented only the targets (pages or Web spaces) about Extremadura, for the first set of data. The behaviour, however, is quite similar to the previous case: the first two distributions are fairly similar to each other, and the greatest difference lies in the third distribution, which is a straight line.

Figures 9 and 10 show the same calculations as Figures 7 and 8, but now for data from the second set. The results show the same behaviour.

Conclusions

The present experiment focused on the distribution of sitations in a closed but heterogeneous environment of thematically related Web spaces, using two sets of data. We studied how well they fit, on the one hand, a power law distribution and, on the other, Bradford's law, looking at a great many variants, with a total of 384 plots.

Studies in the literature have shown that, on a very large scale, sitations follow a power law distribution with an exponent of 2.1. The present sitation distributions from both data sets were coherent with those earlier studies, including in the exponent, independently of whether or not the counts included broken links, of whether they counted total links or only sitor pages, and of whether pages or Web spaces were taken both as origin and as target. We showed, however, that the distributions differed according to which calculation procedure was used and to what filter was applied to the data. In particular, when we considered only Web spaces as origins (instead of pages or links) and pages as destinations, the slope increased to correspond approximately to an exponent of 2.8. When, however, both destinations and origins were Web spaces, the exponent returned to the value 2.1. Also, when the destinations were thematically restricted (in which case they practically coincided with the citors), the slope was reduced to correspond approximately to an exponent of 1.5. Finally, the tail in the data was reduced by eliminating the targets that were very frequently linked to in very large Web spaces.

We studied the fit to Bradford's law by plotting the accumulated sitations against the accumulated targets. Although we presented many distributions, none of them fitted the typical Bradford case. When we considered Web spaces as the sitors, the resulting distribution was concave from above, with no point of inflection, and passed through the origin. The other distributions, in which pages were considered as the sitors, could in general be characterized as curves which start noticeably above the origin and have several points of inflection ( ). It was also notable, firstly, that taking Web spaces as destinations instead of pages greatly reduced the accumulated sitations and much of the complexity of the curve and, secondly, that eliminating the targets that were very frequently linked to in very large Web spaces sharply reduced the accumulated sitations, so that the curves now passed through the origin and presented an exponential dependence on the accumulated sitations.

In sum, everything seems to support the observation of Kim (2000) that there are different motivations behind the citations in scientific articles and the links of the World Wide Web. In this sense, we confirmed the immense influence of thematic restriction (for instance, nearly all the Web spaces included links related to the World Wide Web itself and its technology) and of the repetitive links found in most of the very large Web spaces (which are usually generated by means of software tools that allow links to be automatically included on every page).

References

Aguillo, I.F. (2000), “Indicadores hacia una evaluación no objetiva (cuantitativa) de sedes web”, in VII Jornadas Españolas de Documentación (Bilbao, 19-21 de octubre de 2000), Servicio Editorial de la Universidad del País Vasco, Bilbao, pp. 233-48.

Almind, T.C. and Ingwersen, P. (1997), “Informetric analyses on the World Wide Web: methodological approaches to ‘Webometrics’”, Journal of Documentation, Vol. 53 No. 4, pp. 404-26.

Bar-Ilan, J. (1997), “The ‘mad cow disease’, Usenet newsgroups and bibliometric laws”, Scientometrics, Vol. 39 No. 1, pp. 29-55.

Bar-Ilan, J. (2001), “Data collection methods on the Web for informetrics purposes: a review and analysis”, Scientometrics, Vol. 50 No. 1, pp. 7-32.

Barabási, A.L. and Albert, R. (1999), “Emergence of scaling in random networks”, Science, Vol. 286, pp. 509-12.

Björneborn, L. and Ingwersen, P. (2001), “Perspectives of webometrics”, Scientometrics, Vol. 50 No. 1, pp. 65-82.

Bradford, S.C. (1934), “Sources of information of specific subjects”, Engineering, Vol. 137, pp. 85-6.

Brin, S. and Page, L. (1998), “The anatomy of a large-scale hypertextual web search engine”, Computer Networks and ISDN Systems, Vol. 30, pp. 107-17.

Broder, A. et al. (2000), “Graph structure in the Web”, Computer Networks and ISDN Systems, Vol. 33 No. 1-6, pp. 309-20.

Brookes, B.C. (1969), “Bradford’s law and the bibliography of science”, Nature, Vol. 224, pp. 953-6.

Codina, L. (2000), “Parámetros e indicadores de calidad para la evaluación de recursos digitales”, in VII Jornadas Españolas de Documentación (Bilbao, 19-21 de octubre de 2000), Servicio Editorial de la Universidad del País Vasco, Bilbao, pp. 135-44.


Correa-Uribe, G. (1999), Colombia Conectada al Mundo: Sitios Web Colombianos, Universidad de Antioquia, Antioquia.

Cronin, B. (2001), “Bibliometrics and beyond: some thoughts on web-based citation analysis”, Journal of Information Science, Vol. 27 No. 1, pp. 1-7.

Cui, L. (1999), “Rating health web sites using the principles of citation analysis: a bibliometric approach”, Journal of Medical Internet Research, Vol. 1 No. e4, available at: www.jmir.org/1999/1/e4/index.htm (accessed 18 August 2001).

Cybermetrics: International Journal of Scientometrics, Informetrics and Bibliometrics (2002), available at: www.cindoc.csic.es/cybermetrics (accessed 11 August 2002).

Egghe, L. (2000), “New informetric aspects of the Internet: some reflections – many problems”, Journal of Information Science, Vol. 26 No. 5, pp. 329-35.

Ferreiro-Aláez, L. (1981), “Análisis de referencias y características bibliométricas de los conjuntos de revistas nucleares”, Revista Española de Documentación Científica, Vol. 4 No. 3, pp. 181-98.

Gorbea-Portal, S. (1996), El Modelo Matemático de Bradford: Su Aplicación a las Revistas Latinoamericanas de las Ciencias Bibliotecológicas y de la Información, UNAM, Centro Universitario de Investigaciones Bibliotecológicas, México.

Gupta, D.K. (1991), “Application of Bradford’s law to citation data of Ethiopian medical journals”, Annals of Library Science and Documentation, Vol. 38 No. 3, pp. 85-98.

Harter, S.P. and Ford, C.E. (2000), “Web-based analyses of e-journal impact: approaches, problems and issues”, Journal of the American Society for Information Science, Vol. 51 No. 13, pp. 1159-76.

Houston, W. (1983), “The application of bibliometrics to veterinary science primary literature”, Quarterly Bulletin of International Association of Agricultural Information Specialists, Vol. 28 No. 1, pp. 6-13.

Internet Software Consortium (2002), “Internet domain survey”, available at: www.isc.org (accessed 4 November 2002).

Jiménez-Piano, M. (2001), “Evaluación de sedes web”, Revista Española de Documentación Científica, Vol. 24 No. 4, pp. 405-32.

Kim, H.J. (2000), “Motivations for hyperlinking in scholarly electronic articles: a qualitative study”, Journal of the American Society for Information Science, Vol. 51 No. 10, pp. 887-99.

Kumar, R. et al. (2001), “Trawling the Web for emerging cyber-communities”, in Mendelzon, A. et al. (Eds), Proceedings of the 8th International World Wide Web Conference (Toronto, Canada, May 11-14, 1999), Elsevier, Amsterdam, available at: www8.org/w8-papers/4a-search-mining/trawling/trawling.html (accessed 2 October 2002).

Lal, A. and Panda, K.C. (1999), “Bradford’s law and its application to bibliographical data of plant pathology dissertations: an analytical approach”, Library Science With a Slant to Documentation and Information Studies, Vol. 36 No. 3, pp. 193-206.

Larson, R.R. (1996), “Bibliometrics of the World Wide Web: an exploratory analysis of the intellectual structure of cyberspace”, in Hardin, S. (Ed.), Proceedings of the 59th Annual Meeting of the American Society for Information Science (Baltimore, Maryland, 1996), Information Today, Medford, NJ, pp. 71-8, available at: http://sherlock.berkeley.edu/asis96/asis96.html (accessed 14 October 2000).

McKiernan, G. (1996), “CitedSites(sm): citation indexing of Web resources”, available at: www.public.iastate.edu/~CYBERSTACKS/Cited.htm (accessed 24 February 2000).

Mubeen, M.A. (1996), “Citation analysis of doctoral dissertations in chemistry”, Annals of Library Science and Documentation, Vol. 43 No. 2, pp. 48-58.


Price, D.J. (1970), “Citation measures of hard science, soft science, technology and non-science”, in Nelson, C.C. and Pollock, D.E. (Eds), Communication Among Scientists and Engineers, D.C. Heath and Co., Lexington, MA, pp. 3-22.

Reyes-Barragán, M.J. et al. (2000), “Revistas científicas: determinación de necesidades y usos”, Revista Española de Documentación Científica, Vol. 23 No. 4, pp. 417-36.

Rodríguez i Gairín, J.M. (1997), “Valoración del impacto de la información en Internet: Altavista, el ‘Citation Index’ de la red”, Revista Española de Documentación Científica, Vol. 20 No. 2, pp. 175-81.

Rousseau, R. (1997), “Sitations: an exploratory study”, Cybermetrics: International Journal of Scientometrics, Informetrics and Bibliometrics, Vol. 1, p. 1, available at: www.cindoc.csic.es/cybermetrics/articles/v1i1p1.html (accessed 5 September 2000).

Smith, A.G. (1996), “Criteria for evaluation of Internet information resources”, available at: www.vuw.ac.nz/~agsmith/evaln/index.htm (accessed 25 March 2000).

Smith, A.G. (1997), “Testing the surf: criteria for evaluating Internet information resources”, The Public-Access Computer Systems Review, Vol. 8 No. 3, available at: http://info.lib.uh.edu/pr/v8/n3/smit8n3.html (accessed 28 February 2002).

Smith, A.G. (1999), “The impact of web sites: a comparison between Australasia and Latin America”, available at: www.vuw.ac.nz/~agsmith/publns/austlat (accessed 12 May 2001).

Smith, A.G. (2001), “Applying evaluation criteria to New Zealand government websites”, International Journal of Information Management, Vol. 21 No. 2, pp. 137-49.

Tague-Sutcliffe, J. (1992), “An introduction to informetrics”, Information Processing and Management, Vol. 28 No. 1, pp. 1-3.

Tillman, H.N. (2000), “Evaluating quality on the Net”, available at: www.tiac.net/users/hope/findqual.html (accessed 5 June 2000).

van Raan, A.F.J. (2001), “Bibliometrics and Internet: some observations and expectations”, Scientometrics, Vol. 50 No. 1, pp. 59-63.

Vreeland, R.C. (2000), “Law libraries in hyperspace: a citation analysis of World Wide Web sites”, Law Library Journal, Vol. 92 No. 1, pp. 9-25.

Zhang, P. and von Dran, G.M. (2000), “Satisfiers and dissatisfiers: a two-factor model for website design and evaluation”, Journal of the American Society for Information Science, Vol. 51 No. 14, pp. 1253-68.


information are finding themselves overwhelmed by the immense quantity and diversity of the content, with its apparently chaotic organization, its volatility, the formal heterogeneity of the resources, and the lack of any guarantee of the ultimate quality of certain documents.

To confront these problems, it has become indispensable to develop and apply indicators of the quality of the information contained in “Web spaces”, an expression introduced by Smith to refer to top-level domains (geographical, such as .nz, or sectorial, such as .com), low-level domains (such as vuw.ac.nz) and groups of directories (such as http://www.vuw.ac.nz/scim) (Smith, 1999). Some of the proposals for such indicators opt for the qualitative analysis of the electronic information available on the Internet (Smith, 1996, 1997, 2001; Correa-Uribe, 1999; Codina, 2000; Tillman, 2000; Zhang and von Dran, 2000; Jiménez-Piano, 2001). Clearly, however, there is a major subjective component in the analysis of qualitative characteristics, especially with respect to the content.

Other studies have proposed the use of citation analysis techniques, similar to those applied to printed scientific publications, to reveal the structure of relationships on the Web and to evaluate the quality of the content. An example is Google's PageRank algorithm (Brin and Page, 1998). Another search engine, AltaVista, has been described as the “citation index” of the Web (Rodríguez i Gairín, 1997), and a parallel has been drawn between the fields of the ISI databases and the HTML tags used on the Web (Almind and Ingwersen, 1997); nevertheless, there really exists no analogue of the ISI databases, and in particular of the SCI, with which to analyse Web “sitations” (inlinks received by Web spaces). It is still feasible to analyse Web spaces by studying their links: if one assumes that the outlinks and inlinks of Web spaces are equivalent to the references and citations-to (Price, 1970) of traditional scientific publications, one can perform the process of “websiting” (Rousseau, 1997), that is, following the links received by a given Web space and analysing the sitations. This analogy has met with opposing opinions in the literature, with examples in favour (McKiernan, 1996; Larson, 1996; Almind and Ingwersen, 1997; Vreeland, 2000; Björneborn and Ingwersen, 2001; Cronin, 2001) and against (Egghe, 2000; Harter and Ford, 2000; Kim, 2000; van Raan, 2001). Some workers, such as Rousseau (1997), take an intermediate standpoint, considering that the study of websiting, although conceptually equivalent to that of traditional citations, has to take a slightly different direction, because the motivation behind links is not the same as that behind citations.

In this regard, it has been assumed that the distribution of citations in printed publications satisfies, in most cases, Bradford's law of bibliometrics (Ferreiro-Aláez, 1981; Houston, 1983; Gupta, 1991; Mubeen, 1996; Lal and Panda, 1999; Reyes-Barragán et al., 2000). This law, which is reviewed in depth by Gorbea-Portal (1996), was originally devised for scientific journals (Bradford, 1934):


If scientific journals are arranged in order of decreasing productivity of articles on a given subject, they may be divided into a nucleus of periodicals more particularly devoted to the subject and several groups or zones containing the same number of articles as the nucleus, when the numbers of periodicals in the nucleus and succeeding zones will be as 1:n:n².

The paradigm of its (semi-logarithmic) plot consists of two curved sections separated by an intermediate straight-line section ( ).
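The verbal statement lends itself to a mechanical check. The Python sketch below is our own illustration, not a procedure taken from Bradford or from the studies cited here. It ranks sources by decreasing productivity, cuts the ranking greedily into three zones of roughly equal article totals, and returns the number of sources in each zone. If the law holds, the ratios between successive zone sizes should each be close to n. The function names and the greedy cut rule are assumptions, and with real data the zone totals can only be approximately equal.

```python
def bradford_zones(article_counts, zones=3):
    """Split sources ranked by decreasing productivity into `zones` groups with
    (roughly) equal article totals; return the number of sources per group."""
    counts = sorted(article_counts, reverse=True)
    target = sum(counts) / zones          # articles each zone should contain
    sizes, in_zone, cumulative = [], 0, 0
    boundary = target
    for c in counts:
        cumulative += c
        in_zone += 1
        # Close the current zone once its share of articles is reached;
        # whatever remains after the last boundary forms the final zone.
        if len(sizes) < zones - 1 and cumulative >= boundary:
            sizes.append(in_zone)
            in_zone = 0
            boundary += target
    sizes.append(in_zone)
    return sizes


def bradford_multipliers(sizes):
    """Ratios between successive zone sizes (each is about n under Bradford's law)."""
    return [b / a for a, b in zip(sizes, sizes[1:])]


# Hypothetical data: 1 source with 8 articles, 2 with 4 and 4 with 2 (24 articles).
sizes = bradford_zones([8, 4, 4, 2, 2, 2, 2])
print(sizes, bradford_multipliers(sizes))   # [1, 2, 4] [2.0, 2.0]
```

The toy data reproduce the pattern 1:n:n² exactly, with n = 2. With real data the multipliers rarely coincide, so the check is one of approximate agreement.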

There have been some cybermetric studies of how well Bradford's law fits the electronic environment of the Internet. Bar-Ilan (1997) applies the law to identify the core of newsgroups on “mad cow disease” (BSE), comparing the groups and their postings with scientific journals and their articles, respectively. None of the plots that were obtained for queries in the postings satisfactorily fitted the traditional Bradford distribution. Cui (1999) uses the Bradford distribution to examine outlinks from the Web sites of the principal US medical schools in order to determine the most cited core set. In this case, the results are coherent with Bradford's original formulation of the law.

While the distributions found in traditional citation analysis are indeed found to satisfy Bradford's law, in the case of websiting, i.e. of the inlinks received by Web spaces (“sitations”), it has for all practical purposes been demonstrated that they satisfy a power law distribution. Rousseau (1997), in a fairly restricted study of sitations (343 URLs), finds a power law distribution similar to Lotka's law of bibliometrics (originally designed for scientific authors). Barabási and Albert (1999) and Kumar et al. (2001) also find a power law distribution. Perhaps the definitive study is that of Broder et al. (2000), who study 200 million pages and 1,500 million links and practically establish that the probability of a document receiving i sitations is proportional to 1/i^x, with x > 1. The last three studies mentioned all find x = 2.1.
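For reference, the power law can be written out together with its normalization. The symbols C and ζ (the Riemann zeta function) are our notation, not that of the cited authors:

```latex
\begin{aligned}
P(i) &= C\, i^{-x}, \qquad i = 1, 2, 3, \dots, \qquad x > 1,\\
\sum_{i=1}^{\infty} P(i) = 1 \;&\Rightarrow\; C = \frac{1}{\zeta(x)}, \qquad \zeta(x) = \sum_{i=1}^{\infty} i^{-x},\\
\log P(i) &= \log C - x \log i .
\end{aligned}
```

The series for ζ(x) converges only when x > 1, so the exponent must exceed 1 for the probabilities to sum to one. Taking logarithms turns the law into a straight line of slope −x on doubly logarithmic axes, which is how exponents such as 2.1 are read from the plots. Lotka's law, in its classical form, is the case x = 2.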

While it may appear that Bradford's law and the power law are mutually exclusive, this is not necessarily so, since for certain values of the exponent they may practically coincide. For power-law exponents around 2, for example, the verbal statement of Bradford's law is satisfied perfectly. For exponents less than 2 it is satisfied in its graphic form, although with variations with respect to the original verbal statement.

The present work is focused on the distribution of sitations in closed generic thematic environments, such as the case of Extremadura (a Spanish region), considering only the sitations made by this environment. We therefore discarded internal, mainly structural, sitations. Our goal was to determine whether the resulting distributions fit the bibliometric functions for printed publications (Bradford) or power-law functions similar to that of Lotka, as in the case of the Rousseau (1997) study. It differs from previous work in that the framework is a closed thematic environment that is not strictly scientific and is heterogeneous in character. It also differs from the aforementioned work of Rousseau (1997), firstly because, as just mentioned, it does not deal exclusively with information on scientific topics, and secondly because it focuses on sitations made in this closed environment rather than on the sitations that the rest of the Internet makes to these Web spaces.

The starting hypothesis of the work is that Bradford's law holds for small and well-defined areas of research but not for non-scientific environments that are heterogeneous in character. Principally to check this latter idea, we carried out a study of the websiting of the Web spaces of Extremadura.

Material and methods

Naturally, any study with these characteristics requires a set of data on which to perform the corresponding calculations. In this section we shall describe the two data sets that we used, as well as some methodological aspects concerning how they were generated.

Unlike previous experiments, the present study was not centred on scientific documents but on all types of documents in a portion of the Web. The characteristic that they had to share was simply their relationship with our region. We were therefore looking for a source that would provide us with starting Web spaces specialized in the topic of our investigation (origins or “sitors”, to continue with the cybermetric “sitation” portmanteau). From the links (destinations or “sitees”) in this source, we would be able to retrieve our Web spaces on Extremadura. With this aim, and taking the “authority” of the source as the main evaluation criterion, we selected “Extremadura in Internet”, the Web server of the Junta de Extremadura (the highest official body of the Autonomous Community), which compiles URLs of and about the region of Extremadura (http://www.juntaex.es/todoweb).

The “Extremadura in Internet” compilation consists of three categories of URL: Web sites, Web pages (including personal pages), and sets of Web pages lodged on other servers. We considered the three categories conjointly, at least for their retrieval, under the generic term of “Web spaces” from/about Extremadura.

The first set of sitor Web spaces

This first group consisted of spaces extracted from the selected source. There were 1,850 different spaces in the source, on which we performed the following process of condensing, thematic identification and elimination of synonyms:

Condensing. In this phase we grouped together, or unified, spaces when there existed some other space which included the directory in which the first was located (except in the case of personal pages), so that the two formed part of the same structure of Web directories. This reduced the 1,850 original URLs to 1,047.

Identification of Extremeñan Web spaces. In this phase we checked whether a Web space, despite belonging to a source that specialized in Extremadura, was not actually about the region. To this end, we used a crawler written for LINUX in C and shell script that scans the URLs of Web spaces and retrieves both the number of associated pages and the number of characters. In order to identify Extremeñan Web space URLs, we added to the scanning routine 45 patterns representative of Extremadura, filtering out URLs that did not include any of them in their content. A close analysis of the results showed that we were also retrieving some URLs of Web spaces corresponding to national institutions that mention Extremadura on their Webs (ministries, universities, publishing houses, etc.) but that are not specific to the topic. After these were eliminated from the set, the 1,047 URLs were reduced to 755.

Elimination of synonyms. Synonyms are downloaded pages that were identical but whose URLs were different. This could arise in one of two ways: from automatic server redirection or from mirroring. Their identification was far from straightforward. While redirections could be resolved by using the URL of the final file, and the problem of different names for the same machine by using the IP address, there still remained the problem of mirrors of Web sites on different machines (with different IPs), or of Web sites that are lodged on more than one machine, with one or the other being accessed according to demand. We therefore decided to use literal comparison of the content. We used a mixed strategy for the final elimination of synonyms: automatic comparison backed up by human comparison. After eliminating the synonyms, we had a final set of 749 sitor Web space URLs.
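Two of these steps can be sketched in Python. The sketch is hypothetical and deliberately simplified. The study used its own crawler, written in C and shell script, plus manual checking. Here `condense` keeps one representative URL per directory tree, and it treats paths that start with `/~` as personal pages; that test is our assumption about how such pages might be recognized. `synonym_groups` finds literal duplicates by hashing the downloaded content with SHA-256. The 45-pattern thematic filter is not reproduced, because the patterns are not given.

```python
import hashlib
from collections import defaultdict
from urllib.parse import urlsplit


def directory_of(url):
    """Host plus directory part of a URL, always ending in '/'."""
    parts = urlsplit(url)
    path = parts.path
    if not path.endswith("/"):
        path = path.rsplit("/", 1)[0] + "/"
    return parts.netloc.lower() + path


def condense(urls):
    """Keep a URL unless an already-kept URL's directory contains its directory.
    Personal pages (assumed here: paths starting with '/~') are never absorbed."""
    kept = []  # (directory, representative URL), shortest directories first
    for url in sorted(urls, key=lambda u: len(directory_of(u))):
        directory = directory_of(url)
        personal = urlsplit(url).path.startswith("/~")
        if not personal and any(directory.startswith(d) for d, _ in kept):
            continue
        kept.append((directory, url))
    return [url for _, url in kept]


def synonym_groups(pages):
    """Group URLs whose downloaded content (bytes) is literally identical."""
    by_digest = defaultdict(list)
    for url, content in pages.items():
        by_digest[hashlib.sha256(content).hexdigest()].append(url)
    return [sorted(group) for group in by_digest.values() if len(group) > 1]


urls = ["http://www.example.es/", "http://www.example.es/cultura/index.html",
        "http://www.example.org/turismo/", "http://www.example.es/~juan/"]
print(condense(urls))
# ['http://www.example.es/', 'http://www.example.es/~juan/', 'http://www.example.org/turismo/']

pages = {"http://a.example.es/": b"<html>same</html>",
         "http://mirror.example.com/": b"<html>same</html>",
         "http://b.example.es/": b"<html>other</html>"}
print(synonym_groups(pages))   # [['http://a.example.es/', 'http://mirror.example.com/']]
```

Hash equality only detects byte-identical copies. Mirrors that differ in, say, a timestamp or a visitor counter would still need the human comparison the study relied on.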

The second set of sitor Web spaces

In parallel with the process of filtering the 1,850 sitor Web spaces, we decided to follow the http-protocol-prefixed outlinks found in each sitor Web space. This served as a way to mine for potential URLs of Web spaces about Extremadura, which we could add to the foregoing filtered subset of sitor URLs to obtain our final set of Extremadura-specific sitor Web spaces. For this purpose, we used a crawler that visits not only the initial URL of the Web space but also all the URLs belonging to the same space, and that identifies as links the URLs of the type “A”, “Area” and “Frame” which are external in nature. This yielded an initial result of 466,000 links, which we subjected to the following procedure:

Filtering. We first eliminated links that represented advertising banners, together with links to the commonest default files (index, default, menu, etc.), so as to facilitate the matching of similar URLs. This reduced the set of cited Web space URLs to 50,500. We then tested the links for validity and eliminated duplicates. The result was 38,900 valid URLs.

Condensing. This process was similar to that carried out for the first set of sitor Web spaces, and reduced the target URLs to 19,903.


Identification of Extremeñan links. The first part of this stage was also similar to that carried out for the first subset of sitor Web spaces. The scanning routine reduced the total of potentially relevant links (from/about Extremadura) to 6,913. The detailed analysis of these links then yielded a final set of 1,232 real URLs about Extremadura. (The other 5,681 corresponded to what we came to call “stop-words”, i.e. URLs of Web spaces of a general character that mention Extremadura among their numerous resources, or that cause homonym problems in identification.)

Elimination of synonyms. This phase was again similar to that carried out for the first set of sitor Web spaces, and left 1,214 URLs.
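The first step of this procedure strips default file names so that similar URLs match, then removes duplicates. It might look like the Python sketch below. The regular expression, the extension pattern and the `banner_hosts` set are hypothetical stand-ins: the study names only index, default and menu, and does not say how banners were recognized. The validity test, which requires live HTTP requests, is left out.

```python
import re
from urllib.parse import urlsplit

# Hypothetical pattern: a trailing index/default/menu file with any extension.
DEFAULT_FILE = re.compile(r"/(index|default|menu)\.[a-z0-9]+$", re.IGNORECASE)


def normalize(url):
    """Drop a trailing default file name, so '/dir/index.html' matches '/dir/'."""
    return DEFAULT_FILE.sub("/", url)


def filter_links(links, banner_hosts=frozenset()):
    """Drop links to (hypothetical) banner hosts, normalize the rest and remove
    duplicates, keeping the first occurrence of each URL."""
    seen, result = set(), []
    for url in links:
        if urlsplit(url).netloc.lower() in banner_hosts:
            continue
        url = normalize(url)
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result


links = ["http://www.example.es/dir/index.html", "http://www.example.es/dir/",
         "http://ads.example.com/banner.gif", "http://www.example.org/Default.ASP"]
print(filter_links(links, banner_hosts={"ads.example.com"}))
# ['http://www.example.es/dir/', 'http://www.example.org/']
```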

With these data, the overall set of Web spaces to be used as origins for the sitation analysis consisted of 1,963 URLs (749 from the first “sitor” group plus 1,214 from the second “sitee” group). After eliminating the duplicated URLs, and making a last pass to eliminate errors (pages under construction, missing files, changes of URL, etc.) and frames-generated synonyms, we were finally left with a database of 1,180 citor Web space URLs.

Retrieval of the sitations

So far we have described how the sets of sitor URLs were formed. For each URL, however, we needed to obtain the outlinks. We used the crawler to retrieve the pages and then extracted the links of the type “A”, “Area” and “Frame”. In the case of the first set (of 749 URLs), this process had of course already been carried out while obtaining the second set (giving, as an intermediate result, the 1,214 URLs of what we termed above the “sitee” group).
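The link-extraction step can be illustrated with Python's standard `html.parser` module. This is a sketch, not the study's crawler, which was written in C and shell script. It collects the `href` of A and AREA elements and the `src` of FRAME elements, resolves them against the page URL, and keeps only http(s) links that fall outside the Web space. Treating the space as every URL beginning with a given prefix is our assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

# Elements counted as links in the study (A, AREA, FRAME) and the attribute
# holding the target; HTMLParser reports tag and attribute names in lower case.
LINK_ATTRIBUTE = {"a": "href", "area": "href", "frame": "src"}


class OutlinkParser(HTMLParser):
    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        wanted = LINK_ATTRIBUTE.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative references against the page's own URL.
                self.links.append(urljoin(self.page_url, value))


def external_outlinks(html, page_url, space_prefix):
    """http(s) A/AREA/FRAME links of a page that lie outside the Web space,
    the space being assumed to be every URL that starts with space_prefix."""
    parser = OutlinkParser(page_url)
    parser.feed(html)
    parser.close()
    return [url for url in parser.links
            if urlsplit(url).scheme in ("http", "https")
            and not url.startswith(space_prefix)]


html = ('<a href="http://other.example.org/x.html">x</a>'
        '<area href="sub.html">'
        '<frame src="http://third.example.net/f.html">'
        '<img src="http://img.example.com/i.gif">'
        '<A HREF="mailto:someone@example.es">mail</A>')
print(external_outlinks(html, "http://www.example.es/dir/page.html",
                        "http://www.example.es/dir/"))
# ['http://other.example.org/x.html', 'http://third.example.net/f.html']
```

The IMG element is ignored because it is not one of the counted link types. The relative AREA link resolves inside the space, and the mailto link is not an http(s) URL, so both are dropped.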

It should be recalled at this point that the sitors are treated as Web spaces: all of their pages are retrieved at once. The calculations could nevertheless be made subsequently either with Web spaces or with individual pages, since each link's record contained both the origin URL and the origin Web space, as well as the target URL. A page was included if the final file that was retrieved was in a subdirectory of the main address of the Web space.

Result of the distributions and discussion

As was described in detail in the previous section, we had two slightly different sets of data, which we shall call the first and the second sets. The second was more thematically complete than the first.

The design of these two databases allowed the data to be treated with the following variations:

(1) Errors. This characteristic allowed us to form two subsets: all the links that were present; and only those that corresponded to URLs that could be retrieved without error.


(2) Type of link. We again formed two outlink subsets: only those of the type “A”, “Area” or “Frame”, which mainly denote some thematic relationship (links of other types, such as bgsound, embed, iframe, img, input, meta, script, etc., were thereby eliminated); and all the links, independently of type.

(3) Origin. This indicator allowed three variations in the way of computing the number of sitations to a given target: Web spaces that linked to the target; Web pages that linked to the target; and links that had the target as destination.

(4) Destination. There were two variations: all the targets; and target Web spaces, located by grouping together the link URLs according to whether or not their path was included in some other URL.

(5) Extremeñan target. Again there were two variations in the way of computing the targets: all the targets; and only those that corresponded to Extremadura.

(6) Targets with high sitation percentages from the pages of some Web spaces. Another two variations in the way of computing the links: all the links; and only those that did not have a high percentage of their sitations originating from the pages of certain very large Web spaces. (There were some sitor Web spaces with a great number of pages, a high proportion of which had a link to the same target, thereby enormously increasing the number of sitations of that target.)
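The six options multiply out as follows. The Python snippet below simply enumerates the combinations. Two readings are our interpretation of the figures given in the next paragraph: the 192 distributions as the 96 combinations computed once for each of the two data sets, and the 384 representations as each distribution drawn both as a power-law plot and as a Bradford plot. The short option labels are ours.

```python
from itertools import product

# The six treatment options listed above, with the choices available for each.
VARIATIONS = {
    "errors": ["all links", "retrievable only"],
    "link type": ["A/Area/Frame only", "all types"],
    "origin": ["Web spaces", "pages", "links"],
    "destination": ["all targets", "target Web spaces"],
    "Extremeñan target": ["all targets", "Extremadura only"],
    "heavy targets": ["all links", "excluding heavy targets"],
}

per_data_set = list(product(*VARIATIONS.values()))   # 2*2*3*2*2*2 combinations
data_sets = 2      # first and second set (our reading of how 192 arises)
plot_types = 2     # power-law plot and Bradford plot

print(len(per_data_set))                            # 96
print(len(per_data_set) * data_sets)                # 192 distributions
print(len(per_data_set) * data_sets * plot_types)   # 384 representations
```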

These variations generated 192 different distributions to use in the calculations. Power-law distributions are traditionally represented differently (number of pages vs the logarithm of the sitations received) from those based on Bradford's law (accumulated sitations received vs the logarithm of accumulated targets), so this would mean a total of 384 representations. We therefore considered that presenting all the work performed exhaustively would exceed the limits of a communication such as the present one. Furthermore, since it was possible to extract common characteristics from many of these distributions, we shall present just a small but representative sample.

We shall first consider how well the data fit the power law, by showing a series of plots of the general behaviour with respect to a power law distribution.


Since this type of distribution has been demonstrated to hold for Web pages, we shall begin with them. Figure 1 shows the general fit to an exponent of 2.1. Following Broder et al. (2000), we represent the number of pages that receive a given number of links using a logarithmic scale, although instead of the raw number of pages we use the fraction of pages. The plotted data correspond to the first set, both including and excluding those pages that have unrecoverable errors (broken links). One observes that the fit to the exponent 2.1 is generally good. There is a tail similar to that obtained by Broder et al. (2000), corresponding to groups of URLs that are cited very frequently, although we do not find those workers' slight initial fall in the data. A close look at Figure 1 shows that, when there are few inlinks, the points corresponding to all the links lie at slightly higher fractions than the points corresponding to unbroken links only. This tendency is reversed as the number of inlinks rises. This was to be expected, since it indicates that broken links (the only difference between the pairs of points) are cited less often. There are no other apparent differences between the data with and without errors. The behaviour was similar for the second set.
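The quantities behind a plot such as Figure 1 are easy to compute. In the hypothetical Python sketch below, `indegree_fractions` turns a list of inlink counts (one per target) into the fraction of targets receiving each number of inlinks. `power_law_exponent` then estimates x as minus the least-squares slope on log-log axes. Using fractions instead of raw counts only shifts the line vertically, so the slope is unaffected. A least-squares fit on logarithms is the simplest estimator rather than the most reliable one: maximum-likelihood methods are generally preferred for real data. It is not necessarily how the exponents reported here were obtained.

```python
import math
from collections import Counter


def indegree_fractions(indegrees):
    """Map k -> fraction of targets that receive exactly k inlinks (k >= 1)."""
    counts = Counter(k for k in indegrees if k > 0)
    total = sum(counts.values())
    return {k: n / total for k, n in sorted(counts.items())}


def power_law_exponent(fractions):
    """Estimate x in P(k) ~ k**(-x) as minus the least-squares slope on log-log axes."""
    xs = [math.log10(k) for k in fractions]
    ys = [math.log10(f) for f in fractions.values()]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return -sxy / sxx


# Synthetic in-degrees: about 10000 / k**2 targets receive k inlinks (k = 1..10).
indegrees = [k for k in range(1, 11) for _ in range(round(10000 / k ** 2))]
print(round(power_law_exponent(indegree_fractions(indegrees)), 2))   # 2.0
```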

In Figure 2 we use the second set, plotting all the links and the origin pages separately. The difference is that, in the second case, several links with the same target in the same page count as only one. This will be the case henceforth unless explicitly stated otherwise. There do not appear to be any major differences between the two sets of points, apart from the effect of small reductions in the number of inlinks recorded in some cases. In particular, the two variants are similar in shape. The behaviour was similar for the first set.

In Figure 3 we have grouped the origins by Web space, so that several links within the same space that have the same target are only counted once. One result is that the number of links recorded for each destination is far smaller than in Figures 1 and 2. The data are from the first set. The tail is smaller and the slope is greater, corresponding to an exponent of 2.8, but the fit to a power law distribution is just as good. The behaviour was similar for the second set.
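The difference between counting links, origin pages and origin Web spaces comes down to which identifier is deduplicated for each target. Below is a minimal Python sketch. It assumes each link is stored as a hypothetical `(origin_page, origin_space, target)` record, mirroring the link records described under Material and methods.

```python
from collections import defaultdict


def sitation_counts(records, origin="page"):
    """Sitations per target under one of the three 'origin' variants.

    records -- iterable of (origin_page, origin_space, target) tuples, one per link
    origin  -- "link":  every link counts;
               "page":  repeated links from one page to a target count once;
               "space": repeated links from one Web space to a target count once.
    """
    citing = defaultdict(set)
    for i, (page, space, target) in enumerate(records):
        key = {"link": i, "page": page, "space": space}[origin]  # KeyError if unknown
        citing[target].add(key)
    return {target: len(units) for target, units in citing.items()}


records = [("s1/a.html", "s1", "t1"), ("s1/a.html", "s1", "t1"),
           ("s1/b.html", "s1", "t1"), ("s2/c.html", "s2", "t1"),
           ("s2/c.html", "s2", "t2")]
for mode in ("link", "page", "space"):
    print(mode, sitation_counts(records, mode))
# link {'t1': 4, 't2': 1}
# page {'t1': 3, 't2': 1}
# space {'t1': 2, 't2': 1}
```

Moving from links to pages to Web spaces can only lower each count, which is consistent with the smaller tail observed in Figure 3.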

In Figure 4 we have grouped the targets according to the path of the different URLs, just as in Figure 3 we grouped the origins. The data are from the second set. The fit to the ideal distribution is similar to the previous cases. The difference between the two variants in the plot is that the second excludes some cases in which all the pages of a large Web space link to some other Web space (which usually corresponds to the company that designed the first Web space). In this second variant there is a logical decrease in the tail, since these data exclude the targets that were very frequently linked to by very large Web spaces. The behaviour was similar for the first set.
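The exclusion of targets linked to from most pages of a very large Web space can be sketched as a simple threshold rule. In the Python example below, `heavy_targets` flags a target when a single Web space with at least `min_pages` pages supplies at least `share` of its page-level sitations. Both thresholds, like the function itself, are hypothetical, since the text does not say what values were used.

```python
from collections import Counter, defaultdict


def heavy_targets(records, space_sizes, share=0.5, min_pages=100):
    """Targets receiving at least `share` of their page-level sitations from one
    Web space that has at least `min_pages` pages (hypothetical thresholds).

    records     -- iterable of (origin_page, origin_space, target) tuples
    space_sizes -- dict mapping Web space -> number of pages
    """
    by_target = defaultdict(Counter)
    for page, space, target in set(records):   # one sitation per (page, target)
        by_target[target][space] += 1
    flagged = set()
    for target, per_space in by_target.items():
        total = sum(per_space.values())
        if any(space_sizes.get(space, 0) >= min_pages and n / total >= share
               for space, n in per_space.items()):
            flagged.add(target)
    return flagged


# A 500-page space whose 200 pages all link to its designer's site.
records = ([(f"big/p{i}.html", "big", "designer.example") for i in range(200)]
           + [("small/a.html", "small", "designer.example"),
              ("small/a.html", "small", "t2"), ("small/b.html", "small", "t2"),
              ("big/p1.html", "big", "t2")])
sizes = {"big": 500, "small": 3}
print(heavy_targets(records, sizes))   # {'designer.example'}
```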

To conclude with the power law representations, Figure 5 shows the Web spaces as in the previous case, again eliminating the targets that are very frequently linked to in very large Web spaces, but now with the restriction that they be Web spaces which deal with Extremadura. We compare the data from the first set with the data from the second. One observes the logical decrease in the number of elements, but also a decrease in the slope, which best fits an exponent of 1.5.

Figure 1. Fraction of pages of the first set that receive a given number of inlinks, both including and excluding erroneous URLs, compared with an ideal power law distribution.

Figure 2. Fraction of pages of the second set that receive a given number of inlinks, counting, on the one hand, all the links and, on the other, only the origin pages for each link (in both cases after having eliminated the erroneous links).

Figure 3. Fraction of pages of the first set that receive a given number of inlinks, counting only the origin Web spaces of each link, both including and excluding the erroneous URLs, compared with the ideal power law distribution.

Figure 4. Fraction of Web spaces of the second set that receive a given number of inlinks, both including and excluding the group of targets that were linked to in a large percentage of the pages of some very large Web spaces.

Figure 5. Fraction of Web spaces that receive a given number of inlinks, comparing the first set with the second, including only targets dealing with Extremadura and excluding targets that were very frequently linked to in very large Web spaces.

As we mentioned above, Bradford’s law distributions are traditionally represented differently from power law distributions, so the presentation of Figures 6-10 will be slightly different from the previous five: accumulated sitations are plotted against accumulated targets, using a logarithmic scale for the latter, since their number increases exponentially with the accumulated sitations. Bradford’s law is, of course, normally applied to scientific journals, counting the citations received by each journal. These citations point from the citing articles (which in turn belong to journals) towards the journal’s articles. A certain ambiguity arises in the Web analogue, however. When a certain work is cited several times in an article, it is only counted once; hence, in our case too, when a certain target is linked to several times from the same sitor, it should also only be counted once. The problem, though, is that it is not clear whether the Web spaces and pages of sitation analysis should be taken as the analogues of journals and articles, respectively, or rather whether an entire Web space should be taken as the analogue of an article. We shall look at both possibilities.
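The coordinates of such a Bradford plot can be sketched in Python, assuming the sitations per target have already been counted under one of the conventions just discussed (for example, one sitation per sitor Web space). Targets are ranked by decreasing sitations and each accumulated total is paired with the base-10 logarithm of the number of targets accumulated so far; `bradford_curve` and the counts are hypothetical.

```python
import math


def bradford_curve(sitations_per_target):
    """Points (log10 of accumulated targets, accumulated sitations).

    Targets are ranked by decreasing sitations, as journals are ranked by
    decreasing productivity in a classical Bradford analysis.
    """
    ranked = sorted(sitations_per_target, reverse=True)
    points, total = [], 0
    for rank, count in enumerate(ranked, start=1):
        total += count
        points.append((math.log10(rank), total))
    return points


# Hypothetical sitation counts for five targets.
for x, y in bradford_curve([1, 5, 2, 1, 3]):
    print(f"{x:.3f} {y}")
```

Plotting the second coordinate against the first gives curves of the kind shown in Figures 6-10.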

Figure 6 shows the results for the second set when we count only one sitation per Web space for each target. We present the results for target pages independently, after grouping the pages by Web space, and after also eliminating the frequent links as we did in the power law case. This therefore regards the analogues of the citor articles to be Web spaces, and the analogues of the cited journals to be pages or entire Web spaces (depending on the case). The data do not fit Bradford’s law, since the curve is concave from above. Hence the number of sitations appears to grow exponentially and, indeed, if a logarithmic scale is also used for the vertical axis, the result is practically a straight line. Likewise, the verbal statement of the law is not satisfied, since the proportion 1:n:n² does not hold.
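The verbal form of the law, the 1:n:n² proportion, can also be checked directly: rank the targets, cut them into three zones that collect roughly equal numbers of sitations, and compare the zone sizes. The Python sketch below uses hypothetical function names and toy data built to satisfy the law with n = 3. How a target that straddles a zone boundary is assigned (here, to the zone in which its first sitation falls) is a choice of this sketch, not something the study specifies.

```python
def bradford_zones(sitations_per_target, n_zones=3):
    """Number of targets in each of `n_zones` zones of roughly equal sitations.

    Targets with at least one sitation are ranked by decreasing sitations; each
    is placed in the zone where its first sitation falls in the running total,
    so zones are only approximately equal when counts do not divide evenly.
    """
    ranked = sorted((c for c in sitations_per_target if c > 0), reverse=True)
    total = sum(ranked)
    if total == 0:
        raise ValueError("no sitations to divide into zones")
    sizes = [0] * n_zones
    running = 0  # sitations accumulated before the current target
    for count in ranked:
        sizes[running * n_zones // total] += 1
        running += count
    return sizes


def zone_ratios(sizes):
    """Successive ratios between zone sizes; an exact 1:n:n^2 gives [n, n]."""
    return [b / a if a else float("inf") for a, b in zip(sizes, sizes[1:])]


# Toy data: 1, 3 and 9 targets per zone, each zone holding 36 sitations.
counts = [36] + [12] * 3 + [4] * 9
sizes = bradford_zones(counts)
print(sizes, zone_ratios(sizes))  # [1, 3, 9] [3.0, 3.0]
```

When the two ratios differ markedly, the verbal statement does not hold, which is the situation reported for Figure 6.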

Figure 7 shows the same three distributions for the first set, but now counting sitor pages. This means that we are now considering Web pages to be the analogues of citor articles, which seems more natural. One sees that the first distribution, corresponding to pages, lies in the upper part of the plot, at first running coincident with the Web space distribution and then rising above it.

The second distribution, grouped by Web space, includes points of inflection but, in synthesis, differs from a Bradford law curve in the following aspects: it begins at a point clearly above the origin, the linear portion does not begin at the first point of inflection but later, and it ends with a strongly increasing section.

The third distribution occupies the lower part of the plot. This means that eliminating the links found in a large percentage of the pages of very large Web spaces eliminates many sitations, but not many Web spaces or linked pages.

Figure 6. Accumulated sitations vs accumulated Web spaces. The data are from the second set. The results are presented for target pages independently, grouped by Web space, and excluding targets that were very frequently linked to in very large Web spaces. Only sitor Web spaces are counted

Figure 7. Accumulated sitations vs accumulated Web spaces. The data are from the first set. The results are presented for target pages independently, grouped by Web space, and excluding targets that were very frequently linked to in very large Web spaces. Sitor pages are counted

Figure 8. Accumulated sitations vs accumulated Web spaces. The data are from the first set. The results are presented for target pages independently, grouped by Web space, and excluding targets that were very frequently linked to in very large Web spaces. Sitor pages are counted, but only targets about Extremadura

Figure 9. Accumulated sitations vs accumulated Web spaces. The data are from the second set. The results are presented for target pages independently, grouped by Web space, and excluding targets that were very frequently linked to in very large Web spaces. Sitor pages are counted

Figure 10. Accumulated sitations vs accumulated Web spaces. The data are from the second set. The results are presented for target pages independently, grouped by Web space, and excluding targets that were very frequently linked to in very large Web spaces. Sitor pages are counted, but only targets about Extremadura

Although, as this distribution is plotted together with the other distributions, its slope is less clearly perceptible, one nevertheless observes that the slope increases continuously, implying an exponential growth of the accumulated sitations here too. Indeed, if a logarithmic scale is also used on the vertical axis, the result is a straight line with no point of inflection, not fitting Bradford’s law. It is interesting that, with the elimination of this set of links, the dependence of the sitations becomes exponential. Again, the verbal statement of the law is not satisfied, since the proportion 1:n:n² does not hold.

Some authors, Brookes (1969) for instance, consider that Bradford’s law is mainly applicable in small and well-defined areas of research, in which there must exist a strong thematic relationship between the documents. In Figure 8 we have therefore represented only the pages or Web spaces that are from Extremadura (for the first set of data). The behaviour, however, is quite similar to the previous case: the first two distributions are fairly similar to each other, with the greatest difference being in the third distribution, which is a straight line.

Figures 9 and 10 show the same calculations as Figures 7 and 8, but now for data from the second set. The results are identical.

Conclusions

The present experiment focused on the distribution of citations in a closed but heterogeneous environment of thematically related Web spaces, using two sets of data. We studied how they fit, on the one hand, a power law distribution and, on the other, Bradford’s law, looking at a great many variants with a total of 384 plots.

Studies in the literature have shown that, on a very large scale, sitations follow a power law distribution with an exponent of 2.1. The present sitation distributions from both data sets were coherent with those earlier studies, including in the exponent, independently of whether or not the counts included broken links, total links or only sitor pages, and pages or Web spaces, both as origin and as target. We showed that the distributions differed, however, according to which calculation procedure was used and to what filter was applied to the data. In particular, when we considered only Web spaces as origins (instead of pages or links) and pages as destinations, the slope increased to correspond approximately to an exponent of 2.8. When, however, both destinations and origins were Web spaces, the exponent returned to the value 2.1. Also, when the destinations were thematically restricted (in which case they practically coincided with the citors), the slope was reduced to correspond approximately to an exponent of 1.5. Finally, the tail in the data was reduced by eliminating the targets that were very frequently linked to in very large Web spaces.

We studied the fit to Bradford’s law by plotting the accumulated sitations against the accumulated targets. Although we presented many distributions, none of them fitted the typical Bradford case. When we considered Web spaces as the sitors, the resulting distribution was concave from above, with no point of inflection, and passed through the origin. The other distributions, in which pages were considered as the sitors, could in general be characterized as curves which start noticeably above the origin and have several points of inflection ( ). It was also notable, firstly, that taking Web spaces as destinations instead of pages greatly reduced the accumulated sitations and much of the complexity of the curve and, secondly, that eliminating the targets that were very frequently linked to in very large Web spaces sharply reduced the accumulated sitations, and the curves changed, now passing through the origin and presenting an exponential dependence on the accumulated sitations.

In sum, everything seems to support the observation of Kim (2000) that there are different motivations behind the citations in scientific articles and the links of the World Wide Web. In this sense, we confirmed the immense influence of thematic restriction (for instance, nearly all the Web spaces included links related to the World Wide Web itself and its technology) and of the repetitive links found in most of the very large Web spaces (which are usually generated by means of software tools that allow links to be automatically included on every page).

References

Aguillo, I.F. (2000), “Indicadores hacia una evaluación no objetiva (cuantitativa) de sedes web”, in VII Jornadas Españolas de Documentación (Bilbao, 19-21 de octubre de 2000), Servicio Editorial de la Universidad del País Vasco, Bilbao, pp. 233-48.

Almind, T.C. and Ingwersen, P. (1997), “Informetric analyses on the World Wide Web: methodological approaches to ‘Webometrics’”, Journal of Documentation, Vol. 53 No. 4, pp. 404-26.

Bar-Ilan, J. (1997), “The ‘mad cow disease’, Usenet newsgroups and bibliometric laws”, Scientometrics, Vol. 39 No. 1, pp. 29-55.

Bar-Ilan, J. (2001), “Data collection methods on the Web for informetrics purposes: a review and analysis”, Scientometrics, Vol. 50 No. 1, pp. 7-32.

Barabási, A.L. and Albert, R. (1999), “Emergence of scaling in random networks”, Science, Vol. 286, pp. 509-12.

Björneborn, L. and Ingwersen, P. (2001), “Perspectives of webometrics”, Scientometrics, Vol. 50 No. 1, pp. 65-82.

Bradford, S.C. (1934), “Sources of information on specific subjects”, Engineering, Vol. 137, pp. 85-6.

Brin, S. and Page, L. (1998), “The anatomy of a large-scale hypertextual web search engine”, Computer Networks and ISDN Systems, Vol. 30, pp. 107-17.

Broder, A. et al. (2000), “Graph structure in the Web”, Computer Networks and ISDN Systems, Vol. 33 No. 1-6, pp. 309-20.

Brookes, B.C. (1969), “Bradford’s law and the bibliography of science”, Nature, Vol. 224, pp. 953-6.

Codina, L. (2000), “Parámetros e indicadores de calidad para la evaluación de recursos digitales”, in VII Jornadas Españolas de Documentación (Bilbao, 19-21 de octubre de 2000), Servicio Editorial de la Universidad del País Vasco, Bilbao, pp. 135-44.


Correa-Uribe, G. (1999), Colombia Conectada al Mundo: Sitios Web Colombianos, Universidad de Antioquía, Antioquía.

Cronin, B. (2001), “Bibliometrics and beyond: some thoughts on web-based citation analysis”, Journal of Information Science, Vol. 27 No. 1, pp. 1-7.

Cui, L. (1999), “Rating health web sites using the principles of citation analysis: a bibliometric approach”, Journal of Medical Internet Research, Vol. 1 No. e4, available at: www.jmir.org/1999/1/e4/index.htm (accessed 18 August 2001).

Cybermetrics: International Journal of Scientometrics, Informetrics and Bibliometrics (2002), available at: www.cindoc.csic.es/cybermetrics (accessed 11 August 2002).

Egghe, L. (2000), “New informetric aspects of the Internet: some reflections – many problems”, Journal of Information Science, Vol. 26 No. 5, pp. 329-35.

Ferreiro-Aláez, L. (1981), “Análisis de referencias y características bibliométricas de los conjuntos de revistas nucleares”, Revista Española de Documentación Científica, Vol. 4 No. 3, pp. 181-98.

Gorbea-Portal, S. (1996), El Modelo Matemático de Bradford: Su Aplicación a las Revistas Latinoamericanas de las Ciencias Bibliotecológicas y de la Información, UNAM, Centro Universitario de Investigaciones Bibliotecológicas, México.

Gupta, D.K. (1991), “Application of Bradford’s law to citation data of Ethiopian medical journals”, Annals of Library Science and Documentation, Vol. 38 No. 3, pp. 85-98.

Harter, S.P. and Ford, C.E. (2000), “Web-based analyses of e-journal impact: approaches, problems and issues”, Library Science With a Slant to Documentation and Information Studies, Vol. 51 No. 13, pp. 1159-76.

Houston, W. (1983), “The application of bibliometrics to veterinary science primary literature”, Quarterly Bulletin of International Association of Agricultural Information Specialists, Vol. 28 No. 1, pp. 6-13.

Internet Software Consortium (2002), “Internet domain survey”, available at: www.isc.org (accessed 4 November 2002).

Jiménez-Piano, M. (2001), “Evaluación de sedes web”, Revista Española de Documentación Científica, Vol. 24 No. 4, pp. 405-32.

Kim, H.J. (2000), “Motivations for hyperlinking in scholarly electronic articles: a qualitative study”, Journal of the American Society for Information Science, Vol. 51 No. 10, pp. 887-99.

Kumar, R. et al. (2001), “Trawling the Web for emerging cyber-communities”, in Mendelzon, A. et al. (Eds), Proceedings of the 8th International World Wide Web Conference (Toronto, Canada, May 11-14, 1999), available at: www8.org/w8-papers/4a-search-mining/trawling/trawling.html (accessed 2 October 2002), Elsevier, Amsterdam.

Lal, A. and Panda, K.C. (1999), “Bradford’s law and its application to bibliographical data of plant pathology dissertations: an analytical approach”, Library Science With a Slant to Documentation and Information Studies, Vol. 36 No. 3, pp. 193-206.

Larson, R.R. (1996), “Bibliometrics of the World Wide Web: an exploratory analysis of the intellectual structure of cyberspace”, in Hardin, S. (Ed.), Proceedings of the 59th Annual Meeting of the American Society for Information Science (Baltimore, Maryland, 1996), Information Today, Medford, NJ, pp. 71-8, available at: http://sherlock.berkeley.edu/asis96/asis96.html (accessed 14 October 2000).

McKiernan, G. (1996), “CitedSites(sm): citation indexing of Web resources”, available at: www.public.iastate.edu/~CYBERSTACKS/Cited.htm (accessed 24 February 2000).

Mubeen, M.A. (1996), “Citation analysis of doctoral dissertations in chemistry”, Annals of Library Science and Documentation, Vol. 43 No. 2, pp. 48-58.


Price, D.J. (1970), “Citation measures of hard science, soft science, technology and non-science”, in Nelson, C.C. and Pollock, D.E. (Eds), Communication Among Scientists and Engineers, D.C. Heath and Co., Lexington, MA, pp. 3-22.

Reyes-Barragán, M.J. et al. (2000), “Revistas científicas: determinación de necesidades y usos”, Revista Española de Documentación Científica, Vol. 23 No. 4, pp. 417-36.

Rodríguez i Gairín, J.M. (1997), “Valoración del impacto de la información en Internet: Altavista, el ‘Citation Index’ de la red”, Revista Española de Documentación Científica, Vol. 20 No. 2, pp. 175-81.

Rousseau, R. (1997), “Sitations: an exploratory study”, Cybermetrics: International Journal of Scientometrics, Informetrics and Bibliometrics, Vol. 1, p. 1, available at: www.cindoc.csic.es/cybermetrics/articles/v1i1p1.html (accessed 5 September 2000).

Smith, A.G. (1996), “Criteria for evaluation of Internet information resources”, available at: www.vuw.ac.nz/~agsmith/evaln/index.htm (accessed 25 March 2000).

Smith, A.G. (1997), “Testing the surf: criteria for evaluating Internet information resources”, The Public-Access Computer Systems Review, Vol. 8 No. 3, available at: http://info.lib.uh.edu/pr/v8/n3/smit8n3.html (accessed 28 February 2002).

Smith, A.G. (1999), “The impact of web sites: a comparison between Australasia and Latin America”, available at: www.vuw.ac.nz/~agsmith/publns/austlat (accessed 12 May 2001).

Smith, A.G. (2001), “Applying evaluation criteria to New Zealand government websites”, International Journal of Information Management, Vol. 21 No. 2, pp. 137-49.

Tague-Sutcliffe, J. (1992), “An introduction to informetrics”, Information Processing and Management, Vol. 28 No. 1, pp. 1-3.

Tillman, H.N. (2000), “Evaluating quality on the Net”, available at: www.tiac.net/users/hope/findqual.html (accessed 5 June 2000).

van Raan, A.F.J. (2001), “Bibliometrics and Internet: some observations and expectations”, Scientometrics, Vol. 50 No. 1, pp. 59-63.

Vreeland, R.C. (2000), “Law libraries in hyperspace: a citation analysis of World Wide Web sites”, Law Library Journal, Vol. 92 No. 1, pp. 9-25.

Zhang, P. and von Dran, G.M. (2000), “Satisfiers and dissatisfiers: a two-factor model for website design and evaluation”, Journal of the American Society for Information Science, Vol. 51 No. 14, pp. 1253-68.


If scientific journals are arranged in order of decreasing productivity of articles on a given subject, they may be divided into a nucleus of periodicals more particularly devoted to the subject and several groups or zones containing the same number of articles as the nucleus, when the numbers of periodicals in the nucleus and succeeding zones will be as 1 : n : n².

The paradigm of its (semi-logarithmic) plot consists of two curved sections separated by an intermediate straight-line section ( ).
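Bradford’s verbal statement can be made concrete with one continuous form of the Bradford curve that is often attributed to Leimkuhler (a source not among the works cited here). Write R(r) for the accumulated number of articles (here, sitations) in the r most productive sources, Y for the yield of each zone, and r_k for the rank at which the k-th zone ends. Requiring R(r_k) = kY then gives zone sizes in the proportion 1 : m : m², so Bradford’s multiplier is n = m = e^{Y/a}:

```latex
\begin{aligned}
R(r) &= a \ln\!\left(1 + \frac{r}{b}\right), && a,\, b > 0,\\
R(r_k) = kY \;\Rightarrow\; r_k &= b\,(m^k - 1), && m = e^{Y/a},\ r_0 = 0,\\
(r_1 - r_0) : (r_2 - r_1) : (r_3 - r_2) &= b(m-1) : b\,m(m-1) : b\,m^2(m-1) = 1 : m : m^2 .
\end{aligned}
```

For r much larger than b, R(r) is approximately a ln r - a ln b, which is linear in log r. This form therefore reproduces the initial curved section and the straight-line section of the semi-logarithmic plot, but not a final curved section.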

There have been some cybermetric studies of how well Bradford’s law fits the electronic environment of the Internet. Bar-Ilan (1997) applies the law to identify the core of newsgroups on “mad cow disease” (BSE), comparing the groups and their postings with scientific journals and their articles, respectively. None of the plots that were obtained for queries in the postings satisfactorily fitted the traditional Bradford distribution. Cui (1999) uses the Bradford distribution to examine outlinks from the Web sites of the principal US medical schools to determine the most cited core set. In this case the results are coherent with Bradford’s original formulation of the law.

While the distributions found in traditional citation analysis are indeed found to satisfy Bradford’s law, in the case of websiting, i.e. of the inlinks received by Web spaces (“sitations”), it has to all practical purposes been demonstrated that they satisfy a power law distribution. Rousseau (1997), in a fairly restricted study of sitations (343 URLs), finds a power law distribution similar to Lotka’s law of bibliometrics (originally designed for scientific authors). Barabási and Albert (1999) and Kumar et al. (2001) also find a power law distribution. Perhaps the definitive study is that of Broder et al. (2000), who study 200 million pages and 1,500 million links and practically establish that the probability of a document receiving i sitations is proportional to 1/i^x, with x > 1. The last three studies mentioned all find x = 2.1.

While it may appear that Bradford’s law and the power law are mutually exclusive, this is not necessarily so, since for certain values of the exponent they may practically coincide. For values of the power law exponent around 2, for example, the verbal statement of Bradford’s law is satisfied perfectly, and for exponents less than 2 it is satisfied in its graphic form, although with variations with respect to the original verbal statement.

The present work is focused on the distribution of sitations in closed generic thematic environments, such as the case of Extremadura (a Spanish region), considering only the sitations made by this environment. We therefore discarded internal, mainly structural, sitations. Our goal was to determine whether the resulting distributions fit the bibliometric functions for printed publications (Bradford) or power law functions similar to that of Lotka, as in the case of the Rousseau (1997) study. It differs from previous work in that the framework is a closed thematic environment that is not strictly scientific and is heterogeneous in character. It is also different from the aforementioned work of Rousseau (1997), firstly because, as just mentioned, it does not deal exclusively with information on scientific topics and, secondly, because it focuses on the sitations made in this closed environment instead of on the sitations that the rest of the Internet makes to these Web spaces.

The starting hypothesis of the work is that Bradford’s law holds for small and well-defined areas of research, but not for non-scientific environments that are heterogeneous in character. Principally to check this latter idea, we carried out a study of the websiting of the Web spaces of Extremadura.

Material and methods

Naturally, any study with these characteristics requires a set of data on which to perform the corresponding calculations. In this section we shall describe the two data sets that we used, as well as some methodological aspects concerning how they were generated.

Unlike previous experiments, the present study was not centred on scientific documents but on all types of documents in a portion of the Web. The characteristic that they had to share was simply their relationship with our region. We were thereby looking for a source that would provide us with starting Web spaces specialized in the topic of our investigation (origins or “sitors”, to continue with the cybermetric “sitation” portmanteauism). From the links (destinations or “sitees”) in this source we would be able to retrieve our Web spaces on Extremadura. With this aim, and taking the “authority” of the source as the main evaluation criterion, we selected “Extremadura in Internet”, the Web server of the Junta of Extremadura (the supreme official organism in the Autonomous Community), which compiles URLs of and about the region of Extremadura (http://www.juntaex.es/todoweb).

The “Extremadura in Internet” compilation consists of three categories of URL: Web sites, Web pages (including personal pages), and sets of Web pages lodged on other servers. We considered the three categories conjointly, at least for their retrieval, under the generic term of “Web spaces” from/about Extremadura.

The first set of sitor Web spaces

This first group consisted of spaces extracted from the selected source. There were 1,850 different spaces in the source, on which we performed the following process of condensing, thematic identification and elimination of synonyms.

Condensing. In this phase we grouped together or unified spaces when there existed some other which included the directory in which the first was located (except in the case of personal pages), so that the two formed part of the same structure of Web directories. This reduced the 1,850 original URLs to 1,047.
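The condensing rule can be sketched in Python as follows. It is a simplification under stated assumptions: a URL is absorbed by another on the same host whose directory contains its own, and personal pages are recognized by a hypothetical "/~user" heuristic and are only absorbed by other personal pages. The function names and URLs are illustrative, not the study’s crawler.

```python
from urllib.parse import urlsplit


def directory_of(url):
    """Host plus directory of a URL: the path up to and including its last '/'."""
    parts = urlsplit(url)
    path = parts.path or "/"
    return parts.netloc.lower() + path[: path.rfind("/") + 1]


def is_personal(url):
    """Hypothetical heuristic: '/~user' paths are treated as personal pages."""
    return "/~" in urlsplit(url).path


def condense(urls):
    """Map each URL to the URL of the Web space it is grouped under."""
    dirs = {url: directory_of(url) for url in urls}
    roots, mapping = [], {}
    # Shortest directories first, so each URL joins its outermost container.
    for url in sorted(urls, key=lambda u: len(dirs[u])):
        container = next(
            (root for root in roots
             if dirs[url].startswith(dirs[root])
             and (not is_personal(url) or is_personal(root))),
            None,
        )
        if container is None:
            roots.append(url)
            container = url
        mapping[url] = container
    return mapping


urls = ["http://www.a.es/", "http://www.a.es/turismo/index.html",
        "http://www.a.es/~ana/", "http://www.b.es/x/"]
for url, space in condense(urls).items():
    print(url, "->", space)
```

Because every directory string ends in "/", the prefix test does not confuse sibling directories such as /dir/ and /dirx/.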

Identification of Extremeñan Web spaces. In this phase we checked whether a Web space, despite belonging to a source that specialized in Extremadura, was not actually about the region. To this end we used a crawler written for LINUX in C and shell-script that scans the URLs of Web spaces and retrieves both the number of associated pages and the number of characters. In order to identify Extremeñan Web space URLs, we added to the scanning routine 45 patterns representative of Extremadura, filtering out URLs that did not include any of them in their content. A close analysis of the results showed that we were also retrieving some URLs of Web spaces corresponding to national institutions that mention Extremadura on their Webs (ministries, universities, publishing houses, etc.) but that are not specific to the topic. After eliminating these from the set, the 1,047 URLs were reduced to 755.

Elimination of synonyms. Synonyms refer to downloaded pages that were identical but whose URLs were different. This could arise in one of two ways: from automatic server redirection or from mirroring. Their identification was far from straightforward. While redirections could be resolved using the URL of the final file, and the problem of different names for the same machine by using the IP address, there still remained the problem of mirrors of Web sites on different machines (with different IPs), or of Web sites that are lodged on more than one machine, with one or the other being accessed according to the demand. We therefore decided to use literal comparison of the content. We used a mixed strategy for the final elimination of synonyms: automatic comparison backed up by human comparison. After eliminating the synonyms we had a final set of URLs of 749 sitor Web spaces.
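The automatic half of the literal comparison can be sketched in Python by hashing each downloaded page, so that identical contents fall into the same group. Using a SHA-256 digest as a stand-in for byte-by-byte comparison is an assumption of this sketch; pages that differ only trivially (a timestamp, a counter) escape it, which is where human comparison remains necessary. The function names are hypothetical.

```python
import hashlib
from collections import defaultdict


def synonym_groups(pages):
    """Groups of URLs whose downloaded content is byte-for-byte identical.

    `pages` maps URL -> downloaded bytes; each returned group is a candidate
    set of synonyms.
    """
    by_digest = defaultdict(list)
    for url, body in pages.items():
        by_digest[hashlib.sha256(body).hexdigest()].append(url)
    return [sorted(urls) for urls in by_digest.values() if len(urls) > 1]


def keep_one_per_group(pages):
    """Keep one URL (the alphabetically first) for each distinct content."""
    kept = {}
    for url in sorted(pages):
        kept.setdefault(hashlib.sha256(pages[url]).hexdigest(), url)
    return sorted(kept.values())


pages = {
    "http://a.example/": b"<html>Turismo en Extremadura</html>",
    "http://mirror.example/a/": b"<html>Turismo en Extremadura</html>",
    "http://b.example/": b"<html>Fiestas de Badajoz</html>",
}
print(synonym_groups(pages))
print(keep_one_per_group(pages))
```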

The second set of sitor Web spaces

In parallel with the process of filtering the 1,850 sitor Web spaces, we decided to follow the http-protocol-prefixed outlinks found in each sitor Web space as a method to mine for potential URLs of Web spaces about Extremadura, to add to the foregoing filtered subset of sitor URLs and thus obtain our final set of Extremadura-specific sitor Web spaces. For this purpose we used a crawler that not only visits the initial URL of the Web space but also all those belonging to the same space, and that identifies as links those URLs in elements of type “A”, “Area” and “Frame” which are external in nature. This yielded an initial result of 466,000 links, which we subjected to the following procedure.

Filtering. We first eliminated links that represented advertising banners and links to the commonest default files (index, default, menu, etc.), so as to facilitate the matching of similar URLs. This reduced the set of cited Web space URLs to 50,500. We then tested the links for validity and eliminated duplicates. The result was 38,900 valid URLs.
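The removal of default file names before matching can be sketched in Python as a URL normalization followed by order-preserving deduplication. The set of default names is illustrative (the study mentions index, default and menu among others), banner removal and the validity test (which needs HTTP requests) are not covered, and the function names are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

# Illustrative, not complete: the study names index, default and menu.
DEFAULT_STEMS = {"index", "default", "menu"}


def normalize(url):
    """Lower-case scheme and host, drop the fragment and a trailing default file."""
    parts = urlsplit(url)
    path = parts.path or "/"
    last = path.rsplit("/", 1)[-1]
    if last.split(".", 1)[0].lower() in DEFAULT_STEMS:
        path = path[: len(path) - len(last)]
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def deduplicate(urls):
    """Normalize every URL, then drop duplicates while keeping first-seen order."""
    return list(dict.fromkeys(normalize(u) for u in urls))


print(deduplicate([
    "HTTP://WWW.A.ES/turismo/index.html#top",
    "http://www.a.es/turismo/",
    "http://www.a.es/menu.htm",
]))
```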

Condensing. This process was similar to that carried out for the first set of sitor Web spaces and reduced the target URLs to 19,903.


Identification of Extremeñan links. The first part of this stage was also similar to that carried out for the first subset of sitor Web spaces. The scanning routine reduced the total of potentially relevant links (from/about Extremadura) to 6,913. Then the detailed analysis of these links yielded a final set of 1,232 real URLs about Extremadura. (The other 5,681 corresponded to what we came to call “stop-words”, i.e. URLs of Web spaces of a general character that mention Extremadura among their numerous resources or that cause homonym problems in identification.)

Elimination of synonyms. This phase was again similar to that carried out for the first set of sitor Web spaces and left 1,214 URLs.

With these data, the overall set of Web spaces to use as origins for the sitation analysis consisted of 1,963 URLs (749 from the first “sitor” group plus 1,214 from the second “sitee” group). After eliminating the duplicated URLs and making a last pass to eliminate errors (pages under construction, missing files, change of URL, etc.) and frames-generated synonyms, we were finally left with a database of 1,180 citor Web space URLs.

Retrieval of the sitations

Up to now we have described how the sets of sitor URLs were formed. For each URL, however, we needed to obtain the outlinks. We used the crawler to retrieve the pages and then extracted the links of the type “A”, “Area” and “Frame”. In the case of the first set (of 749 URLs), of course, this process had already been carried out in obtaining the second set (giving as an intermediate result the 1,214 URLs of what we termed above the “sitee” group).
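The extraction step can be sketched with Python’s standard `html.parser`, as a stand-in for the study’s C crawler. Each link is recorded with the element it came from, so the “A”, “Area” and “Frame” subset can be separated from other link types. The list of src-bearing elements is partial (for example, meta refresh links are not handled), and the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Elements whose link is in href, and (a partial list of) those using src.
HREF_TAGS = {"a", "area"}
SRC_TAGS = {"frame", "iframe", "img", "embed", "script", "input", "bgsound"}
# The three types kept as thematic links in the study.
THEMATIC_TAGS = {"a", "area", "frame"}


class LinkExtractor(HTMLParser):
    """Collect (element name, absolute URL) pairs from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names already lower-cased.
        if tag in HREF_TAGS:
            wanted = "href"
        elif tag in SRC_TAGS:
            wanted = "src"
        else:
            return
        for name, value in attrs:
            if name == wanted and value:
                self.links.append((tag, urljoin(self.base_url, value)))


def extract_links(html, base_url, thematic_only=True):
    """Links of a page, optionally restricted to A, AREA and FRAME elements."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return [(tag, url) for tag, url in parser.links
            if not thematic_only or tag in THEMATIC_TAGS]


page = ('<a href="/turismo/">Turismo</a><img src="logo.gif">'
        '<frameset><frame src="menu.html"></frameset>'
        '<map><area href="http://b.example/"></map>')
print(extract_links(page, "http://a.example/index.html"))
```

Passing `thematic_only=False` keeps the image link as well, which corresponds to the “all the links, independently of type” variant.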

It should be recalled at this point that the sitors are treated as Web spaces: all of their pages are retrieved at once, although the calculations could subsequently be made either with Web spaces or with individual pages, since each link’s record contained both the origin URL and the origin Web space, as well as the target URL. A page was included if the final file that was retrieved was in a subdirectory of the main address of the Web space.
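The inclusion rule for pages can be sketched in Python as a host and directory-prefix test. It assumes that the URL passed in is the final one after any redirection, and that the Web space’s directory is its path up to the last “/”; `in_web_space` is a hypothetical name.

```python
from urllib.parse import urlsplit


def in_web_space(page_url, space_url):
    """True if `page_url` lies in the directory of `space_url` or below it.

    Hosts are compared case-insensitively; the space's directory is its path
    up to and including the last '/'. Pass the final URL after redirects.
    """
    page, space = urlsplit(page_url), urlsplit(space_url)
    if page.netloc.lower() != space.netloc.lower():
        return False
    space_path = space.path or "/"
    space_dir = space_path[: space_path.rfind("/") + 1]
    return (page.path or "/").startswith(space_dir)


space = "http://www.a.es/turismo/index.html"
print(in_web_space("http://www.a.es/turismo/rutas/x.html", space))  # True
print(in_web_space("http://www.a.es/cultura/y.html", space))        # False
```

Negated, the same test separates external links (targets outside the origin’s Web space) from the internal, mainly structural, ones that the study discarded.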

Result of the distributions and discussion

As was described in detail in the previous section, we had two slightly different sets of data, which we shall call the first and the second sets. The second was more thematically complete than the first.

The design of these two databases allowed the data to be treated with the following variations:

(1) Errors. This characteristic allowed us to form two subsets: all the links that were present; and only those that corresponded to URLs that could be retrieved without error.

(2) Type of link. We again formed two outlink subsets: only those of the type “A”, “Area” or “Frame”, which mainly denote some thematic relationship (links of other types, such as bgsound, embed, iframe, img, input, meta, script, etc., were thereby eliminated); and all the links, independently of type.

(3) Origin. This indicator allowed three variations in the form of computing the number of sitations to a given target: the Web spaces that link to the target; the Web pages that link to the target; and the links that have the target as their destination.

(4) Destination. There were two variations: all the targets; and target Web spaces, located by grouping together the link URLs according to whether or not their path was included in some other URL.

(5) Extremeñan target. Again there were two variations in the form of computing the targets: all the targets; and only those that corresponded to Extremadura.

(6) Targets with high sitation percentages from the pages of some Web spaces. Another two variations in the form of computing the links: all the links; and only those that did not have a high percentage of their sitations originating from the pages of certain very large Web spaces (there were some sitor Web spaces with a great number of pages, a high proportion of which had a link to the same target, thereby enormously increasing the number of sitations of that target).

These variations generated 192 different distributions to use in the calculations. Since the power law based distributions are traditionally represented differently (number of pages vs the logarithm of the sitations received) from those based on Bradford’s law (accumulated sitations received vs the logarithm of accumulated targets), this would mean a total of 384 representations. We hence considered that to present exhaustively all the work that was performed would surpass the limits of a communication such as the present one. Furthermore, since it was possible to extract common characteristics from many of these distributions, we shall just present a small but representative sample.

We shall firstly consider how well the data fit the power law by showing a series of plots of the general behaviour with respect to a power law distribution.


Since this type of distribution has been demonstrated to hold for Web pages, we shall begin with them. Figure 1 shows the general fit to an exponent of 2.1. We represent the number of pages that receive a given number of links using a logarithmic scale, following Broder et al. (2000), although instead of representing the raw number of pages we use the fraction of pages. The plotted data correspond to the first set, both including and excluding those that have unrecoverable errors (broken links). One observes the general goodness of the fit to the exponent 2.1, with a tail similar to that obtained by Broder et al. (2000) (corresponding to groups of URLs that are cited very frequently), although we do not find those workers’ slight initial fall in the data. A close observation of Figure 1 shows that, when there are few inlinks, the points corresponding to all the links are for ratios slightly above those for the points corresponding to unbroken links only. This tendency is reversed as the number of inlinks rises. This was to be expected, since it indicates that broken links (the only difference between the pairs of points) are cited less often. There are no other apparent differences between the data with and without errors. The behaviour was similar for the second set.
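The comparison with an ideal power law can be approximated numerically. The Python sketch below computes the fraction of pages (with at least one inlink) that receive each number of inlinks k, and estimates the exponent x in k^-x as minus the least-squares slope in log-log space. The study does not state how its ideal line was fitted, so this estimator is an assumption; least squares on log-log frequencies is sensitive to the sparse tail, and maximum-likelihood estimators are generally preferred for this task. The function names are hypothetical.

```python
import math
from collections import Counter


def inlink_fractions(inlinks_per_page):
    """Map k -> fraction of pages (with k >= 1) that receive exactly k inlinks."""
    counts = Counter(k for k in inlinks_per_page if k > 0)
    total = sum(counts.values())
    return {k: c / total for k, c in sorted(counts.items())}


def fit_exponent(fractions):
    """Exponent x of fraction ~ k**-x, from a least-squares line in log-log space.

    Needs at least two distinct values of k.
    """
    xs = [math.log10(k) for k in fractions]
    ys = [math.log10(f) for f in fractions.values()]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


# Sanity check on an exact (unnormalized) power law with x = 2.1.
print(round(fit_exponent({k: k ** -2.1 for k in range(1, 51)}), 3))  # 2.1
print(inlink_fractions([0, 1, 1, 2]))
```

Normalizing to fractions rather than raw page counts only shifts the intercept of the log-log line, so the estimated exponent is the same either way.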

In Figure 2 we use the second set, plotting all the links and the origin pages separately. The difference is that, in the second case, if there are several links with the same target in the same page, they only count as one. This will be the case henceforth unless explicitly stated to the contrary. One sees that there do not appear to be any major differences between the two sets of points, except for the effect of small reductions in the number of inlinks recorded in some cases. In particular, the two variants are similar in shape. The behaviour was similar for the first set.

In Figure 3 we have grouped the origins by Web space, so that several links within the same space that have the same target are counted only once. One result is that the number of links recorded for each destination is far smaller than in Figures 1 and 2. The data are from the first set. The tail is smaller and the slope is steeper, corresponding to an exponent of 2.8, but the fit to a power-law distribution is just as good. The behaviour was similar for the second set.
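The three ways of counting sitations used in Figures 1 to 3 (every link, distinct origin pages, distinct origin Web spaces) differ only in what is deduplicated before counting. A minimal Python sketch, assuming each link is stored as a hypothetical (origin_page, origin_space, target) record, since the link records described in the methods section contain these three fields:

```python
from collections import defaultdict


def sitations_per_target(records, origin="link"):
    """Count sitations per target from (origin_page, origin_space, target) records.

    origin="link":  every link is a sitation;
    origin="page":  repeated links from one page to the same target count once;
    origin="space": repeated links from one Web space to the same target count once.
    """
    if origin == "link":
        counts = defaultdict(int)
        for _page, _space, target in records:
            counts[target] += 1
        return dict(counts)
    if origin not in ("page", "space"):
        raise ValueError(f"unknown origin granularity: {origin!r}")
    position = 0 if origin == "page" else 1
    citing_units = defaultdict(set)  # target -> distinct pages or spaces linking to it
    for record in records:
        citing_units[record[2]].add(record[position])
    return {target: len(units) for target, units in citing_units.items()}
```

Coarser origins can only lower each target's count, which is consistent with the smaller numbers of links per destination reported for Figure 3.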

In Figure 4 we have grouped the targets according to the path of the different URLs, just as in Figure 3 we grouped the origins. The data are from the second set. The fit to the ideal distribution is similar to that in the previous cases. The difference between the two variants in the plot is that the second excludes some cases in which all of the pages of a large Web space link to some other Web space (which usually corresponds to the company that designed the first Web space). In this second variant, from which we eliminated the targets that were very frequently linked to by very large Web spaces, there is a logical decrease in the tail. The behaviour was similar for the first set.

To conclude with the power-law representations, Figure 5 shows the Web spaces as in the previous case, again eliminating the targets that are very frequently linked to in very large Web spaces, but now with the restriction of being Web spaces which deal with Extremadura. We compare the data from the first set with the data from the second. One observes the logical decrease in the number of elements, but also a decrease in the slope, which best fits an exponent of 1.5.

Figure 1. Fraction of pages of the first set that receive a given number of inlinks, both including and excluding erroneous URLs, compared with an ideal power-law distribution

Figure 2. Fraction of pages of the second set that receive a given number of inlinks, counting on the one hand all the links and on the other only the origin pages for each link (in both cases after having eliminated the erroneous links)

Figure 3. Fraction of pages of the first set that receive a given number of inlinks, counting only the origin Web spaces of each link, both including and excluding the erroneous URLs, compared with the ideal power-law distribution

Figure 4. Fraction of Web spaces of the second set that receive a given number of inlinks, both including and excluding the group of targets that were linked to in a large percentage of the pages of some very large Web spaces

Figure 5. Fraction of Web spaces that receive a given number of inlinks, comparing the first set with the second, including only targets dealing with Extremadura and excluding targets that were very frequently linked to in very large Web spaces

As we mentioned above, Bradford’s law distributions are traditionally represented differently from power-law distributions, so the presentation of Figures 6-10 will be slightly different from that of the previous five, with accumulated sitations plotted against accumulated targets, using a logarithmic scale for the latter since their number increases exponentially with the accumulated sitations. Bradford’s law is, of course, normally applied to scientific journals, counting the citations received by each journal. These citations point from the citing articles (which in turn belong to journals) towards the journal’s articles. A certain ambiguity arises in the Web analogue, however. When a certain work is cited several times in an article, it is counted only once. Hence, in our case too, when a certain target is linked to several times it should also be counted only once. The problem, though, is that it is not clear whether the Web spaces and pages of sitation analysis should be taken as the analogues of journals and articles respectively, or rather whether an entire Web space should be taken as the analogue of an article. We shall look at both possibilities.
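A minimal Python sketch of how the coordinates of such a plot can be obtained, assuming that the sitation counts per target have already been deduplicated for the chosen citing unit (page or Web space) and that targets play the role of Bradford's journals, as in the plots of accumulated sitations against accumulated targets. The function name is hypothetical.

```python
import math


def bradford_curve(sitations_per_target):
    """Points (log10 of accumulated targets, accumulated sitations) for a Bradford plot.

    Targets are ranked by decreasing sitations; x is the logarithm of the number of
    targets accumulated so far, y is the number of sitations they account for.
    """
    ranked = sorted(sitations_per_target.values(), reverse=True)
    points, accumulated = [], 0
    for rank, sitations in enumerate(ranked, start=1):
        accumulated += sitations
        points.append((math.log10(rank), accumulated))
    return points
```

In a classic Bradford plot these points rise along a curved initial section and then follow a straight line; the shapes described for Figures 6-10 can be compared against that expectation.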

Figure 6 shows the results for the second set when we count only one sitation per Web space for each target. We present the results for target pages independently, after grouping the pages by Web space, and after also eliminating the frequent links as we did in the power-law case. This therefore regards Web spaces as the analogues of the citing articles, and pages or entire Web spaces (depending on the case) as the analogues of the cited journals. The data do not fit Bradford’s law, since the curve is concave from above. Hence the number of sitations appears to grow exponentially, and indeed, if a logarithmic scale is also used for the vertical axis, the result is practically a straight line. Likewise, the verbal statement of the law is not satisfied, since the proportion 1:n:n² does not hold.
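The verbal statement of Bradford's law can be checked by dividing the ranked targets into zones that each receive about the same number of sitations and comparing the number of targets per zone, which should grow roughly as 1:n:n². The sketch below uses three zones and assigns each target by the sitations accumulated before it; both are our own illustrative choices, since the text does not specify how the zones were formed.

```python
def bradford_zones(sitations, zones=3):
    """Number of targets in each of `zones` groups holding roughly equal shares of sitations.

    Targets are ranked by decreasing sitations; each target is assigned to the zone
    given by the sitations accumulated before it (integer arithmetic avoids rounding).
    """
    ranked = sorted(sitations, reverse=True)
    total = sum(ranked)
    sizes = [0] * zones
    accumulated = 0
    for s in ranked:
        zone = min(accumulated * zones // total, zones - 1)
        sizes[zone] += 1
        accumulated += s
    return sizes


def zone_multipliers(sizes):
    """Ratios of consecutive zone sizes; a 1:n:n^2 pattern gives a roughly constant n."""
    return [later / earlier if earlier else float("inf")
            for earlier, later in zip(sizes, sizes[1:])]
```

For instance, one target with 12 sitations, two with 6 and four with 3 give zones of 1, 2 and 4 targets and multipliers of 2.0 and 2.0, the pattern the verbal law predicts; markedly unequal multipliers indicate that the proportion does not hold.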

Figure 7 shows the same three distributions for the first set, but now counting sitor pages. This means that we are now considering Web pages to be the analogues of citing articles, which seems more natural. The first distribution, corresponding to pages, lies in the upper part of the plot, at first running coincident with the Web space distribution and then rising above it.

The second distribution, grouped by Web space, includes points of inflection, but in synthesis it differs from a Bradford law curve in the following respects: it begins at a point clearly above the origin, the linear portion does not begin at the first point of inflection but later, and it ends with a strongly increasing section.

The third distribution occupies the lower part of the plot. This means that eliminating the links found in a large percentage of the pages of very large Web spaces eliminates many sitations, but not many Web spaces or linked pages.

Figure 6. Accumulated sitations vs accumulated Web spaces. The data are from the second set. The results are presented for target pages independently, grouped by Web space, and excluding targets that were very frequently linked to in very large Web spaces. Only sitor Web spaces are counted

Figure 7. Accumulated sitations vs accumulated Web spaces. The data are from the first set. The results are presented for target pages independently, grouped by Web space, and excluding targets that were very frequently linked to in very large Web spaces. Sitor pages are counted

Figure 8. Accumulated sitations vs accumulated Web spaces. The data are from the first set. The results are presented for target pages independently, grouped by Web space, and excluding targets that were very frequently linked to in very large Web spaces. Sitor pages are counted, but only targets about Extremadura

Figure 9. Accumulated sitations vs accumulated Web spaces. The data are from the second set. The results are presented for target pages independently, grouped by Web space, and excluding targets that were very frequently linked to in very large Web spaces. Sitor pages are counted

Figure 10. Accumulated sitations vs accumulated Web spaces. The data are from the second set. The results are presented for target pages independently, grouped by Web space, and excluding targets that were very frequently linked to in very large Web spaces. Sitor pages are counted, but only targets about Extremadura

Although, as this distribution is plotted together with the other distributions, its slope is less clearly perceptible, one nevertheless observes that the slope increases continuously, implying an exponential growth of the accumulated sitations here too. Indeed, if a logarithmic scale is also used on the vertical axis, the result is a straight line with no point of inflection, not fitting Bradford’s law. It is interesting that, with the elimination of this set of links, the dependence of the sitations becomes exponential. Again, the verbal statement of the law is not satisfied, since the proportion 1:n:n² does not hold.
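Whether a curve becomes a straight line once the vertical axis is also made logarithmic can be quantified with the coefficient of determination of a line fitted on both logarithmic axes. A sketch under that assumption (the function name is hypothetical; at least two targets, each with at least one sitation, are required):

```python
import math


def loglog_fit(sitations):
    """Slope and R^2 of a straight line through (log10 accumulated targets, log10 accumulated sitations)."""
    ranked = sorted(sitations, reverse=True)
    xs, ys, accumulated = [], [], 0
    for rank, s in enumerate(ranked, start=1):
        accumulated += s
        xs.append(math.log10(rank))
        ys.append(math.log10(accumulated))
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sxx, (sxy * sxy) / (sxx * syy)  # (slope, coefficient of determination)
```

An R² close to 1 corresponds to the straight line described above; a Bradford-type curve, with its initial rise and later linear section on semi-logarithmic axes, would generally give a visibly lower value.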

Some authors, Brookes (1969) for instance, consider that Bradford’s law is mainly applicable in small and well-defined areas of research, in which there must exist a strong thematic relationship between the documents. In Figure 8 we have therefore represented only the pages or Web spaces that are from Extremadura (for the first set of data). The behaviour, however, is quite similar to the previous case: the first two distributions are fairly similar to each other, with the greatest difference being in the third distribution, which is a straight line.

Figures 9 and 10 show the same calculations as Figures 7 and 8, but now for data from the second set. The results are identical.

Conclusions
The present experiment focused on the distribution of citations in a closed but heterogeneous environment of thematically related Web spaces, using two sets of data. We studied how they fit, on the one hand, a power-law distribution and, on the other, Bradford’s law, looking at a great many variants with a total of 384 plots.

Studies in the literature have shown that, on a very large scale, sitations follow a power-law distribution with an exponent of 2.1. The present sitation distributions from both data sets were coherent with those earlier studies, including in the exponent, independently of whether or not the counts included broken links, total links or only sitor pages, and pages or Web spaces both as origin and as target. We showed, however, that the distributions differed according to which calculation procedure was used and which filter was applied to the data. In particular, when we considered only Web spaces as origins (instead of pages or links) and pages as destinations, the slope increased to correspond approximately to an exponent of 2.8. When, however, both destinations and origins were Web spaces, the exponent returned to the value 2.1. Also, when the destinations were thematically restricted (in which case they practically coincided with the citors), the slope was reduced to correspond approximately to an exponent of 1.5. Finally, the tail in the data was reduced by eliminating the targets that were very frequently linked to in very large Web spaces.

We studied the fit to Bradford’s law by plotting the accumulated sitations against the accumulated targets. Although we presented many distributions, none of them fitted the typical Bradford case. When we considered Web spaces as the sitors, the resulting distribution was concave from above, with no point of inflection, and passed through the origin. The other distributions, in which pages were considered as the sitors, could in general be characterized as curves which start noticeably above the origin and have several points of inflection ( ). It was also notable, firstly, that taking Web spaces as destinations instead of pages greatly reduced the accumulated sitations and much of the complexity of the curve and, secondly, that eliminating the targets that were very frequently linked to in very large Web spaces sharply reduced the accumulated sitations, with the curves changing to pass through the origin and presenting an exponential dependence on the accumulated sitations.

In sum, everything seems to support the observation of Kim (2000) that there are different motivations behind the citations in scientific articles and the links of the World Wide Web. In this sense, we confirmed the immense influence of thematic restriction (for instance, nearly all the Web spaces included links related to the World Wide Web itself and its technology) and of the repetitive links found in most of the very large Web spaces (which are usually generated by means of software tools that allow links to be automatically included on every page).

References

Aguillo, I.F. (2000), “Indicadores hacia una evaluación no objetiva (cuantitativa) de sedes web”, in VII Jornadas Españolas de Documentación (Bilbao, 19-21 de octubre de 2000), Servicio Editorial de la Universidad del País Vasco, Bilbao, pp. 233-48.

Almind, T.C. and Ingwersen, P. (1997), “Informetric analyses on the World Wide Web: methodological approaches to ‘Webometrics’”, Journal of Documentation, Vol. 53 No. 4, pp. 404-26.

Bar-Ilan, J. (1997), “The ‘mad cow disease’, Usenet newsgroups and bibliometric laws”, Scientometrics, Vol. 39 No. 1, pp. 29-55.

Bar-Ilan, J. (2001), “Data collection methods on the Web for informetrics purposes: a review and analysis”, Scientometrics, Vol. 50 No. 1, pp. 7-32.

Barabási, A.L. and Albert, R. (1999), “Emergence of scaling in random networks”, Science, Vol. 286, pp. 509-12.

Björneborn, L. and Ingwersen, P. (2001), “Perspectives of webometrics”, Scientometrics, Vol. 50 No. 1, pp. 65-82.

Bradford, S.C. (1934), “Sources of information of specific subjects”, Engineering, Vol. 137, pp. 85-6.

Brin, S. and Page, L. (1998), “The anatomy of a large-scale hypertextual web search engine”, Computer Networks and ISDN Systems, Vol. 30, pp. 107-17.

Broder, A. et al. (2000), “Graph structure in the Web”, Computer Networks and ISDN Systems, Vol. 33 No. 1-6, pp. 309-20.

Brookes, B.C. (1969), “Bradford’s law and the bibliography of science”, Nature, Vol. 224, pp. 953-6.

Codina, L. (2000), “Parámetros e indicadores de calidad para la evaluación de recursos digitales”, in VII Jornadas Españolas de Documentación (Bilbao, 19-21 de octubre de 2000), Servicio Editorial de la Universidad del País Vasco, Bilbao, pp. 135-44.

Correa-Uribe, G. (1999), Colombia Conectada al Mundo: Sitios Web Colombianos, Universidad de Antioquía, Antioquía.

Cronin, B. (2001), “Bibliometrics and beyond: some thoughts on web-based citation analysis”, Journal of Information Science, Vol. 27 No. 1, pp. 1-7.

Cui, L. (1999), “Rating health web sites using the principles of citation analysis: a bibliometric approach”, Journal of Medical Internet Research, Vol. 1 No. e4, available at: www.jmir.org/1999/1/e4/index.htm (accessed 18 August 2001).

Cybermetrics: International Journal of Scientometrics, Informetrics and Bibliometrics (2002), available at: www.cindoc.csic.es/cybermetrics (accessed 11 August 2002).

Egghe, L. (2000), “New informetric aspects of the Internet: some reflections – many problems”, Journal of Information Science, Vol. 26 No. 5, pp. 329-35.

Ferreiro-Aláez, L. (1981), “Análisis de referencias y características bibliométricas de los conjuntos de revistas nucleares”, Revista Española de Documentación Científica, Vol. 4 No. 3, pp. 181-98.

Gorbea-Portal, S. (1996), El Modelo Matemático de Bradford: Su Aplicación a las Revistas Latinoamericanas de las Ciencias Bibliotecológicas y de la Información, UNAM, Centro Universitario de Investigaciones Bibliotecológicas, México.

Gupta, D.K. (1991), “Application of Bradford’s law to citation data of Ethiopian medical journals”, Annals of Library Science and Documentation, Vol. 38 No. 3, pp. 85-98.

Harter, S.P. and Ford, C.E. (2000), “Web-based analyses of e-journal impact: approaches, problems and issues”, Library Science With a Slant to Documentation and Information Studies, Vol. 51 No. 13, pp. 1159-76.

Houston, W. (1983), “The application of bibliometrics to veterinary science primary literature”, Quarterly Bulletin of International Association of Agricultural Information Specialists, Vol. 28 No. 1, pp. 6-13.

Internet Software Consortium (2002), “Internet domain survey”, available at: www.isc.org (accessed 4 November 2002).

Jiménez-Piano, M. (2001), “Evaluación de sedes web”, Revista Española de Documentación Científica, Vol. 24 No. 4, pp. 405-32.

Kim, H.J. (2000), “Motivations for hyperlinking in scholarly electronic articles: a qualitative study”, Journal of the American Society for Information Science, Vol. 51 No. 10, pp. 887-99.

Kumar, R. et al. (2001), “Trawling the Web for emerging cyber-communities”, in Mendelzon, A. et al. (Eds), Proceedings of the 8th International World Wide Web Conference (Toronto, Canada, May 11-14, 1999), available at: www8.org/w8-papers/4a-search-mining/trawling/trawling.html (accessed 2 October 2002), Elsevier, Amsterdam.

Lal, A. and Panda, K.C. (1999), “Bradford’s law and its application to bibliographical data of plant pathology dissertations: an analytical approach”, Library Science With a Slant to Documentation and Information Studies, Vol. 36 No. 3, pp. 193-206.

Larson, R.R. (1996), “Bibliometrics of the World Wide Web: an exploratory analysis of the intellectual structure of cyberspace”, in Hardin, S. (Ed.), Proceedings of the 59th Annual Meeting of the American Society for Information Science (Baltimore, Maryland, 1996), Information Today, Medford, NJ, pp. 71-8, available at: http://sherlock.berkeley.edu/asis96/asis96.html (accessed 14 October 2000).

McKiernan, G. (1996), “CitedSites(sm): citation indexing of Web resources”, available at: www.public.iastate.edu/~CYBERSTACKS/Cited.htm (accessed 24 February 2000).

Mubeen, M.A. (1996), “Citation analysis of doctoral dissertations in chemistry”, Annals of Library Science and Documentation, Vol. 43 No. 2, pp. 48-58.

Price, D.J. (1970), “Citation measures of hard science, soft science, technology and non-science”, in Nelson, C.C. and Pollock, D.E. (Eds), Communication Among Scientists and Engineers, D.C. Heath and Co., Lexington, MA, pp. 3-22.

Reyes-Barragán, M.J. et al. (2000), “Revistas científicas: determinación de necesidades y usos”, Revista Española de Documentación Científica, Vol. 23 No. 4, pp. 417-36.

Rodríguez i Gairín, J.M. (1997), “Valoración del impacto de la información en Internet: Altavista, el ‘Citation Index’ de la red”, Revista Española de Documentación Científica, Vol. 20 No. 2, pp. 175-81.

Rousseau, R. (1997), “Sitations: an exploratory study”, Cybermetrics: International Journal of Scientometrics, Informetrics and Bibliometrics, Vol. 1, p. 1, available at: www.cindoc.csic.es/cybermetrics/articles/v1i1p1.html (accessed 5 September 2000).

Smith, A.G. (1996), “Criteria for evaluation of Internet information resources”, available at: www.vuw.ac.nz/~agsmith/evaln/index.htm (accessed 25 March 2000).

Smith, A.G. (1997), “Testing the surf: criteria for evaluating Internet information resources”, The Public-Access Computer Systems Review, Vol. 8 No. 3, available at: http://info.lib.uh.edu/pr/v8/n3/smit8n3.html (accessed 28 February 2002).

Smith, A.G. (1999), “The impact of web sites: a comparison between Australasia and Latin America”, available at: www.vuw.ac.nz/~agsmith/publns/austlat (accessed 12 May 2001).

Smith, A.G. (2001), “Applying evaluation criteria to New Zealand government websites”, International Journal of Information Management, Vol. 21 No. 2, pp. 137-49.

Tague-Sutcliffe, J. (1992), “An introduction to informetrics”, Information Processing and Management, Vol. 28 No. 1, pp. 1-3.

Tillman, H.N. (2000), “Evaluating quality on the Net”, available at: www.tiac.net/users/hope/findqual.html (accessed 5 June 2000).

van Raan, A.F.J. (2001), “Bibliometrics and Internet: some observations and expectations”, Scientometrics, Vol. 50 No. 1, pp. 59-63.

Vreeland, R.C. (2000), “Law libraries in hyperspace: a citation analysis of World Wide Web sites”, Law Library Journal, Vol. 92 No. 1, pp. 9-25.

Zhang, P. and von Dran, G.M. (2000), “Satisfiers and dissatisfiers: a two-factor model for website design and evaluation”, Journal of the American Society for Information Science, Vol. 51 No. 14, pp. 1253-68.

made in this closed environment, instead of on the sitations that the rest of the Internet makes to these Web spaces.

The starting hypothesis of the work is that Bradford’s law holds for small and well-defined areas of research, but not for non-scientific environments that are heterogeneous in character. Principally to check this latter idea, we carried out a study of the websiting of the Web spaces of Extremadura.

Material and methods
Naturally, any study with these characteristics requires a set of data on which to perform the corresponding calculations. In this section we shall describe the two data sets that we used, as well as some methodological aspects concerning how they were generated.

Unlike previous experiments, the present study was not centred on scientific documents but on all types of documents in a portion of the Web. The characteristic that they had to share was simply their relationship with our region. We were thereby looking for a source that would provide us with starting Web spaces specialized in the topic of our investigation (origins or “sitors”, to continue with the cybermetric “sitation” portmanteauism). From the links (destinations or “sitees”) in this source we would be able to retrieve our Web spaces on Extremadura. With this aim, and taking the “authority” of the source as the main evaluation criterion, we selected “Extremadura in Internet”, the Web server of the Junta of Extremadura (the supreme official body in the Autonomous Community), which compiles URLs of and about the region of Extremadura (http://www.juntaex.es/todoweb).

The “Extremadura in Internet” compilation consists of three categories of URL: Web sites, Web pages (including personal pages), and sets of Web pages lodged on other servers. We considered the three categories conjointly, at least for their retrieval, under the generic term of “Web spaces” from/about Extremadura.

The first set of sitor Web spaces
This first group consisted of spaces extracted from the selected source. There were 1,850 different spaces in the source, on which we performed the following process of condensing, thematic identification and elimination of synonyms.

Condensing. In this phase we grouped together or unified spaces when there existed some other space that included the directory in which the first was located (except in the case of personal pages), so that the two formed part of the same structure of Web directories. This reduced the 1,850 original URLs to 1,047.
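Condensing amounts to a directory-containment test on URLs. The Python sketch below is an illustration under stated assumptions rather than the crawler actually used: it treats the host plus the directory part of the path as a URL's location, keeps one representative URL per directory, drops directories nested inside another listed directory, and uses a hypothetical "~user" heuristic to recognize personal pages. The same containment test, written here as in_space, also expresses the rule, given later in this section, for deciding whether a retrieved page belongs to a Web space.

```python
from urllib.parse import urlsplit


def directory_of(url):
    """Lower-cased host plus the directory part of the path (up to and including the last '/')."""
    parts = urlsplit(url)
    path = parts.path or "/"
    return parts.netloc.lower() + path[: path.rfind("/") + 1]


def in_space(page_url, space_url):
    """True if the page lies in the space's directory or in one of its subdirectories."""
    return directory_of(page_url).startswith(directory_of(space_url))


def looks_personal(url):
    # Hypothetical heuristic: treat '~user' paths as personal pages.
    return "/~" in urlsplit(url).path


def condense(urls):
    """Keep one URL per directory, dropping directories nested in another listed one.

    Personal pages are never merged into a containing space.
    """
    representative = {}
    for url in urls:
        representative.setdefault(directory_of(url), url)  # first URL seen per directory
    kept = []
    for directory, url in representative.items():
        contained = any(other != directory and directory.startswith(other)
                        for other in representative)
        if looks_personal(url) or not contained:
            kept.append(url)
    return kept
```

Comparing directory strings that always end in "/" prevents a path such as /dir2/ from being mistaken for a subdirectory of /dir/.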

Identification of Extremeñan Web spaces. In this phase we checked whether a Web space, despite belonging to a source that specialized in Extremadura, was not actually about the region. To this end we used a crawler written for Linux in C and shell script that scans the URLs of Web spaces and retrieves both the number of associated pages and the number of characters. In order to identify Extremeñan Web space URLs, we added to the scanning routine 45 patterns representative of Extremadura, filtering out URLs that did not include any of them in their content. A close analysis of the results showed that we were also retrieving some URLs of Web spaces corresponding to national institutions that mention Extremadura on their Webs (ministries, universities, publishing houses, etc.) but that are not specific to the topic. After eliminating these from the set, the 1,047 URLs were reduced to 755.

Elimination of synonyms. Synonyms refer to downloaded pages that were identical but whose URLs were different. This could arise in one of two ways: from automatic server redirection or from mirroring. Their identification was far from straightforward. While redirections could be resolved using the URL of the final file, and the problem of different names for the same machine by using the IP address, there still remained the problem of mirrors of Web sites on different machines (with different IPs), or of Web sites that are lodged on more than one machine, with one or the other being accessed according to demand. We therefore decided to use literal comparison of the content. We used a mixed strategy for the final elimination of synonyms: automatic comparison backed up by human comparison. After eliminating the synonyms, we had a final set of URLs of 749 sitor Web spaces.
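Literal content comparison can be approximated by hashing the downloaded content and grouping URLs with identical digests; the groups are then only candidates, to be confirmed by a person as in the mixed strategy described above. This Python sketch assumes the page bodies are available as bytes and only detects byte-identical copies, not near-duplicates; the function names are hypothetical.

```python
import hashlib
from collections import defaultdict


def group_synonyms(pages):
    """Group URLs whose downloaded content is byte-for-byte identical.

    pages maps URL -> content (bytes). Returns groups with more than one URL,
    to be confirmed by a human reviewer before anything is discarded.
    """
    by_digest = defaultdict(list)
    for url, content in pages.items():
        by_digest[hashlib.sha256(content).hexdigest()].append(url)
    return [urls for urls in by_digest.values() if len(urls) > 1]


def drop_synonyms(pages, confirmed_groups):
    """Keep the first URL of each confirmed group and every URL not in a group."""
    redundant = {url for group in confirmed_groups for url in group[1:]}
    return [url for url in pages if url not in redundant]
```

Hashing makes the automatic pass linear in the number of pages, leaving human effort for the much smaller set of candidate groups.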

The second set of sitor Web spaces
In parallel with the process of filtering the 1,850 sitor Web spaces, we decided to follow the http-protocol-prefixed outlinks found in each sitor Web space as a method to mine for potential URLs of Web spaces about Extremadura, to add to the foregoing filtered subset of sitor URLs and thus obtain our final set of Extremadura-specific sitor Web spaces. For this purpose we used a crawler that visits not only the initial URL of the Web space but also all those belonging to the same space, and that identifies as links those URLs of the type “A”, “Area” and “Frame” which are external in nature. This yielded an initial result of 466,000 links, which we subjected to the following procedure.
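Extracting the A, Area and Frame links of a page can be done with Python's standard html.parser module, as in the following sketch. It stands in for the authors' C and shell-script crawler, whose code is not described, and it covers only link extraction, not fetching; the space_root parameter (the URL prefix of the Web space being crawled) is an assumption used to decide which links are external.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class OutlinkParser(HTMLParser):
    """Collects the targets of A and AREA (href) and FRAME (src) elements."""

    LINK_ATTRIBUTE = {"a": "href", "area": "href", "frame": "src"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        wanted = self.LINK_ATTRIBUTE.get(tag)  # HTMLParser lower-cases tag names
        if wanted is None:
            return  # img, script, embed, etc. are ignored
        for name, value in attrs:
            if name == wanted and value:
                self.links.append(urljoin(self.base_url, value))  # resolve relative URLs


def external_outlinks(html, page_url, space_root):
    """http(s) outlinks of a page that point outside the Web space rooted at space_root."""
    parser = OutlinkParser(page_url)
    parser.feed(html)
    parser.close()
    return [link for link in parser.links
            if urlsplit(link).scheme in ("http", "https") and not link.startswith(space_root)]
```

Resolving relative links before the prefix test matters: a relative href such as /local.html belongs to the same space and must not be counted as an outlink.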

Filtering. We first eliminated links that represented advertising banners and links to the commonest default files (index, default, menu, etc.) so as to facilitate the matching of similar URLs. This reduced the set of cited Web space URLs to 50,500. We then tested the links for validity and eliminated duplicates. The result was 38,900 valid URLs.

Condensing. This process was similar to that carried out for the first set of sitor Web spaces, and reduced the target URLs to 19,903.

Identification of Extremeñan links. The first part of this stage was also similar to that carried out for the first subset of sitor Web spaces. The scanning routine reduced the total of potentially relevant links (from/about Extremadura) to 6,913. The detailed analysis of these links then yielded a final set of 1,232 real URLs about Extremadura. (The other 5,681 corresponded to what we came to call “stop-words”, i.e. URLs of Web spaces of a general character that mention Extremadura among their numerous resources, or that cause homonym problems in identification.)

Elimination of synonyms. This phase was again similar to that carried out for the first set of sitor Web spaces, and left 1,214 URLs.
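The filtering step of this procedure can be sketched as URL normalization followed by deduplication. The text names index, default and menu as the commonest default files, but not how banners were recognized or which file extensions were considered, so both regular expressions below are hypothetical; the validity test, which needs network access, is omitted.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Hypothetical patterns for default file names and advertising banners.
DEFAULT_FILE_RE = re.compile(r"/(index|default|menu)\.(html?|php|asp|shtml)$", re.IGNORECASE)
BANNER_RE = re.compile(r"banner|/ads?/", re.IGNORECASE)


def normalize(url):
    """Lower-case scheme and host, strip a default file name and drop the fragment."""
    parts = urlsplit(url)
    path = DEFAULT_FILE_RE.sub("/", parts.path) or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def filter_links(urls):
    """Drop banner-like links, normalize the rest and keep the first of each duplicate."""
    seen, result = set(), []
    for url in urls:
        if BANNER_RE.search(url):
            continue
        key = normalize(url)
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result
```

Normalizing before deduplicating is what lets http://www.a.es/index.html and http://WWW.A.ES/ be recognized as the same target.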

With these data, the overall set of Web spaces to use as origins for the sitation analysis consisted of 1,963 URLs (749 from the first “sitor” group plus 1,214 from the second “sitee” group). After eliminating the duplicated URLs and making a last pass to eliminate errors (pages under construction, missing files, change of URL, etc.) and frames-generated synonyms, we were finally left with a database of 1,180 citor Web space URLs.

Retrieval of the sitations
Up to now we have described how the sets of sitor URLs were formed. For each URL, however, we needed to obtain the outlinks. We used the crawler to retrieve the pages and then extracted the links of the type “A”, “Area” and “Frame”. In the case of the first set (of 749 URLs), of course, this process had already been carried out in obtaining the second set (giving as an intermediate result the 1,214 URLs of what we termed above the “sitee” group).

It should be recalled at this point that the sitors are treated as Web spaces: all of their pages are retrieved at once, although the calculations could subsequently be made either with Web spaces or with individual pages, since each link’s record contained both the origin URL and the origin Web space, as well as the target URL. A page was included if the final file that was retrieved was in a subdirectory of the main address of the Web space.

Result of the distributions and discussion
As was described in detail in the previous section, we had two slightly different sets of data, which we shall call the first and the second sets. The second was more thematically complete than the first.

The design of these two databases allowed the data to be treated with the following variations:

(1) Errors. This characteristic allowed us to form two subsets:
    - all the links that were present;
    - only those that corresponded to URLs that could be retrieved without error.

(2) Type of link. We again formed two outlink subsets:
    - only those of the type “A”, “Area” or “Frame”, which mainly denote some thematic relationship (links of other types, such as bgsound, embed, iframe, img, input, meta, script, etc., were thereby eliminated);
    - all the links, independently of type.

(3) Origin. This indicator allowed three variations in the form of computing the number of sitations to a given target:
    - Web spaces that linked to a given target;
    - Web pages that linked to a given target;
    - links that had a given target.

(4) Destination. There were two variations:
    - all the targets;
    - target Web spaces, located by grouping together the link URLs according to whether or not the path was included in some other URL.

(5) Extremeñan target. Again there were two variations in the form of computing the targets:
    - all the targets;
    - only those that corresponded to Extremadura.

(6) Targets with high sitation percentages from the pages of some Web spaces. Another two variations in the form of computing the links:
    - all the links;
    - only those that did not have a high percentage of their sitations originating from the pages of certain very large Web spaces (there were some sitor Web spaces with a great number of pages, a high proportion of which had a link to the same target, thereby enormously increasing the number of sitations of that target).
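Variation (6) requires deciding when a target owes too many of its sitations to one very large Web space. The text gives no numeric thresholds, so the sketch below uses illustrative values (a space of at least 100 pages, with at least 80% of its pages linking to the target) and assumes each link is stored as a hypothetical (origin_page, origin_space, target) record.

```python
from collections import defaultdict


def boilerplate_targets(records, pages_per_space, min_pages=100, page_share=0.8):
    """Targets linked from a high share of the pages of some very large Web space.

    records: iterable of (origin_page, origin_space, target) link records.
    pages_per_space: number of retrieved pages in each sitor Web space.
    min_pages and page_share are illustrative thresholds, not values from the study.
    """
    linking_pages = defaultdict(set)  # (space, target) -> distinct pages linking to it
    for page, space, target in records:
        linking_pages[(space, target)].add(page)
    flagged = set()
    for (space, target), pages in linking_pages.items():
        size = pages_per_space[space]
        if size >= min_pages and len(pages) / size >= page_share:
            flagged.add(target)
    return flagged


def without_boilerplate(records, flagged):
    """The second variant of (6): drop every link to a flagged target."""
    return [record for record in records if record[2] not in flagged]
```

A site-wide footer link to a Web design company, repeated on almost every page of a large site, is the kind of target this rule is meant to catch.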

These variations generated 192 different distributions to use in the calculations. Since power-law distributions are traditionally represented differently (number of pages vs the logarithm of the sitations received) from those based on Bradford’s law (accumulated sitations received vs the logarithm of accumulated targets), this would mean a total of 384 representations. We therefore considered that presenting all the work performed exhaustively would exceed the limits of a communication such as the present one. Furthermore, since it was possible to extract common characteristics from many of these distributions, we shall just present a small but representative sample.

We shall regrstly consider how well the data regt the power law by showing aseries of plots of the general behaviour with respect to a power law distribution

JD595

564

Since this type of distribution has been demonstrated to hold for Web pageswe shall begin with them Figure 1 shows the general regt to an exponent of 21We represent the number of pages that receive a given number of links using alogarithmic scale following Broder et al (2000) although instead ofrepresenting the raw number of pages we use the fraction of pages Theplotted data correspond to the regrst set both including and excluding those thathave unrecoverable errors (broken links) One observes the general goodness ofthe regt to the exponent 21 with there being a tail similar to that obtained byBroder et al (2000) (corresponding to groups of URLs that are cited veryfrequently) although we do not regnd those workersrsquo slight initial fall in the dataA close observation of Figure 1 shows that when there are few inlinks thepoints corresponding to all the links are for ratios slightly above those for thepoints corresponding to unbroken links only This tendency is reversed as thenumber of inlinks rises This was to be expected since it indicates that brokenlinks (the only difference between the pairs of points) are cited less often Thereare no other apparent differences between the data with and without errorsThe behaviour was similar for the second set

In Figure 2 we use the second set plotting all the links and the origin pagesseparately The difference is that in the second case if there are several linkswith the same target in the same page then they only count as one This will bethe case henceforth unless explicitly stated to the contrary One sees that theredo not appear to be any major differences between the two sets of points exceptfor the effect of small reductions in the number of inlinks recorded in somecases In particular the two variants are similar in shape The behaviour wassimilar for the regrst set

In Figure 3 we have grouped the origins by Web space so that several linkswithin the same space that have the same target are only counted once Oneresult is that the number of links recorded for each destination is far less than inFigures 1 and 2 The data are from the regrst set The tail is smaller and the slopeis greater corresponding to an exponent of 28 but the regt to a power lawdistribution is just as good The behaviour was similar for the second set

In Figure 4 we have grouped the targets just as in Figure 3 we grouped theorigins according to the path of the different URLs The data are from thesecond set The regt to the ideal distribution is similar to the previous cases Thedifference between the two variants in the plot is that the second excludes somecases in which all of the pages of a large Web space link to some other Webspace (which usually corresponds to the company that designed the regrst Webspace) In this second variant there is a logical decrease in the tail for the datafrom which we eliminated the targets that were very frequently linked to byvery large Web spaces The behaviour was similar for the regrst set

To conclude with the power law representations, Figure 5 shows the Web spaces as in the previous case, again eliminating the targets that are very frequently linked to in very large Web spaces, but now with the restriction of being Web spaces which deal with Extremadura. We compare the data from the first set with the data from the second. One observes the logical decrease in the number of elements, but also a decrease in the slope, which best fits an exponent of 1.5.

Figure 1. Fraction of pages of the first set that receive a given number of inlinks, both including and excluding erroneous URLs, compared with an ideal power law distribution

Figure 2. Fraction of pages of the second set that receive a given number of inlinks, counting on the one hand all the links and on the other only the origin pages for each link (in both cases after having eliminated the erroneous links)

Figure 3. Fraction of pages of the first set that receive a given number of inlinks, counting only the origin Web spaces of each link, both including and excluding the erroneous URLs, compared with the ideal power law distribution

Figure 4. Fraction of Web spaces of the second set that receive a given number of inlinks, both including and excluding the group of targets that were linked to in a large percentage of the pages of some very large Web spaces

Figure 5. Fraction of Web spaces that receive a given number of inlinks, comparing the first set with the second, including only targets dealing with Extremadura and excluding targets that were very frequently linked to in very large Web spaces

As we mentioned above, Bradford's law distributions are traditionally represented differently from power law distributions, so the presentation of Figures 6-10 will be slightly different from the previous five, with accumulated sitations plotted against sitors using a logarithmic scale for the latter, since their number increases exponentially with the accumulated sitations. Bradford's law is, of course, normally applied to scientific journals, counting the citations received by each journal. These citations point from the citing articles (which in turn belong to journals) towards the journal's articles. A certain ambiguity arises in the Web analogue, however. When a certain work is cited several times in an article, it is only counted once. Hence, in our case too, when a certain target is linked to several times, it should also be counted only once. The problem, though, is that it is not clear whether the Web spaces and pages of sitation analysis should be taken as the analogues of journals and articles, respectively, or rather whether an entire Web space should be taken as the analogue of an article. We shall look at both possibilities.
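For readers unfamiliar with this representation, here is a minimal sketch of how the points of a Bradford-type plot can be obtained from per-target sitation counts. Targets are ranked by decreasing sitations, and for each rank n the accumulated sitations R(n) are paired with log10 n. It assumes each target's count has already been deduplicated as described above; the function name and the base-10 logarithm are our own choices.

```python
import math
from itertools import accumulate


def bradford_points(sitations_per_target):
    """Return (log10 n, R(n)) pairs for n = 1, 2, ..., where R(n) is the total
    number of sitations received by the n most-sitated targets."""
    ranked = sorted(sitations_per_target, reverse=True)
    return [(math.log10(n), r) for n, r in enumerate(accumulate(ranked), start=1)]


if __name__ == "__main__":
    for x, r in bradford_points([1, 5, 2, 2]):
        print(f"{x:.3f} {r}")
    # 0.000 5
    # 0.301 7
    # 0.477 9
    # 0.602 10
```

In a classical Bradford distribution, these points rise steeply at first and then settle onto a straight line; the text below examines whether the sitation data do so.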

Figure 6 shows the results for the second set when we count only one sitation per Web space for each target. We present the results for target pages independently, after grouping the pages by Web space, and after also eliminating the frequent links as we did in the power law case. This therefore regards Web spaces as the analogues of the citor articles, and pages or entire Web spaces (depending on the case) as the analogues of the cited journals. The data do not fit Bradford's law, since the curve is concave from above. Hence the number of sitations appears to grow exponentially, and indeed, if a logarithmic scale is also used for the vertical axis, the result is practically a straight line. Likewise, the verbal statement of the law is not satisfied, since the proportion 1:n:n² does not hold.
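The verbal statement of the law can be checked mechanically. Rank the targets by decreasing sitations, split them into three zones that each account for about one third of the sitations, and compare the zone sizes with 1:n:n². The following is a hedged sketch: the zone-boundary rule is our own choice (other conventions exist), and the function names are hypothetical.

```python
import math


def bradford_zones(sitations_per_target, zones=3):
    """Count the targets falling in each of `zones` groups that yield roughly equal sitations.

    Targets are ranked by decreasing sitations. Each target is assigned to the zone in
    which the sitations accumulated *before* it fall; this boundary rule is one of
    several possible conventions.
    """
    ranked = sorted(sitations_per_target, reverse=True)
    total = sum(ranked)
    if total <= 0:
        raise ValueError("need at least one sitation")
    sizes = [0] * zones
    before = 0
    for count in ranked:
        sizes[min(zones - 1, zones * before // total)] += 1
        before += count
    return sizes


def bradford_multipliers(sizes):
    """For three zones in the proportion 1 : n : n**2, both returned values equal n."""
    first, second, third = sizes
    return second / first, math.sqrt(third / first)


if __name__ == "__main__":
    sizes = bradford_zones([12, 6, 6, 3, 3, 3, 3])
    print(sizes, bradford_multipliers(sizes))  # [1, 2, 4] (2.0, 2.0)
```

When the two multipliers differ markedly, the 1:n:n² proportion does not hold, which is the situation the text reports.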

Figure 7 shows the same three distributions for the first set, but now counting sitor pages. This means that we are now considering Web pages to be the analogues of citor articles, which seems more natural. One sees that the first distribution, corresponding to pages, lies in the upper part of the plot, at first running coincident with the Web space distribution and then rising above it.

The second distribution, grouped by Web space, includes points of inflection but, in short, differs from a Bradford law curve in the following aspects: it begins at a point clearly above the origin; the linear portion does not begin at the first point of inflection but later; and it ends with a strongly increasing section.

The third distribution occupies the lower part of the plot. This means that eliminating the links found in a large percentage of the pages of very large Web spaces eliminates many sitations, but not many Web spaces or linked pages.

Figure 6. Accumulated sitations vs accumulated Web spaces. The data are from the second set. The results are presented for target pages independently, grouped by Web space, and excluding targets that were very frequently linked to in very large Web spaces. Only sitor Web spaces are counted

Figure 7. Accumulated sitations vs accumulated Web spaces. The data are from the first set. The results are presented for target pages independently, grouped by Web space, and excluding targets that were very frequently linked to in very large Web spaces. Sitor pages are counted

Figure 8. Accumulated sitations vs accumulated Web spaces. The data are from the first set. The results are presented for target pages independently, grouped by Web space, and excluding targets that were very frequently linked to in very large Web spaces. Sitor pages are counted, but only targets about Extremadura

Figure 9. Accumulated sitations vs accumulated Web spaces. The data are from the second set. The results are presented for target pages independently, grouped by Web space, and excluding targets that were very frequently linked to in very large Web spaces. Sitor pages are counted

Figure 10. Accumulated sitations vs accumulated Web spaces. The data are from the second set. The results are presented for target pages independently, grouped by Web space, and excluding targets that were very frequently linked to in very large Web spaces. Sitor pages are counted, but only targets about Extremadura

Although, as this distribution is plotted together with the other distributions, its slope is less clearly perceptible, one nevertheless observes that the slope increases continuously, implying an exponential growth of the accumulated sitations here too. Indeed, if a logarithmic scale is also used on the vertical axis, the result is a straight line with no point of inflection, which does not fit Bradford's law. It is interesting that, with the elimination of this set of links, the dependence of the sitations becomes exponential. Again, the verbal statement of the law is not satisfied, since the proportion 1:n:n² does not hold.
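The diagnostic used in the text is to re-plot with a logarithmic vertical axis and see whether a straight line results. It can be quantified roughly by comparing how well two straight lines fit the same accumulated curve. The following sketch is our own, not the authors' procedure, and uses the coefficient of determination r² of ordinary least-squares fits. A curve that is straight on log-log axes means R(n) grows as a power of n, i.e. exponentially in log n, the horizontal variable of the Bradford plot. A Bradford curve instead becomes straight when R(n) itself is plotted against log n.

```python
import math


def r_squared(xs, ys):
    """Coefficient of determination of the least-squares line through (xs, ys).

    Assumes at least two points and non-constant xs and ys.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


def compare_models(cumulative_sitations):
    """Given R(n) for n = 1, 2, ... (all positive), compare two linearisations:
    R vs log n (the linear part of a Bradford curve) and log R vs log n."""
    log_n = [math.log10(n) for n in range(1, len(cumulative_sitations) + 1)]
    log_r = [math.log10(r) for r in cumulative_sitations]
    return {
        "bradford_linear": r_squared(log_n, cumulative_sitations),
        "log_log": r_squared(log_n, log_r),
    }
```

For instance, an accumulated curve R(n) = 5·n^1.5 gives r² = 1 on log-log axes and a lower value for the Bradford linearisation, while R(n) = 10 + 20·log10 n behaves the other way round.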

Some authors, Brookes (1969) for instance, consider that Bradford's law is mainly applicable in small and well-defined areas of research, in which there must exist a strong thematic relationship between the documents. In Figure 8 we have therefore represented only the pages or Web spaces that are from Extremadura (for the first set of data). The behaviour, however, is quite similar to the previous case: the first two distributions are fairly similar to each other, with the greatest difference being in the third distribution, which is a straight line.

Figures 9 and 10 show the same calculations as Figures 7 and 8, but now for data from the second set. The results are identical.

Conclusions
The present experiment focused on the distribution of citations in a closed but heterogeneous environment of thematically related Web spaces, using two sets of data. We studied how they fit, on the one hand, a power law distribution and, on the other, Bradford's law, looking at a great many variants with a total of 384 plots.

Studies in the literature have shown that, on a very large scale, sitations follow a power law distribution with an exponent of 2.1. The present sitation distributions from both data sets were coherent with those earlier studies, including in the exponent, independently of whether or not the counts included broken links, of whether they counted total links or only sitor pages, and of whether pages or Web spaces were taken both as origin and as target. We showed, however, that the distributions differed according to which calculation procedure was used and to what filter was applied to the data. In particular, when we considered only Web spaces as origins (instead of pages or links) and pages as destinations, the slope increased to correspond approximately to an exponent of 2.8. When, however, both destinations and origins were Web spaces, the exponent returned to the value 2.1. Also, when the destinations were thematically restricted (in which case they practically coincided with the citors), the slope was reduced to correspond approximately to an exponent of 1.5. Finally, the tail in the data was reduced by eliminating the targets that were very frequently linked to in very large Web spaces.

We studied the fit to Bradford's law by plotting the accumulated sitations against the accumulated targets. Although we presented many distributions, none of them fitted the typical Bradford case. When we considered Web spaces as the sitors, the resulting distribution was concave from above, with no point of inflection, and passed through the origin. The other distributions, in which pages were considered as the sitors, could in general be characterized as curves which start noticeably above the origin and have several points of inflection ( ). It was also notable, firstly, that taking Web spaces as destinations instead of pages greatly reduced the accumulated sitations and much of the complexity of the curve and, secondly, that eliminating the targets that were very frequently linked to in very large Web spaces sharply reduced the accumulated sitations, and the curves changed, now passing through the origin and presenting an exponential dependence on the accumulated sitations.

In sum, everything seems to support the observation of Kim (2000) that there are different motivations behind the citations in scientific articles and the links of the World Wide Web. In this sense, we confirmed the immense influence of thematic restriction (for instance, nearly all the Web spaces included links related to the World Wide Web itself and its technology) and of the repetitive links found in most of the very large Web spaces (which are usually generated by means of software tools that allow links to be automatically included on every page).

References

Aguillo, I.F. (2000), "Indicadores hacia una evaluación no objetiva (cuantitativa) de sedes web", in VII Jornadas Españolas de Documentación (Bilbao, 19-21 de octubre de 2000), Servicio Editorial de la Universidad del País Vasco, Bilbao, pp. 233-48.

Almind, T.C. and Ingwersen, P. (1997), "Informetric analyses on the World Wide Web: methodological approaches to 'Webometrics'", Journal of Documentation, Vol. 53 No. 4, pp. 404-26.

Bar-Ilan, J. (1997), "The 'mad cow disease', Usenet newsgroups and bibliometric laws", Scientometrics, Vol. 39 No. 1, pp. 29-55.

Bar-Ilan, J. (2001), "Data collection methods on the Web for informetrics purposes: a review and analysis", Scientometrics, Vol. 50 No. 1, pp. 7-32.

Barabási, A.L. and Albert, R. (1999), "Emergence of scaling in random networks", Science, Vol. 286, pp. 509-12.

Björneborn, L. and Ingwersen, P. (2001), "Perspectives of webometrics", Scientometrics, Vol. 50 No. 1, pp. 65-82.

Bradford, S.C. (1934), "Sources of information of specific subjects", Engineering, Vol. 137, pp. 85-6.

Brin, S. and Page, L. (1998), "The anatomy of a large-scale hypertextual web search engine", Computer Networks and ISDN Systems, Vol. 30, pp. 107-17.

Broder, A. et al. (2000), "Graph structure in the Web", Computer Networks and ISDN Systems, Vol. 33 No. 1-6, pp. 309-20.

Brookes, B.C. (1969), "Bradford's law and the bibliography of science", Nature, Vol. 224, pp. 953-6.

Codina, L. (2000), "Parámetros e indicadores de calidad para la evaluación de recursos digitales", in VII Jornadas Españolas de Documentación (Bilbao, 19-21 de octubre de 2000), Servicio Editorial de la Universidad del País Vasco, Bilbao, pp. 135-44.

Correa-Uribe, G. (1999), Colombia Conectada al Mundo: Sitios Web Colombianos, Universidad de Antioquia, Antioquia.

Cronin, B. (2001), "Bibliometrics and beyond: some thoughts on web-based citation analysis", Journal of Information Science, Vol. 27 No. 1, pp. 1-7.

Cui, L. (1999), "Rating health web sites using the principles of citation analysis: a bibliometric approach", Journal of Medical Internet Research, Vol. 1 No. e4, available at: www.jmir.org/1999/1/e4/index.htm (accessed 18 August 2001).

Cybermetrics: International Journal of Scientometrics, Informetrics and Bibliometrics (2002), available at: www.cindoc.csic.es/cybermetrics (accessed 11 August 2002).

Egghe, L. (2000), "New informetric aspects of the Internet: some reflections - many problems", Journal of Information Science, Vol. 26 No. 5, pp. 329-35.

Ferreiro-Aláez, L. (1981), "Análisis de referencias y características bibliométricas de los conjuntos de revistas nucleares", Revista Española de Documentación Científica, Vol. 4 No. 3, pp. 181-98.

Gorbea-Portal, S. (1996), El Modelo Matemático de Bradford: Su Aplicación a las Revistas Latinoamericanas de las Ciencias Bibliotecológicas y de la Información, UNAM, Centro Universitario de Investigaciones Bibliotecológicas, México.

Gupta, D.K. (1991), "Application of Bradford's law to citation data of Ethiopian medical journals", Annals of Library Science and Documentation, Vol. 38 No. 3, pp. 85-98.

Harter, S.P. and Ford, C.E. (2000), "Web-based analyses of e-journal impact: approaches, problems and issues", Library Science With a Slant to Documentation and Information Studies, Vol. 51 No. 13, pp. 1159-76.

Houston, W. (1983), "The application of bibliometrics to veterinary science primary literature", Quarterly Bulletin of International Association of Agricultural Information Specialists, Vol. 28 No. 1, pp. 6-13.

Internet Software Consortium (2002), "Internet domain survey", available at: www.isc.org (accessed 4 November 2002).

Jiménez-Piano, M. (2001), "Evaluación de sedes web", Revista Española de Documentación Científica, Vol. 24 No. 4, pp. 405-32.

Kim, H.J. (2000), "Motivations for hyperlinking in scholarly electronic articles: a qualitative study", Journal of the American Society for Information Science, Vol. 51 No. 10, pp. 887-99.

Kumar, R. et al. (2001), "Trawling the Web for emerging cyber-communities", in Mendelzon, A. et al. (Eds), Proceedings of the 8th International World Wide Web Conference (Toronto, Canada, May 11-14, 1999), Elsevier, Amsterdam, available at: www8.org/w8-papers/4a-search-mining/trawling/trawling.html (accessed 2 October 2002).

Lal, A. and Panda, K.C. (1999), "Bradford's law and its application to bibliographical data of plant pathology dissertations: an analytical approach", Library Science With a Slant to Documentation and Information Studies, Vol. 36 No. 3, pp. 193-206.

Larson, R.R. (1996), "Bibliometrics of the World Wide Web: an exploratory analysis of the intellectual structure of cyberspace", in Hardin, S. (Ed.), Proceedings of the 59th Annual Meeting of the American Society for Information Science (Baltimore, Maryland, 1996), Information Today, Medford, NJ, pp. 71-8, available at: http://sherlock.berkeley.edu/asis96/asis96.html (accessed 14 October 2000).

McKiernan, G. (1996), "CitedSites(sm): citation indexing of Web resources", available at: www.public.iastate.edu/~CYBERSTACKS/Cited.htm (accessed 24 February 2000).

Mubeen, M.A. (1996), "Citation analysis of doctoral dissertations in chemistry", Annals of Library Science and Documentation, Vol. 43 No. 2, pp. 48-58.

Price, D.J. (1970), "Citation measures of hard science, soft science, technology and non-science", in Nelson, C.C. and Pollock, D.E. (Eds), Communication Among Scientists and Engineers, D.C. Heath and Co., Lexington, MA, pp. 3-22.

Reyes-Barragán, M.J. et al. (2000), "Revistas científicas: determinación de necesidades y usos", Revista Española de Documentación Científica, Vol. 23 No. 4, pp. 417-36.

Rodríguez i Gairín, J.M. (1997), "Valoración del impacto de la información en Internet: Altavista, el 'Citation Index' de la red", Revista Española de Documentación Científica, Vol. 20 No. 2, pp. 175-81.

Rousseau, R. (1997), "Sitations: an exploratory study", Cybermetrics: International Journal of Scientometrics, Informetrics and Bibliometrics, Vol. 1, p. 1, available at: www.cindoc.csic.es/cybermetrics/articles/v1i1p1.html (accessed 5 September 2000).

Smith, A.G. (1996), "Criteria for evaluation of Internet information resources", available at: www.vuw.ac.nz/~agsmith/evaln/index.htm (accessed 25 March 2000).

Smith, A.G. (1997), "Testing the surf: criteria for evaluating Internet information resources", The Public-Access Computer Systems Review, Vol. 8 No. 3, available at: http://info.lib.uh.edu/pr/v8/n3/smit8n3.html (accessed 28 February 2002).

Smith, A.G. (1999), "The impact of web sites: a comparison between Australasia and Latin America", available at: www.vuw.ac.nz/~agsmith/publns/austlat (accessed 12 May 2001).

Smith, A.G. (2001), "Applying evaluation criteria to New Zealand government websites", International Journal of Information Management, Vol. 21 No. 2, pp. 137-49.

Tague-Sutcliffe, J. (1992), "An introduction to informetrics", Information Processing and Management, Vol. 28 No. 1, pp. 1-3.

Tillman, H.N. (2000), "Evaluating quality on the Net", available at: www.tiac.net/users/hope/findqual.html (accessed 5 June 2000).

van Raan, A.F.J. (2001), "Bibliometrics and Internet: some observations and expectations", Scientometrics, Vol. 50 No. 1, pp. 59-63.

Vreeland, R.C. (2000), "Law libraries in hyperspace: a citation analysis of World Wide Web sites", Law Library Journal, Vol. 92 No. 1, pp. 9-25.

Zhang, P. and von Dran, G.M. (2000), "Satisfiers and dissatisfiers: a two-factor model for website design and evaluation", Journal of the American Society for Information Science, Vol. 51 No. 14, pp. 1253-68.

Web spaces and retrieves both the number of associated pages and the number of characters. In order to identify Extremeñan Web space URLs, we added to the scanning routine 45 patterns representative of Extremadura, filtering out URLs that did not include any of them in their content. A close analysis of the results showed that we were also retrieving some URLs of Web spaces corresponding to national institutions that mention Extremadura on their Webs (ministries, universities, publishing houses, etc.) but that are not specific to the topic. After eliminating these from the set, the 1,047 URLs were reduced to 755.
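A minimal Python sketch of such a content filter follows. The four patterns are hypothetical placeholders (the 45 patterns actually used are not given), and the subsequent manual removal of non-specific national institutions is not modelled.

```python
import re

# Hypothetical patterns for illustration only; the study used 45 patterns that are not listed.
EXTREMADURA_PATTERNS = [r"extremadura", r"badajoz", r"c[aá]ceres", r"m[eé]rida"]


def compile_patterns(patterns):
    """Combine the patterns into one case-insensitive alternation."""
    return re.compile("|".join(f"(?:{p})" for p in patterns), re.IGNORECASE)


def keep_matching(pages, matcher):
    """Keep the URLs whose page content matches at least one pattern.

    pages: mapping of URL -> page text.
    """
    return [url for url, text in pages.items() if matcher.search(text)]


if __name__ == "__main__":
    pages = {
        "http://a.es/": "Turismo rural en Cáceres",
        "http://b.es/": "Noticias de Madrid",
    }
    print(keep_matching(pages, compile_patterns(EXTREMADURA_PATTERNS)))  # ['http://a.es/']
```

A purely lexical filter of this kind inevitably admits sites that merely mention the region, which is why the manual pass reducing 1,047 URLs to 755 was still needed.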

Elimination of synonyms. Synonyms refer to downloaded pages that were identical but whose URLs were different. This could arise in one of two ways: from automatic server redirection or from mirroring. Their identification was far from straightforward. While redirections could be resolved using the URL of the final file, and the problem of different names for the same machine by using the IP address, there still remained the problem of mirrors of Web sites on different machines (with different IPs), or of Web sites that are lodged on more than one machine, with one or the other being accessed according to demand. We therefore decided to use literal comparison of the content. We used a mixed strategy for the final elimination of synonyms: automatic comparison backed up by human comparison. After eliminating the synonyms, we had a final set of 749 sitor Web space URLs.
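The automatic half of this strategy can be approximated by hashing the downloaded content. In this sketch, SHA-256 digests stand in for literal comparison, so only byte-identical pages are detected. The choice of hash, the rule of keeping the lexicographically smallest URL and the function names are our assumptions, and the human review step is not modelled.

```python
import hashlib


def group_synonyms(pages):
    """Return groups of URLs whose downloaded content is byte-identical.

    pages: mapping of URL -> downloaded bytes.
    """
    groups = {}
    for url, body in pages.items():
        groups.setdefault(hashlib.sha256(body).hexdigest(), []).append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


def drop_synonyms(pages):
    """Keep one representative URL (the lexicographically smallest) per distinct content."""
    kept = {}
    for url in sorted(pages):
        kept.setdefault(hashlib.sha256(pages[url]).hexdigest(), url)
    return sorted(kept.values())
```

Mirrors that differ by even a timestamp would escape this check, which is one reason to back the automatic comparison with human inspection.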

The second set of sitor Web spaces
In parallel with the process of filtering the 1,850 sitor Web spaces, we decided to follow the http-protocol-prefixed outlinks found in each sitor Web space as a method to mine for potential URLs of Web spaces about Extremadura. These were to be added to the foregoing filtered subset of sitor URLs to obtain our final set of Extremadura-specific sitor Web spaces. For this purpose we used a crawler that not only visits the initial URL of the Web space but also all those belonging to the same space, and that identifies as links the URLs of the type "A, Area and Frame" which are external in nature. This yielded an initial result of 466,000 links, which we subjected to the following procedure.
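The link-extraction step can be sketched with Python's standard `html.parser`. Here "external" is simplified to "on a different host". Treating the Web space as a whole host, and accepting https alongside http, are our assumptions; the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class ExternalLinkParser(HTMLParser):
    """Collect external links from <a href>, <area href> and <frame src> tags."""

    LINK_ATTRS = {"a": "href", "area": "href", "frame": "src"}

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.host = urlsplit(page_url).netloc.lower()
        self.links = []

    def handle_starttag(self, tag, attrs):
        wanted = self.LINK_ATTRS.get(tag)  # other tags (img, script, ...) are ignored
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                absolute = urljoin(self.page_url, value)
                parts = urlsplit(absolute)
                # Keep only http(s) links pointing at a different host
                if parts.scheme in ("http", "https") and parts.netloc.lower() != self.host:
                    self.links.append(absolute)


def external_links(page_url, html):
    """Return the external A/Area/Frame links of one page, in document order."""
    parser = ExternalLinkParser(page_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

`HTMLParser` lower-cases tag and attribute names, so `<A HREF=...>` is handled as well. Self-closing tags such as `<area ... />` also reach `handle_starttag` through the default `handle_startendtag`.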

Filtering. We first eliminated links that represented advertising banners and links to the commonest default files (index, default, menu, etc.), so as to facilitate the matching of similar URLs. This reduced the set of cited Web space URLs to 50,500. We then tested the links for validity and eliminated duplicates. The result was 38,900 valid URLs.
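One way of making similar URLs match is to normalize them before deduplicating. This sketch lower-cases the scheme and host and strips a trailing default file. The file names come from the text (index, default, menu), but the list of extensions and the function names are our assumptions. Banner removal and the validity test (an HTTP request per URL) are not shown.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Default file names from the text; the extensions are assumptions.
DEFAULT_FILE = re.compile(r"/(?:index|default|menu)\.(?:html?|php|asp)$", re.IGNORECASE)


def normalize(url):
    """Lower-case scheme and host, strip a trailing default file name and drop any fragment."""
    parts = urlsplit(url)
    path = DEFAULT_FILE.sub("/", parts.path) or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def dedupe(urls):
    """Normalize and remove duplicates, preserving first-seen order."""
    return list(dict.fromkeys(normalize(u) for u in urls))
```

For example, `http://a.es/index.htm`, `http://a.es/` and `http://A.es` all collapse to `http://a.es/`.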

Condensing. This process was similar to that carried out for the first set of sitor Web spaces, and reduced the target URLs to 19,903.


Identification of Extremeñan links. The first part of this stage was also similar to that carried out for the first subset of sitor Web spaces. The scanning routine reduced the total of potentially relevant links (from/about Extremadura) to 6,913. The detailed analysis of these links then yielded a final set of 1,232 real URLs about Extremadura. (The other 5,681 corresponded to what we came to call "stop-words", i.e. URLs of Web spaces of a general character that mention Extremadura among their numerous resources, or that cause homonym problems in identification.)

Elimination of synonyms. This phase was again similar to that carried out for the first set of sitor Web spaces, and left 1,214 URLs.

With these data, the overall set of Web spaces to use as origins for the sitation analysis consisted of 1,963 URLs (749 from the first "sitor" group plus 1,214 from the second "sitee" group). After eliminating the duplicated URLs and making a last pass to eliminate errors (pages under construction, missing files, change of URL, etc.) and frames-generated synonyms, we were finally left with a database of 1,180 citor Web space URLs.

Retrieval of the sitations
Up to now we have described how the sets of sitor URLs were formed. For each URL, however, we needed to obtain the outlinks. We used the crawler to retrieve the pages and then extracted the links of the type "A, Area and Frame". In the case of the first set (of 749 URLs), of course, this process had already been carried out in obtaining the second set (giving as an intermediate result the 1,214 URLs of what we termed above the "sitee" group).

It should be recalled at this point that the sitors are treated as Web spaces: all of their pages are retrieved at once, although the calculations could subsequently be made either with Web spaces or with individual pages, since each link's record contained both the origin URL and the origin Web space, as well as the target URL. A page was included if the final file that was retrieved was in a subdirectory of the main address of the Web space.
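The inclusion rule in the last sentence can be expressed as a small predicate. In this sketch, the Web space's "main address" is taken to be the directory part of its URL. A page qualifies when its final URL, after any redirection, is on the same host and under that directory. The exact comparison the authors used is not described, and the function name is hypothetical.

```python
from urllib.parse import urlsplit


def belongs_to_space(final_url, space_url):
    """True if final_url (after redirects) lies in the directory tree of space_url on the same host."""
    page, space = urlsplit(final_url), urlsplit(space_url)
    if page.netloc.lower() != space.netloc.lower():
        return False
    # Directory part of the space's address: "/dept/index.html" -> "/dept/", "" -> "/"
    base = space.path if space.path.endswith("/") else space.path.rsplit("/", 1)[0] + "/"
    return (page.path or "/").startswith(base)
```

Because the base always ends in a slash, `/department/` is not mistaken for a subdirectory of `/dept/`.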

Result of the distributions and discussion
As was described in detail in the previous section, we had two slightly different sets of data, which we shall call the first and the second sets. The second was more thematically complete than the first.

The design of these two databases allowed the data to be treated with the following variations:

(1) Errors. This characteristic allowed us to form two subsets: all the links that were present; and only those that corresponded to URLs that could be retrieved without error.

(2) Type of link. We again formed two outlink subsets: only those of the type "A, Area or Frame", which mainly denote some thematic relationship (links of other types, such as bgsound, embed, iframe, img, input, meta, script, etc., were thereby eliminated); and all the links, independently of type.

(3) Origin. This indicator allowed three variations in the form of computing the number of sitations to a given target: the Web spaces that linked to a given target; the Web pages that linked to a given target; and the links that had a given target.

(4) Destination. There were two variations: all the targets; and target Web spaces, located by grouping together the link URLs according to whether their path was included in some other URL or not.

(5) Extremeñan target. Again there were two variations in the form of computing the targets: all the targets; and only those that corresponded to Extremadura.

(6) Targets with high sitation percentages from the pages of some Web spaces. Another two variations in the form of computing the links: all the links; and only those that did not have a high percentage of their sitations originating from the pages of certain very large Web spaces (there were some sitor Web spaces with a great number of pages, a high proportion of which had a link to the same target, thereby enormously increasing the number of sitations of that target).
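Variation (6) needs some operational definition of "a high percentage" and "very large". The paper reports neither threshold, so the values below (at least 100 pages, at least half of them linking to the target) are hypothetical placeholders, as are the function names. Links are assumed to be stored as (origin_page, origin_space, target) tuples.

```python
from collections import defaultdict

# Hypothetical thresholds: the paper does not report the values it used.
MIN_SPACE_PAGES = 100
MIN_PAGE_SHARE = 0.5


def boilerplate_targets(links, pages_per_space, min_pages=MIN_SPACE_PAGES, min_share=MIN_PAGE_SHARE):
    """Targets linked from at least `min_share` of the pages of some Web space with >= `min_pages` pages.

    links: iterable of (origin_page, origin_space, target) tuples.
    pages_per_space: mapping origin_space -> number of pages crawled in that space.
    """
    linking_pages = defaultdict(set)
    for page, space, target in links:
        linking_pages[(space, target)].add(page)
    flagged = set()
    for (space, target), pages in linking_pages.items():
        size = pages_per_space.get(space, 0)
        if size >= min_pages and len(pages) / size >= min_share:
            flagged.add(target)
    return flagged


def drop_targets(links, flagged):
    """Remove every link that points at a flagged target."""
    return [link for link in links if link[2] not in flagged]
```

A typical case caught by this rule is a designer's credit link repeated in the footer of every page of a large site.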

These variations generated 192 different distributions to use in the calculations. Since the power-law based distributions are traditionally represented differently (number of pages vs the logarithm of the sitations received) from those based on Bradford's law (accumulated sitations received vs the logarithm of accumulated targets), this would mean a total of 384 representations. We therefore considered that presenting all the work performed exhaustively would exceed the limits of a communication such as the present one. Furthermore, since it was possible to extract common characteristics from many of these distributions, we shall present just a small but representative sample.
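The totals of 192 distributions and 384 plots are consistent with taking every combination of the six variations (2 × 2 × 3 × 2 × 2 × 2 = 96) for each of the two data sets. This reading is our inference, since the text does not spell the count out. The level labels below are shorthand of our own.

```python
from itertools import product

# Levels of each variation as listed in the text, plus the two data sets.
VARIATIONS = {
    "data_set": ["first", "second"],
    "errors": ["all links", "retrievable only"],
    "link_type": ["A/Area/Frame only", "all types"],
    "origin": ["Web spaces", "pages", "links"],
    "destination": ["all targets", "target Web spaces"],
    "extremadura_target": ["all targets", "Extremadura only"],
    "large_space_filter": ["all links", "high-share targets removed"],
}

configurations = [dict(zip(VARIATIONS, combo)) for combo in product(*VARIATIONS.values())]
plots = len(configurations) * 2  # one power-law plot and one Bradford plot per distribution

print(len(configurations), plots)  # 192 384
```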

We shall first consider how well the data fit the power law, by showing a series of plots of the general behaviour with respect to a power law distribution.

JD595

564

Since this type of distribution has been demonstrated to hold for Web pageswe shall begin with them Figure 1 shows the general regt to an exponent of 21We represent the number of pages that receive a given number of links using alogarithmic scale following Broder et al (2000) although instead ofrepresenting the raw number of pages we use the fraction of pages Theplotted data correspond to the regrst set both including and excluding those thathave unrecoverable errors (broken links) One observes the general goodness ofthe regt to the exponent 21 with there being a tail similar to that obtained byBroder et al (2000) (corresponding to groups of URLs that are cited veryfrequently) although we do not regnd those workersrsquo slight initial fall in the dataA close observation of Figure 1 shows that when there are few inlinks thepoints corresponding to all the links are for ratios slightly above those for thepoints corresponding to unbroken links only This tendency is reversed as thenumber of inlinks rises This was to be expected since it indicates that brokenlinks (the only difference between the pairs of points) are cited less often Thereare no other apparent differences between the data with and without errorsThe behaviour was similar for the second set

In Figure 2 we use the second set plotting all the links and the origin pagesseparately The difference is that in the second case if there are several linkswith the same target in the same page then they only count as one This will bethe case henceforth unless explicitly stated to the contrary One sees that theredo not appear to be any major differences between the two sets of points exceptfor the effect of small reductions in the number of inlinks recorded in somecases In particular the two variants are similar in shape The behaviour wassimilar for the regrst set

In Figure 3 we have grouped the origins by Web space so that several linkswithin the same space that have the same target are only counted once Oneresult is that the number of links recorded for each destination is far less than inFigures 1 and 2 The data are from the regrst set The tail is smaller and the slopeis greater corresponding to an exponent of 28 but the regt to a power lawdistribution is just as good The behaviour was similar for the second set

In Figure 4 we have grouped the targets just as in Figure 3 we grouped theorigins according to the path of the different URLs The data are from thesecond set The regt to the ideal distribution is similar to the previous cases Thedifference between the two variants in the plot is that the second excludes somecases in which all of the pages of a large Web space link to some other Webspace (which usually corresponds to the company that designed the regrst Webspace) In this second variant there is a logical decrease in the tail for the datafrom which we eliminated the targets that were very frequently linked to byvery large Web spaces The behaviour was similar for the regrst set

To conclude with the power law representations Figure 5 shows the Webspaces as in the previous case again eliminating the targets that are veryfrequently linked to in very large Web spaces but now with the restriction of

ordfSitationordmdistributions

565

Figure 1Fraction of pages of theregrst set that receive agiven number of inlinksboth including andexcluding erroneousURLs compared with anideal power lawdistribution

JD595

566

Figure 2Fraction of pages of thesecond set that receive agiven number of inlinkscounting on the one handall the links and on the

other only the originpages for each link (in

both cases after havingeliminated the

erroneous links)

ordfSitationordmdistributions

567

Figure 3Fraction of pages of theregrst set that receive agiven number of inlinkscounting only the originWeb spaces of each linkboth including andexcluding the erroneousURLs compared with theideal power lawdistribution

JD595

568

Figure 4Fraction of Web spaces

of the second set thatreceive a given number

of inlinks both includingand excluding the group

of targets that werelinked to in a large

percentage of the pagesof some very large Web

spaces

ordfSitationordmdistributions

569

Figure 5Fraction of Web spacesthat receive a givennumber of inlinkscomparing the regrst setwith the secondincluding only targetsdealing withExtremadura andexcluding targets thatwere very frequentlylinked to in very largeWeb spaces

JD595

570

being Web spaces which deal with Extremadura We compare the data fromthe regrst set with the data from the second One observes the logical decrease inthe number of elements but also a decrease in the slope that regts best anexponent of 15

As we mentioned above Bradfordrsquos law distributions are traditionallyrepresented differently from power law distributions so the presentation ofFigures 6-10 will be slightly different from the previous regve with accumulatedsitations plotted against sitors using a logarithmic scale for the latter sincetheir number increases exponentially with the accumulated sitationsBradfordrsquos law is of course normally applied to scientiregc journals countingthe citations received by each journal These citations point from the citingarticles (which in turn belong to journals) towards the journalrsquos articles Acertain ambiguity arises in the Web analogue however When a certain work iscited several times in an article it is only counted once Hence in our case toowhen a certain target is linked to several times it should also only be countedonce The problem is though that it is not clear that the Web spaces and pagesof sitation analysis should be taken as the analogues of journals and articlesrespectively or rather that an entire Web space should be taken as theanalogue of an article We shall look at both possibilities

Figure 6 shows the results for the second set when we count only onesitation per Web space for each target We present the results for target pagesindependently after grouping the pages by Web space and after alsoeliminating the frequent links as we did in the power law case This thereforeregards the analogues of the citor articles to be Web spaces the analogues ofthe cited journals to be pages or entire Web spaces (depending on the case) Thedata do not regt Bradfordrsquos law since the curve is concave from above Hence thenumber of sitations appears to grow exponentially and indeed if a logarithmicscale is also used for the vertical axis the result is practically a straight lineLikewise the verbal statement of the law is not satisreged since the proportion1nn2 does not hold

Figure 7 shows the same three distributions for the first set, but now counting sitor pages. This means that we are now considering Web pages to be the analogues of citing articles, which seems more natural. One sees that the first distribution, corresponding to pages, lies in the upper part of the plot, at first running coincident with the Web space distribution and then rising above it.

The second distribution, grouped by Web space, includes points of inflection, but in short it differs from a Bradford law curve in the following aspects: it begins at a point clearly above the origin; the linear portion does not begin at the first point of inflection but later; and it ends with a strongly increasing section.

The third distribution occupies the lower part of the plot. This means that eliminating the links found in a large percentage of the pages of very large Web spaces eliminates many sitations, but not many Web spaces or linked pages.


Figure 6. Accumulated sitations vs accumulated Web spaces. The data are from the second set. The results are presented for target pages counted independently, grouped by Web space, and excluding targets that were very frequently linked to in very large Web spaces. Only sitor Web spaces are counted.


Figure 7. Accumulated sitations vs accumulated Web spaces. The data are from the first set. The results are presented for target pages counted independently, grouped by Web space, and excluding targets that were very frequently linked to in very large Web spaces. Sitor pages are counted.


Figure 8. Accumulated sitations vs accumulated Web spaces. The data are from the first set. The results are presented for target pages counted independently, grouped by Web space, and excluding targets that were very frequently linked to in very large Web spaces. Sitor pages are counted, but only targets about Extremadura.


Figure 9. Accumulated sitations vs accumulated Web spaces. The data are from the second set. The results are presented for target pages counted independently, grouped by Web space, and excluding targets that were very frequently linked to in very large Web spaces. Sitor pages are counted.


Figure 10. Accumulated sitations vs accumulated Web spaces. The data are from the second set. The results are presented for target pages counted independently, grouped by Web space, and excluding targets that were very frequently linked to in very large Web spaces. Sitor pages are counted, but only targets about Extremadura.


Although its slope is less clearly perceptible because this distribution is plotted together with the others, one nevertheless observes that the slope increases continuously, implying that here too the accumulated sitations grow exponentially. Indeed, if a logarithmic scale is also used on the vertical axis, the result is a straight line with no point of inflection, which does not fit Bradford's law. It is interesting that, with the elimination of this set of links, the dependence of the sitations becomes exponential. Again, the verbal statement of the law is not satisfied, since the proportion 1:n:n² does not hold.
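The observation that a logarithmic vertical axis turns the curve into a straight line can be quantified with a least-squares fit of log R(n) against x = log n. Growth that is exponential in x = log n is the same as R(n) growing as a power of n. The sketch below (function name hypothetical) returns the fitted slope and the coefficient of determination; it assumes at least two points with different values of R.

```python
import math


def loglog_fit(points):
    """Least-squares fit of log R = a + b * x to Bradford-plot points (x, R),
    where x = log n. Returns (b, r_squared); r_squared near 1 means the curve
    is a straight line once the vertical axis is also logarithmic."""
    xs = [x for x, _ in points]
    ys = [math.log(r) for _, r in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    r_squared = sxy ** 2 / (sxx * syy)
    return b, r_squared


# Synthetic curve with R(n) = 2 * n**1.5, i.e. exponential in x = log n.
synthetic = [(math.log(n), 2 * n ** 1.5) for n in range(1, 30)]
print(loglog_fit(synthetic))
```

A Bradford curve proper, with its initial rise followed by a linear section in x, would give a visibly lower r_squared in this test.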

Some authors, Brookes (1969) for instance, consider that Bradford's law is mainly applicable in small, well-defined areas of research, in which there must exist a strong thematic relationship between the documents. In Figure 8 we have therefore represented only the pages or Web spaces that are from Extremadura (for the first set of data). The behaviour, however, is quite similar to the previous case: the first two distributions are fairly similar to each other, and the greatest difference lies in the third distribution, which is a straight line.

Figures 9 and 10 show the same calculations as Figures 7 and 8, but now for data from the second set. The results are identical.

Conclusions

The present experiment focused on the distribution of citations in a closed but heterogeneous environment of thematically related Web spaces, using two sets of data. We studied how well they fit, on the one hand, a power-law distribution and, on the other, Bradford's law, looking at a great many variants, with a total of 384 plots.

Studies in the literature have shown that, on a very large scale, sitations follow a power-law distribution with an exponent of 2.1. The present sitation distributions from both data sets were coherent with those earlier studies, including in the exponent, independently of whether the counts included broken links, total links or only sitor pages, and whether pages or Web spaces were taken as origin and as target. We showed, however, that the distributions differed according to which calculation procedure was used and which filter was applied to the data. In particular, when we considered only Web spaces as origins (instead of pages or links) and pages as destinations, the slope increased to correspond approximately to an exponent of 2.8. When, however, both destinations and origins were Web spaces, the exponent returned to the value 2.1. Also, when the destinations were thematically restricted (in which case they practically coincided with the citors), the slope was reduced to correspond approximately to an exponent of 1.5. Finally, the tail in the data was reduced by eliminating the targets that were very frequently linked to in very large Web spaces.

We studied the fit to Bradford's law by plotting the accumulated sitations against the accumulated targets. Although we presented many distributions, none of them fitted the typical Bradford case. When we considered Web spaces as the sitors, the resulting distribution was concave upwards, with no point of inflection, and passed through the origin. The other distributions, in which pages were considered as the sitors, could in general be characterized as curves which start noticeably above the origin and have several points of inflection ( ). It was also notable, firstly, that taking Web spaces as destinations instead of pages greatly reduced the accumulated sitations and much of the complexity of the curve, and secondly, that eliminating the targets that were very frequently linked to in very large Web spaces sharply reduced the accumulated sitations, after which the curves passed through the origin and showed exponential growth of the accumulated sitations.

In sum, everything seems to support the observation of Kim (2000) that there are different motivations behind the citations in scientific articles and the links of the World Wide Web. In this sense, we confirmed the immense influence of thematic restriction (for instance, nearly all the Web spaces included links related to the World Wide Web itself and its technology) and of the repetitive links found in most of the very large Web spaces (which are usually generated by means of software tools that allow links to be automatically included on every page).

References

Aguillo, I.F. (2000), “Indicadores hacia una evaluación no objetiva (cuantitativa) de sedes web”, in VII Jornadas Españolas de Documentación (Bilbao, 19-21 de octubre de 2000), Servicio Editorial de la Universidad del País Vasco, Bilbao, pp. 233-48.

Almind, T.C. and Ingwersen, P. (1997), “Informetric analyses on the World Wide Web: methodological approaches to ‘Webometrics’”, Journal of Documentation, Vol. 53 No. 4, pp. 404-26.

Bar-Ilan, J. (1997), “The ‘mad cow disease’, Usenet newsgroups and bibliometric laws”, Scientometrics, Vol. 39 No. 1, pp. 29-55.

Bar-Ilan, J. (2001), “Data collection methods on the Web for informetrics purposes: a review and analysis”, Scientometrics, Vol. 50 No. 1, pp. 7-32.

Barabási, A.L. and Albert, R. (1999), “Emergence of scaling in random networks”, Science, Vol. 286, pp. 509-12.

Björneborn, L. and Ingwersen, P. (2001), “Perspectives of webometrics”, Scientometrics, Vol. 50 No. 1, pp. 65-82.

Bradford, S.C. (1934), “Sources of information on specific subjects”, Engineering, Vol. 137, pp. 85-6.

Brin, S. and Page, L. (1998), “The anatomy of a large-scale hypertextual web search engine”, Computer Networks and ISDN Systems, Vol. 30, pp. 107-17.

Broder, A. et al. (2000), “Graph structure in the Web”, Computer Networks and ISDN Systems, Vol. 33 No. 1-6, pp. 309-20.

Brookes, B.C. (1969), “Bradford’s law and the bibliography of science”, Nature, Vol. 224, pp. 953-6.

Codina, L. (2000), “Parámetros e indicadores de calidad para la evaluación de recursos digitales”, in VII Jornadas Españolas de Documentación (Bilbao, 19-21 de octubre de 2000), Servicio Editorial de la Universidad del País Vasco, Bilbao, pp. 135-44.


Correa-Uribe, G. (1999), Colombia Conectada al Mundo: Sitios Web Colombianos, Universidad de Antioquia, Antioquia.

Cronin, B. (2001), “Bibliometrics and beyond: some thoughts on web-based citation analysis”, Journal of Information Science, Vol. 27 No. 1, pp. 1-7.

Cui, L. (1999), “Rating health web sites using the principles of citation analysis: a bibliometric approach”, Journal of Medical Internet Research, Vol. 1 No. e4, available at: www.jmir.org/1999/1/e4/index.htm (accessed 18 August 2001).

Cybermetrics: International Journal of Scientometrics, Informetrics and Bibliometrics (2002), available at: www.cindoc.csic.es/cybermetrics (accessed 11 August 2002).

Egghe, L. (2000), “New informetric aspects of the Internet: some reflections - many problems”, Journal of Information Science, Vol. 26 No. 5, pp. 329-35.

Ferreiro-Aláez, L. (1981), “Análisis de referencias y características bibliométricas de los conjuntos de revistas nucleares”, Revista Española de Documentación Científica, Vol. 4 No. 3, pp. 181-98.

Gorbea-Portal, S. (1996), El Modelo Matemático de Bradford: Su Aplicación a las Revistas Latinoamericanas de las Ciencias Bibliotecológicas y de la Información, UNAM, Centro Universitario de Investigaciones Bibliotecológicas, México.

Gupta, D.K. (1991), “Application of Bradford’s law to citation data of Ethiopian medical journals”, Annals of Library Science and Documentation, Vol. 38 No. 3, pp. 85-98.

Harter, S.P. and Ford, C.E. (2000), “Web-based analyses of e-journal impact: approaches, problems and issues”, Journal of the American Society for Information Science, Vol. 51 No. 13, pp. 1159-76.

Houston, W. (1983), “The application of bibliometrics to veterinary science primary literature”, Quarterly Bulletin of International Association of Agricultural Information Specialists, Vol. 28 No. 1, pp. 6-13.

Internet Software Consortium (2002), “Internet domain survey”, available at: www.isc.org (accessed 4 November 2002).

Jiménez-Piano, M. (2001), “Evaluación de sedes web”, Revista Española de Documentación Científica, Vol. 24 No. 4, pp. 405-32.

Kim, H.J. (2000), “Motivations for hyperlinking in scholarly electronic articles: a qualitative study”, Journal of the American Society for Information Science, Vol. 51 No. 10, pp. 887-99.

Kumar, R. et al. (2001), “Trawling the Web for emerging cyber-communities”, in Mendelzon, A. et al. (Eds), Proceedings of the 8th International World Wide Web Conference (Toronto, Canada, May 11-14, 1999), Elsevier, Amsterdam, available at: www8.org/w8-papers/4a-search-mining/trawling/trawling.html (accessed 2 October 2002).

Lal, A. and Panda, K.C. (1999), “Bradford’s law and its application to bibliographical data of plant pathology dissertations: an analytical approach”, Library Science With a Slant to Documentation and Information Studies, Vol. 36 No. 3, pp. 193-206.

Larson, R.R. (1996), “Bibliometrics of the World Wide Web: an exploratory analysis of the intellectual structure of cyberspace”, in Hardin, S. (Ed.), Proceedings of the 59th Annual Meeting of the American Society for Information Science (Baltimore, Maryland, 1996), Information Today, Medford, NJ, pp. 71-8, available at: http://sherlock.berkeley.edu/asis96/asis96.html (accessed 14 October 2000).

McKiernan, G. (1996), “CitedSites(sm): citation indexing of Web resources”, available at: www.public.iastate.edu/~CYBERSTACKS/Cited.htm (accessed 24 February 2000).

Mubeen, M.A. (1996), “Citation analysis of doctoral dissertations in chemistry”, Annals of Library Science and Documentation, Vol. 43 No. 2, pp. 48-58.


Price, D.J. (1970), “Citation measures of hard science, soft science, technology and non-science”, in Nelson, C.C. and Pollock, D.E. (Eds), Communication Among Scientists and Engineers, D.C. Heath and Co., Lexington, MA, pp. 3-22.

Reyes-Barragán, M.J. et al. (2000), “Revistas científicas: determinación de necesidades y usos”, Revista Española de Documentación Científica, Vol. 23 No. 4, pp. 417-36.

Rodríguez i Gairín, J.M. (1997), “Valoración del impacto de la información en Internet: Altavista, el ‘Citation Index’ de la red”, Revista Española de Documentación Científica, Vol. 20 No. 2, pp. 175-81.

Rousseau, R. (1997), “Sitations: an exploratory study”, Cybermetrics: International Journal of Scientometrics, Informetrics and Bibliometrics, Vol. 1, p. 1, available at: www.cindoc.csic.es/cybermetrics/articles/v1i1p1.html (accessed 5 September 2000).

Smith, A.G. (1996), “Criteria for evaluation of Internet information resources”, available at: www.vuw.ac.nz/~agsmith/evaln/index.htm (accessed 25 March 2000).

Smith, A.G. (1997), “Testing the surf: criteria for evaluating Internet information resources”, The Public-Access Computer Systems Review, Vol. 8 No. 3, available at: http://info.lib.uh.edu/pr/v8/n3/smit8n3.html (accessed 28 February 2002).

Smith, A.G. (1999), “The impact of web sites: a comparison between Australasia and Latin America”, available at: www.vuw.ac.nz/~agsmith/publns/austlat (accessed 12 May 2001).

Smith, A.G. (2001), “Applying evaluation criteria to New Zealand government websites”, International Journal of Information Management, Vol. 21 No. 2, pp. 137-49.

Tague-Sutcliffe, J. (1992), “An introduction to informetrics”, Information Processing and Management, Vol. 28 No. 1, pp. 1-3.

Tillman, H.N. (2000), “Evaluating quality on the Net”, available at: www.tiac.net/users/hope/findqual.html (accessed 5 June 2000).

van Raan, A.F.J. (2001), “Bibliometrics and Internet: some observations and expectations”, Scientometrics, Vol. 50 No. 1, pp. 59-63.

Vreeland, R.C. (2000), “Law libraries in hyperspace: a citation analysis of World Wide Web sites”, Law Library Journal, Vol. 92 No. 1, pp. 9-25.

Zhang, P. and von Dran, G.M. (2000), “Satisfiers and dissatisfiers: a two-factor model for website design and evaluation”, Journal of the American Society for Information Science, Vol. 51 No. 14, pp. 1253-68.


Identification of Extremeñan links. The first part of this stage was also similar to that carried out for the first subset of sitor Web spaces. The scanning routine reduced the total of potentially relevant links (from or about Extremadura) to 6,913. Detailed analysis of these links then yielded a final set of 1,232 real URLs about Extremadura. (The other 5,681 corresponded to what we came to call “stop-words”, i.e. URLs of Web spaces of a general character that mention Extremadura among their numerous resources, or that cause homonym problems in identification.)

Elimination of synonyms. This phase was again similar to that carried out for the first set of sitor Web spaces, and left 1,214 URLs.

With these data, the overall set of Web spaces to use as origins for the sitation analysis consisted of 1,963 URLs (749 from the first “sitor” group plus 1,214 from the second “sitee” group). After eliminating the duplicated URLs and making a last pass to eliminate errors (pages under construction, missing files, change of URL, etc.) and frames-generated synonyms, we were finally left with a database of 1,180 citor Web space URLs.
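The merging and de-duplication of the two groups can be sketched as follows. The normalisation rules (lower-casing scheme and host, dropping fragments and trailing slashes), the function names and the example URLs are assumptions made for illustration; the study's own cleaning also removed pages under construction, missing files, changed URLs and frames-generated synonyms, which this sketch does not attempt.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url):
    """Hypothetical normalisation: lower-case scheme and host, drop the
    fragment and any trailing slash on the path."""
    parts = urlsplit(url.strip())
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path.rstrip("/"), parts.query, ""))


def merge_sitor_groups(*groups):
    """Union several URL lists, keeping the first spelling of each
    normalised URL and preserving input order."""
    merged = {}
    for group in groups:
        for url in group:
            merged.setdefault(normalize_url(url), url)
    return list(merged.values())


first_group = ["http://www.example.es/", "http://www.example.org/turismo"]
second_group = ["http://WWW.EXAMPLE.ES", "http://www.example.net/"]
print(merge_sitor_groups(first_group, second_group))
```

Here the two spellings of www.example.es collapse into one entry, so four input URLs yield three.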

Retrieval of the sitations

Up to now we have described how the sets of sitor URLs were formed. For each URL, however, we needed to obtain the outlinks. We used the crawler to retrieve the pages and then extracted the links of the types “A”, “Area” and “Frame”. In the case of the first set (of 749 URLs), of course, this process had already been carried out in obtaining the second set (giving as an intermediate result the 1,214 URLs of what we termed above the “sitee” group).
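Extracting outlinks of the types A, Area and Frame could be done with Python's standard `html.parser` module, as in the sketch below. This is not the crawler used in the study; the class and function names are hypothetical, and relative URLs are resolved against the page URL with `urljoin`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Tags kept as outlinks, mapped to the attribute that holds the target URL.
LINK_ATTRS = {"a": "href", "area": "href", "frame": "src"}


class OutlinkExtractor(HTMLParser):
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.outlinks = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lower case.
        attr_name = LINK_ATTRS.get(tag)
        if attr_name is None:
            return  # img, script, iframe, etc. are ignored
        for name, value in attrs:
            if name == attr_name and value:
                # Resolve relative links against the page URL.
                self.outlinks.append(urljoin(self.base_url, value))


def extract_outlinks(html, base_url):
    parser = OutlinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.outlinks


page = '<a href="/x">X</a><AREA HREF="map.html"><frame src="f.htm"/><img src="i.png">'
print(extract_outlinks(page, "http://www.example.es/dir/page.html"))
```

Self-closing tags such as `<frame .../>` are routed to `handle_starttag` by the parser's default `handle_startendtag`, so they need no separate handling.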

It should be recalled at this point that the sitors are treated as Web spaces: all of their pages are retrieved at once, although the calculations could subsequently be made either with Web spaces or with individual pages, since each link's record contained both the origin URL and the origin Web space, as well as the target URL. A page was included if the final file that was retrieved was in a subdirectory of the main address of the Web space.
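A link record holding the origin URL, origin Web space and target URL, together with the subdirectory test for deciding whether a retrieved page belongs to a Web space, might look like the sketch below. The field names, the function name and the rule that a main address without a trailing slash is treated as a file in its parent directory are our assumptions.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class LinkRecord:
    """One extracted link; enough to count by page or by Web space."""
    origin_url: str
    origin_space: str
    target_url: str


def in_web_space(final_url, space_url):
    """True if final_url lies in (a subdirectory of) the directory of the
    Web space's main address, on the same host."""
    final, space = urlsplit(final_url), urlsplit(space_url)
    if final.netloc.lower() != space.netloc.lower():
        return False
    # Directory of the main address: everything up to the last "/".
    base = (space.path.rsplit("/", 1)[0] + "/") if space.path else "/"
    # The trailing slash on `base` stops /turismo2/ matching /turismo/.
    return (final.path or "/").startswith(base)


print(in_web_space("http://www.example.es/turismo/rutas/a.html",
                   "http://www.example.es/turismo/"))
```

Because each record keeps both the origin page and its Web space, the same list of records can later be counted either way without re-crawling.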

Result of the distributions and discussion

As was described in detail in the previous section, we had two slightly different sets of data, which we shall call the first and the second sets. The second was more thematically complete than the first.

The design of these two databases allowed the data to be treated with the following variations:

(1) Errors. This characteristic allowed us to form two subsets: all the links that were present; and only those that corresponded to URLs that could be retrieved without error.

(2) Type of link. We again formed two outlink subsets: only those of the types “A”, “Area” or “Frame”, which mainly denote some thematic relationship (links of other types, such as bgsound, embed, iframe, img, input, meta, script, etc., were thereby eliminated); and all the links, independently of type.

(3) Origin. This indicator allowed three variations in the way of computing the number of sitations to a given target: Web spaces that linked to the target; Web pages that linked to the target; and links that had the target.

(4) Destination. There were two variations: all the targets; and target Web spaces, located by grouping together the link URLs according to whether or not their path was included in some other URL.

(5) Extremeñan target. Again there were two variations in the way of computing the targets: all the targets; and only those that corresponded to Extremadura.

(6) Targets with high sitation percentages from the pages of some Web spaces. Another two variations in the way of computing the links: all the links; and only those that did not have a high percentage of their sitations originating from the pages of certain very large Web spaces (there were some sitor Web spaces with a great number of pages, a high proportion of which had a link to the same target, thereby enormously increasing the number of sitations of that target).
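Variation (4) groups target URLs into Web spaces by path. One possible reading of that rule, sketched below with hypothetical names, assigns each target to the URL in the list whose directory contains it (on the same host) and is shortest; the study does not spell out its exact procedure, so this is an illustration rather than a reconstruction.

```python
from urllib.parse import urlsplit


def directory_key(url):
    """(host, directory) of a URL, e.g. http://a.org/x/p.html -> ("a.org", "/x/")."""
    parts = urlsplit(url)
    path = parts.path or "/"
    directory = path if path.endswith("/") else path.rsplit("/", 1)[0] + "/"
    return parts.netloc.lower(), directory


def group_targets_by_path(urls):
    """Map each target URL to a representative Web-space URL: the URL in the
    list, on the same host, whose directory contains it and is shortest."""
    keys = {url: directory_key(url) for url in urls}
    groups = {}
    for url, (host, directory) in keys.items():
        containing = [other for other, (h, d) in keys.items()
                      if h == host and directory.startswith(d)]
        # `url` itself is always in `containing`, so min() never sees an empty
        # list; ties on directory length are broken alphabetically.
        groups[url] = min(containing, key=lambda other: (len(keys[other][1]), other))
    return groups


targets = ["http://a.example/", "http://a.example/x/p.html",
           "http://a.example/x/y/q.html", "http://b.example/z/r.html"]
print(group_targets_by_path(targets))
```

The pairwise comparison is quadratic in the number of targets, which is acceptable for a sketch but would call for sorting or a prefix tree on a large crawl.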

These variations generated 192 different distributions to use in the calculations. Since power-law distributions are traditionally represented differently (number of pages vs the logarithm of the sitations received) from those based on Bradford's law (accumulated sitations received vs the logarithm of accumulated targets), each distribution requires two representations, giving a total of 384. We therefore considered that presenting all the work exhaustively would exceed the limits of a communication such as the present one. Furthermore, since it was possible to extract common characteristics from many of these distributions, we shall present just a small but representative sample.
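The figure of 192 distributions is consistent with crossing the six variations above (2 × 2 × 3 × 2 × 2 × 2 = 96 combinations) with the two data sets; doubling that for the two plot types gives 384. The short sketch below simply enumerates the combinations; the option labels are ours, and the assumption that every variation was crossed with every other is an inference from the totals rather than a statement in the text.

```python
from itertools import product

# Options per variation, following items (1)-(6); the labels are ours.
VARIATIONS = {
    "errors": ["all links", "retrievable only"],
    "link_type": ["A/Area/Frame", "all types"],
    "origin": ["Web spaces", "Web pages", "links"],
    "destination": ["all targets", "target Web spaces"],
    "extremenan_target": ["all targets", "Extremadura only"],
    "frequent_targets": ["all links", "excluding high-percentage targets"],
}
DATA_SETS = ["first", "second"]

combinations = list(product(DATA_SETS, *VARIATIONS.values()))
n_distributions = len(combinations)  # 2 * (2 * 2 * 3 * 2 * 2 * 2)
n_plots = 2 * n_distributions        # one power-law and one Bradford plot each
print(n_distributions, n_plots)
```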

We shall first consider how well the data fit the power law by showing a series of plots of the general behaviour with respect to a power-law distribution.


Since this type of distribution has been demonstrated to hold for Web pages, we shall begin with them. Figure 1 shows the general fit to an exponent of 2.1. Following Broder et al. (2000), we plot the number of pages that receive a given number of links on a logarithmic scale, although instead of the raw number of pages we use the fraction of pages. The plotted data correspond to the first set, both including and excluding those that have unrecoverable errors (broken links). One observes the general goodness of the fit to the exponent 2.1, with a tail similar to that obtained by Broder et al. (2000) (corresponding to groups of URLs that are cited very frequently), although we do not find the slight initial fall seen in their data. A close look at Figure 1 shows that when there are few inlinks, the points corresponding to all the links lie at ratios slightly above those corresponding to unbroken links only. This tendency is reversed as the number of inlinks rises. This was to be expected, since it indicates that broken links (the only difference between the pairs of points) are cited less often. There are no other apparent differences between the data with and without errors. The behaviour was similar for the second set.
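The fraction-of-pages representation used in Figures 1-5, and a rough estimate of the exponent, can be sketched as below. Fitting a straight line to log(fraction) against log(k) by least squares is a quick, commonly used estimate but is known to be biased; maximum-likelihood estimators are generally preferred, and we do not claim that either is the procedure behind the figures. The function names are hypothetical, and the fit needs at least two distinct values of k.

```python
import math
from collections import Counter


def inlink_fractions(inlinks_per_page):
    """Fraction of linked pages that receive exactly k inlinks, for each k >= 1.

    inlinks_per_page maps a target page to its number of inlinks.
    """
    histogram = Counter(k for k in inlinks_per_page.values() if k > 0)
    total = sum(histogram.values())
    return {k: c / total for k, c in sorted(histogram.items())}


def powerlaw_exponent(fractions):
    """Minus the least-squares slope of log(fraction) against log(k)."""
    xs = [math.log(k) for k in fractions]
    ys = [math.log(f) for f in fractions.values()]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return -sxy / sxx


# Synthetic fractions following k**-2.1 exactly recover the exponent 2.1.
ideal = {k: k ** -2.1 for k in range(1, 50)}
print(round(powerlaw_exponent(ideal), 6))
```

Pages with zero inlinks are left out, since the logarithm of zero is undefined and such pages do not appear as targets anyway.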

In Figure 2 we use the second set, plotting all the links and the origin pages separately. The difference is that in the second case, if there are several links with the same target in the same page, they only count as one. This will be the case henceforth unless explicitly stated to the contrary. There do not appear to be any major differences between the two sets of points, except for the effect of small reductions in the number of inlinks recorded in some cases. In particular, the two variants are similar in shape. The behaviour was similar for the first set.

In Figure 3 we have grouped the origins by Web space, so that several links within the same space that have the same target are only counted once. One result is that the number of links recorded for each destination is far lower than in Figures 1 and 2. The data are from the first set. The tail is smaller and the slope is greater, corresponding to an exponent of 2.8, but the fit to a power-law distribution is just as good. The behaviour was similar for the second set.
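The three ways of counting sitations to a target used in Figures 1-3 (by link, by origin page and by origin Web space) differ only in which duplicates are collapsed. The sketch below, with a hypothetical function name and records given as 3-tuples (origin URL, origin Web space, target URL), makes the difference explicit.

```python
from collections import Counter


def sitations_per_target(records, origin="page"):
    """Count sitations per target from (origin_url, origin_space, target_url)
    tuples.

    origin="link":  every link counts;
    origin="page":  several links from one page to one target count once;
    origin="space": several links from one Web space to one target count once.
    """
    if origin == "link":
        return Counter(target for _, _, target in records)
    if origin == "page":
        pairs = {(page, target) for page, _, target in records}
    elif origin == "space":
        pairs = {(space, target) for _, space, target in records}
    else:
        raise ValueError(f"unknown origin mode: {origin!r}")
    return Counter(target for _, target in pairs)


records = [("s/p1", "s", "t"), ("s/p1", "s", "t"), ("s/p2", "s", "t"),
           ("u/p", "u", "t"), ("u/p", "u", "v")]
for mode in ("link", "page", "space"):
    print(mode, dict(sitations_per_target(records, mode)))
```

Target t falls from 4 sitations (links) to 3 (pages) to 2 (Web spaces), which is the kind of shrinkage that shortens the tail and steepens the slope in Figure 3.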

In Figure 4 we have grouped the targets according to the path of the different URLs, just as in Figure 3 we grouped the origins. The data are from the second set. The fit to the ideal distribution is similar to the previous cases. The difference between the two variants in the plot is that the second excludes some cases in which all of the pages of a large Web space link to some other Web space (which usually corresponds to the company that designed the first Web space). In this second variant, from which we eliminated the targets that were very frequently linked to by very large Web spaces, there is a logical decrease in the tail. The behaviour was similar for the first set.
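The exclusion of targets that are linked to from a large share of the pages of a very large Web space can be sketched as a two-step filter: flag such targets, then drop every link to them. The function names, the thresholds `min_pages` and `min_share` and their default values are hypothetical; the text does not state the cut-offs that were used.

```python
from collections import defaultdict


def frequent_targets(records, pages_per_space, min_pages=100, min_share=0.5):
    """Targets linked to from at least `min_share` of the pages of some Web
    space that has at least `min_pages` pages (hypothetical thresholds).

    records: list of (origin_url, origin_space, target_url) tuples.
    pages_per_space: number of retrieved pages in each sitor Web space.
    """
    linking_pages = defaultdict(set)  # (space, target) -> pages linking to it
    for page, space, target in records:
        linking_pages[(space, target)].add(page)
    flagged = set()
    for (space, target), pages in linking_pages.items():
        size = pages_per_space.get(space, 0)
        if size > 0 and size >= min_pages and len(pages) / size >= min_share:
            flagged.add(target)
    return flagged


def drop_frequent_targets(records, pages_per_space, **thresholds):
    """Remove every link whose target was flagged by frequent_targets."""
    flagged = frequent_targets(records, pages_per_space, **thresholds)
    return [record for record in records if record[2] not in flagged]


big = [(f"big/p{i}", "big", "designer") for i in range(4)]
small = [("small/p", "small", "designer"), ("small/p", "small", "other")]
kept = drop_frequent_targets(big + small, {"big": 4, "small": 1},
                             min_pages=3, min_share=0.75)
print(kept)  # only the link to "other" survives
```

Whether a flagged target should lose all its links or only those coming from the offending Web space is a design choice; this sketch follows the figure captions, which speak of excluding the targets themselves.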

To conclude with the power-law representations, Figure 5 shows the Web spaces as in the previous case, again eliminating the targets that are very frequently linked to in very large Web spaces, but now with the restriction of


Figure 1. Fraction of pages of the first set that receive a given number of inlinks, both including and excluding erroneous URLs, compared with an ideal power-law distribution.


Figure 2. Fraction of pages of the second set that receive a given number of inlinks, counting, on the one hand, all the links and, on the other, only the origin pages for each link (in both cases after having eliminated the erroneous links).


Figure 3. Fraction of pages of the first set that receive a given number of inlinks, counting only the origin Web spaces of each link, both including and excluding the erroneous URLs, compared with the ideal power-law distribution.


Figure 4. Fraction of Web spaces of the second set that receive a given number of inlinks, both including and excluding the group of targets that were linked to in a large percentage of the pages of some very large Web spaces.


Figure 5. Fraction of Web spaces that receive a given number of inlinks, comparing the first set with the second, including only targets dealing with Extremadura and excluding targets that were very frequently linked to in very large Web spaces.



JD595

578

Correa-Uribe G (1999) Colombia Conectada al Mundo Sitios Web Colombianos Universidad deAntioquotildeAcirca AntioquotildeAcirca

Cronin B (2001) ordfBibliometrics and beyond some thoughts on web-based citation analysisordmJournal of Information Science Vol 27 No 1 pp 1-7

Cui L (1999) ordfRating health web sites using the principles of citation analysis a bibliometricapproachordm Journal of Medical Internet Research Vol 1 No e4 avalaible atwwwjmirorg19991e4indexhtm (accessed 18 August 2001)

Cybermetrics International Journal of Scientometrics Informetrics and Bibliometrics (2002)available at wwwcindoccsicescybermetrics (accessed 11 August 2002)

Egghe L (2000) ordfNew informetric aspects of the Internet some remacrections plusmn many problemsordmJournal of Information Science Vol 26 No 5 pp 329-35

Ferreiro-AlaAcircez L (1981) ordfAnaAcirclisis de referencias y caracterotildeAcircsticas bibliomeAcirctricas de losconjuntos de revistas nuclearesordm Revista EspanAuml ola de DocumentacioAcircn CientotildeAcircregca Vol 4No 3 pp 181-98

Gorbea-Portal S (1996) El Modelo MatemaAcirctico de Bradford Su AplicacioAcircn a las RevistasLatinoamericanas de las Ciencias BibliotecoloAcircgicas y de la InformacioAcircn UNAM CentroUniversitario de Investigaciones BibliotecoloAcircgicas MeAcircxico

Gupta DK (1991) ordfApplication of Bradfordrsquos law to citation data of Ethiopian medicaljournalsordm Annals of Library Science and Documentation Vol 38 No 3 pp 85-98

Harter SP and Ford CE (2000) ordfWeb-based analyses of e-journal impact approachesproblems and issuesordm Library Science With a Slant to Documentation and InformationStudies Vol 51 No 13 pp 1159-76

Houston W (1983) ordfThe application of bibliometrics to veterinary science primary literatureordmQuarterly Bulletin of International Association of Agricultural Information SpecialistsVol 28 No 1 pp 6-13

Internet Software Consortium (2002) ordfInternet domain surveyordm available at wwwiscorg(accessed 4 November 2002)

JimeAcircnez-Piano M (2001) ordfEvaluacioAcircn de sedes webordm Revista EspanAuml ola de DocumentacioAcircnCientotildeAcircregca Vol 24 No 4 pp 405-32

Kim HJ (2000) ordfMotivations for hyperlinking in scholarly electronic articles a qualitativestudyordm Journal of the American Society for Information Science Vol 51 No 10 pp 887-99

Kumar R et al (2001) ordfTrawling the Web for emerging cyber-communitiesordm in Mendelzon A etal (Eds) Proceedings of the 8th International World Wide Web Conference (TorontoCanadaAcirc May 11-14 1999) available at www8orgw8-papers4a-search-miningtrawlingtrawlinghtml (accessed 2 October 2002) Elsevier Amsterdam

Lal A and Panda KC (1999) ordfBradfordrsquos law and its application to bibliographical data ofplant pathology dissertations an analytical approachordm Library Science With a Slant toDocumentation and Information Studies Vol 36 No 3 pp 193-206

Larson RR (1996) ordfBibliometrics of the World Wide Web and exploratory analysis of theintellectual structure of cyberspaceordm in Hardin S (Ed) Proceedings of the 59th AnnualMeeting of the American Society for Information Science (Baltimore Maryland 1996)Information Today Medford NJ pp 71-8 available at httpsherlockberkeleyeduasis96asis96html (accessed 14 October 2000)

McKiernan G (1996) ordfCitedSites(sm) citation indexing of Web resourcesordm available atwwwpubliciastateedu CYBERSTACKSCitedhtm (accessed 24 February 2000)

Mubeen MA (1996) ordfCitation analysis of doctoral dissertations in chemistryordm Annals of LibraryScience and Documentation Vol 43 No 2 pp 48-58

ordfSitationordmdistributions

579

Price DJ (1970) ordfCitation measures of hard science soft science technology and non-scienceordm inNelson CC and Pollock DE (Eds) Communication Among Scientists and Engineers DCHealth and Co Lexington MA pp 3-22

Reyes-BarragaAcircn MJ et al (2000) ordfRevistas cientotildeAcircregcas determinacioAcircn de necesidades y usosordmRevista EspanAuml ola de DocumentacioAcircn CientotildeAcircregca Vol 23 No 4 pp 417-36

RodrotildeAcircguez i GairotildeAcircn JM (1997) ordfValoracioAcircn del impacto de la informacioAcircn en Internet Altavistael `Citation Indexrsquo de la redordm Revista EspanAuml ola de DocumentacioAcircn CientotildeAcircregca Vol 20 No 2pp 175-81

Rousseau R (1997) ordfCitations an exploratory studyordm Cybermetrics International Journal ofScientometrics Informetrics and Bibliometrics Vol 1 p 1 available atwwwcindoccsicescybermetricsarticlesv1i1p1html (accessed 5 September 2000)

Smith AG (1996) ordfCriteria for evaluation of Internet information resourcesordm available atwwwvuwacnz agsmithevalnindexhtm (accessed 25 March 2000)

Smith AG (1997) ordfTesting the surf criteria for evaluating Internet information resourcesordm ThePublic-Access Computer Systems Review Vol 8 No 3 available at httpinfolibuheduprv8n3smit8n3html (accessed 28 February 2002)

Smith AG (1999) ordfThe impact of web sites a comparison betwen Australasia and LatinAmericaordm available at wwwvuwacnz agsmithpublnsaustlat (accessed 12 May2001)

Smith AG (2001) ordfApplying evaluation criteria to New Zealand government websitesordmInternational Journal of Information Management Vol 21 No 2 pp 137-49

Tague-Sutcliffe J (1992) ordfAn introduction to informetricsordm Information Processing andManagement Vol 28 No 1 pp 1-3

Tillman HN (2000) ordfEvaluating quality on the Netordm available atwwwtiacnetusershoperegndqualhtml (accessed 5 June 2000)

van Raan AFJ (2001) ordfBibliometrics and Internet some observations and expectationsordmScientometrics Vol 50 No 1 pp 59-63

Vreeland RC (2000) ordfLaw libraries in hyperspacea citation analysis of World Wide Web sitesordmLaw Library Journal Vol 92 No 1 pp 9-25

Zhang P and von Dran GM (2000) ordfSatisregers and dissatisregers a two-factor model for websitedesign and evaluationordm Journal of the American Society for Information Science Vol 51No 14 pp 1253-68

JD595

580

(2) Type of link. We again formed two outlink subsets:
  - only links of the type "A", "Area" or "Frame", which mainly denote some thematic relationship (links of other types, such as bgsound, embed, iframe, img, input, meta, script, etc., were thereby eliminated);
  - all the links, independently of type.

(3) Origin. This indicator allowed three variations in the way the number of sitations to a given target was computed:
  - Web spaces that linked to a given target;
  - Web pages that linked to a given target;
  - links that had a given target.

(4) Destination. There were two variations:
  - all the targets;
  - target Web spaces, located by grouping together the link URLs according to whether or not their path was included in some other URL.

(5) Extremeñan target. Again there were two variations in the way the targets were computed:
  - all the targets;
  - only those that corresponded to Extremadura.

(6) Targets with high sitation percentages from the pages of some Web spaces. Another two variations in the way the links were computed:
  - all the links;
  - only the links to targets that did not have a high percentage of their sitations originating from the pages of certain very large Web spaces (there were some sitor Web spaces with a great number of pages, a high proportion of which had a link to the same target, thereby enormously increasing the number of sitations of that target).

These variations generated 192 different distributions to use in the calculations. Since power-law distributions are traditionally represented differently (number of pages vs. the logarithm of the sitations received) from those based on Bradford's law (accumulated sitations received vs. the logarithm of accumulated targets), this would mean a total of 384 representations. We therefore considered that presenting all of this work exhaustively would exceed the limits of a paper such as the present one. Furthermore, since common characteristics could be extracted from many of these distributions, we shall present only a small but representative sample.
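
The totals of 192 distributions and 384 plots follow from multiplying the numbers of options. The Python sketch below shows that bookkeeping under a loud assumption: item (1) of the list is not part of this excerpt, so the sketch supposes that the two remaining binary factors are the data set (first or second) and the inclusion or exclusion of broken links, as compared in Figure 1. All option labels are hypothetical.

```python
from itertools import product

# Hypothetical labels for the calculation options. Items (2)-(6) follow the list
# above; "data_set" and "broken_links" are ASSUMED to be the remaining factors,
# because item (1) is not part of this excerpt.
OPTIONS = {
    "data_set": ["first", "second"],
    "broken_links": ["included", "excluded"],
    "link_type": ["A/Area/Frame only", "all types"],
    "origin": ["web spaces", "web pages", "links"],
    "destination": ["all targets", "target web spaces"],
    "extremenan_target": ["all targets", "Extremadura only"],
    "large_space_targets": ["all links", "excluded"],
}

def enumerate_variants(options):
    """Return one dict per combination of options, i.e. one per distribution."""
    keys = list(options)
    return [dict(zip(keys, combo)) for combo in product(*options.values())]

variants = enumerate_variants(OPTIONS)
plots_per_variant = 2  # one power-law plot and one Bradford plot
print(len(variants), len(variants) * plots_per_variant)  # 192 384
```

Enumerating the combinations explicitly, rather than only multiplying, also gives a list that can drive the computation of each distribution in turn.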

We shall first consider how well the data fit the power law by showing a series of plots of the general behaviour with respect to a power-law distribution.

Since this type of distribution has been demonstrated to hold for Web pages, we shall begin with them. Figure 1 shows the general fit to an exponent of 2.1. Following Broder et al. (2000), we represent the number of pages that receive a given number of links on a logarithmic scale, although instead of the raw number of pages we use the fraction of pages. The plotted data correspond to the first set, both including and excluding the targets with unrecoverable errors (broken links). The fit to the exponent 2.1 is generally good, with a tail similar to that obtained by Broder et al. (2000) (corresponding to groups of URLs that are cited very frequently), although we do not find those authors' slight initial fall in the data. A close look at Figure 1 shows that, when there are few inlinks, the points for all the links lie at slightly higher fractions than the points for unbroken links only. This tendency is reversed as the number of inlinks rises. This was to be expected, since it indicates that broken links (the only difference between the pairs of points) are cited less often. There are no other apparent differences between the data with and without errors. The behaviour was similar for the second set.
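
The kind of fit shown in Figure 1 can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes the input is a list of inlink counts, one per target, builds the fraction of targets receiving exactly k inlinks, and estimates the exponent with an ordinary least-squares line on log-log axes. That estimator is crude and sensitive to the sparse tail; a maximum-likelihood fit is generally preferred for careful work.

```python
import math
from collections import Counter

def inlink_fractions(inlink_counts):
    """Map k -> fraction of targets that receive exactly k inlinks (k >= 1)."""
    histogram = Counter(k for k in inlink_counts if k > 0)  # k = 0 cannot go on a log axis
    total = sum(histogram.values())
    return {k: n / total for k, n in sorted(histogram.items())}

def loglog_exponent(fractions):
    """Estimate alpha in f(k) ~ k**-alpha by least squares on log10 f vs log10 k.

    Needs at least two distinct values of k.
    """
    xs = [math.log10(k) for k in fractions]
    ys = [math.log10(f) for f in fractions.values()]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx  # the slope is negative for a decaying power law

# Hypothetical inlink counts for ten target pages
counts = [1, 1, 1, 1, 1, 1, 2, 2, 3, 7]
fractions = inlink_fractions(counts)
print(fractions)
print(round(loglog_exponent(fractions), 2))
```

Using fractions rather than raw page counts, as in Figure 1, only shifts the line vertically on log-log axes, so the estimated exponent is unchanged.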

In Figure 2, we use the second set, plotting all the links and the origin pages separately. The difference is that, in the second case, several links with the same target in the same page count only once; this will be the case henceforth unless explicitly stated otherwise. There do not appear to be any major differences between the two sets of points, apart from small reductions in the number of inlinks recorded in some cases. In particular, the two variants are similar in shape. The behaviour was similar for the first set.
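
The two counting rules compared in Figure 2 differ only in whether repeated links from one page to the same target are collapsed. A minimal sketch, assuming the crawl has already been reduced to hypothetical (origin page, target) pairs:

```python
from collections import defaultdict

def sitations_per_target(links, unit="links"):
    """Count sitations per target from (origin_page, target) pairs.

    unit="links": every link counts, including repeats on the same page.
    unit="pages": several links from one page to the same target count once.
    """
    if unit == "links":
        counts = defaultdict(int)
        for _page, target in links:
            counts[target] += 1
        return dict(counts)
    if unit == "pages":
        pages = defaultdict(set)
        for page, target in links:
            pages[target].add(page)
        return {target: len(p) for target, p in pages.items()}
    raise ValueError(f"unknown unit: {unit!r}")

# Hypothetical crawl: page a.html links twice to t1
links = [("a.html", "t1"), ("a.html", "t1"), ("b.html", "t1"), ("a.html", "t2")]
print(sitations_per_target(links, "links"))  # {'t1': 3, 't2': 1}
print(sitations_per_target(links, "pages"))  # {'t1': 2, 't2': 1}
```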

In Figure 3, we have grouped the origins by Web space, so that several links within the same space that have the same target are counted only once. One result is that the number of links recorded for each destination is far smaller than in Figures 1 and 2. The data are from the first set. The tail is smaller and the slope is steeper, corresponding to an exponent of 2.8, but the fit to a power-law distribution is just as good. The behaviour was similar for the second set.
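
Grouping origins by Web space, as in Figure 3, needs a rule that maps each page URL to its Web space. Item (4) of the method groups URLs according to whether their path is contained in another URL; the sketch below approximates that with a longest-prefix match against a hypothetical list of Web space root URLs, which is an assumption rather than the authors' exact procedure. Each target is then credited once per distinct origin Web space.

```python
from collections import defaultdict

def web_space_of(url, roots):
    """Return the longest Web space root containing url, or None.

    HYPOTHETICAL rule: a URL belongs to a root if it equals the root or continues
    it after a "/" (so ".../lib" does not capture ".../library.html").
    """
    best = None
    for root in roots:
        base = root.rstrip("/")
        if url == base or url.startswith(base + "/"):
            if best is None or len(base) > len(best):
                best = base
    return best

def sitor_spaces_per_target(links, roots):
    """Count distinct origin Web spaces per target from (origin_page, target) pairs.

    Pages that fall under no known root are ignored.
    """
    spaces = defaultdict(set)
    for page, target in links:
        space = web_space_of(page, roots)
        if space is not None:
            spaces[target].add(space)
    return {target: len(s) for target, s in spaces.items()}

# Hypothetical roots and links, using reserved example hostnames
roots = ["http://a.example", "http://a.example/lib/", "http://b.example/"]
links = [("http://a.example/p1.html", "T"), ("http://a.example/p2.html", "T"),
         ("http://b.example/q.html", "T"), ("http://a.example/lib/r.html", "U")]
print(sitor_spaces_per_target(links, roots))  # {'T': 2, 'U': 1}
```

The same mapping can be applied to targets instead of origins to obtain target Web spaces.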

In Figure 4, we have grouped the targets according to the path of the different URLs, just as in Figure 3 we grouped the origins. The data are from the second set. The fit to the ideal distribution is similar to the previous cases. The difference between the two variants in the plot is that the second excludes some cases in which all of the pages of a large Web space link to some other Web space (which usually corresponds to the company that designed the first Web space). In this second variant, from which we eliminated the targets that were very frequently linked to by very large Web spaces, there is a logical decrease in the tail. The behaviour was similar for the first set.
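
The exclusion used for the second variant in Figure 4 (item (6) of the method) can be sketched as a filter over targets. The thresholds `min_pages` (what counts as a very large Web space) and `min_share` (what counts as a high percentage of a target's sitations) are hypothetical placeholders; the paper does not state the values it used.

```python
from collections import defaultdict

def flag_large_space_targets(links, space_sizes, min_pages=500, min_share=0.5):
    """Return targets whose page-level sitations come mostly from one very large Web space.

    links: iterable of (origin_space, origin_page, target) triples.
    space_sizes: dict mapping origin_space -> number of pages collected from it.
    min_pages, min_share: HYPOTHETICAL thresholds, not values from the paper.
    """
    origins = defaultdict(set)  # target -> distinct (space, page) pairs linking to it
    for space, page, target in links:
        origins[target].add((space, page))

    flagged = set()
    for target, pages in origins.items():
        per_space = defaultdict(int)
        for space, _page in pages:
            per_space[space] += 1
        for space, n in per_space.items():
            if space_sizes.get(space, 0) >= min_pages and n / len(pages) >= min_share:
                flagged.add(target)
                break
    return flagged

def drop_flagged(links, flagged):
    """Keep only the links whose target was not flagged."""
    return [link for link in links if link[2] not in flagged]
```

Typical use is `drop_flagged(links, flag_large_space_targets(links, space_sizes))`, after which the sitation counts are recomputed as before.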

To conclude with the power-law representations, Figure 5 shows the Web spaces as in the previous case, again eliminating the targets that are very frequently linked to in very large Web spaces, but now with the restriction that they be Web spaces dealing with Extremadura. We compare the data from the first set with the data from the second. One observes the logical decrease in the number of elements, but also a decrease in the slope, which now best fits an exponent of 1.5.

Figure 1. Fraction of pages of the first set that receive a given number of inlinks, both including and excluding erroneous URLs, compared with an ideal power-law distribution

Figure 2. Fraction of pages of the second set that receive a given number of inlinks, counting on the one hand all the links and on the other only the origin pages for each link (in both cases after having eliminated the erroneous links)

Figure 3. Fraction of pages of the first set that receive a given number of inlinks, counting only the origin Web spaces of each link, both including and excluding the erroneous URLs, compared with the ideal power-law distribution

Figure 4. Fraction of Web spaces of the second set that receive a given number of inlinks, both including and excluding the group of targets that were linked to in a large percentage of the pages of some very large Web spaces

Figure 5. Fraction of Web spaces that receive a given number of inlinks, comparing the first set with the second, including only targets dealing with Extremadura and excluding targets that were very frequently linked to in very large Web spaces

As mentioned above, Bradford's law distributions are traditionally represented differently from power-law distributions, so the presentation of Figures 6-10 will be slightly different from the previous five: accumulated sitations are plotted against accumulated targets, using a logarithmic scale for the latter, since their number increases exponentially with the accumulated sitations. Bradford's law is, of course, normally applied to scientific journals, counting the citations received by each journal. These citations point from the citing articles (which in turn belong to journals) towards the journal's articles. A certain ambiguity arises in the Web analogue, however. When a certain work is cited several times in an article, it is counted only once. Hence, in our case too, when a certain target is linked to several times it should be counted only once. The problem, though, is that it is not clear whether the Web spaces and pages of sitation analysis should be taken as the analogues of journals and articles, respectively, or whether an entire Web space should instead be taken as the analogue of an article. We shall look at both possibilities.
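
The Bradford-style representation can be sketched as follows: rank the targets by decreasing sitations, accumulate the counts, and pair the logarithm of the number of targets n with the accumulated sitations R(n). The input is assumed to be sitation counts already deduplicated according to whichever analogue of the citing article is chosen (one per sitor Web space or one per sitor page); the function name is hypothetical.

```python
import math

def bradford_curve(sitations):
    """Return (log10 n, R(n)) points, R(n) = sitations of the n most-sitated targets.

    sitations: dict target -> sitation count, already deduplicated per sitor unit
    (one per sitor Web space, or one per sitor page).
    """
    ranked = sorted(sitations.values(), reverse=True)
    points, running = [], 0
    for n, s in enumerate(ranked, start=1):
        running += s
        points.append((math.log10(n), running))
    return points

# Hypothetical counts for four target Web spaces
for x, r in bradford_curve({"a": 5, "b": 3, "c": 1, "d": 1}):
    print(f"log10 n = {x:.3f}  R(n) = {r}")
```

In a classic Bradford plot these points rise along a curve and then become linear; the shapes described for Figures 6-10 can be compared against that expectation.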

Figure 6 shows the results for the second set when we count only one sitation per Web space for each target. We present the results for target pages independently, after grouping the pages by Web space, and after also eliminating the frequent links, as we did in the power-law case. This therefore regards Web spaces as the analogues of the citing articles, and pages or entire Web spaces (depending on the case) as the analogues of the cited journals. The data do not fit Bradford's law, since the curve is concave from above. Hence the number of sitations appears to grow exponentially and, indeed, if a logarithmic scale is also used for the vertical axis, the result is practically a straight line. Likewise, the verbal statement of the law is not satisfied, since the proportion 1:n:n² does not hold.
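
The verbal statement of Bradford's law divides the ranked targets into zones that each receive roughly the same share of the sitations, and predicts that the numbers of targets in successive zones are in the proportion 1:n:n². A minimal sketch of that check, assuming three zones and a simple rule that places a boundary target in the zone whose share had not yet been reached (other boundary conventions are possible, and the function names are hypothetical):

```python
def bradford_zones(sitations, zones=3):
    """Count targets per zone, with zones holding roughly equal shares of all sitations.

    Targets are ranked by decreasing sitations; a target joins the current zone
    if the running total before it is still below that zone's cumulative share.
    """
    ranked = sorted(sitations.values(), reverse=True)
    total = sum(ranked)
    sizes, running, zone = [0] * zones, 0, 0
    for s in ranked:
        while zone < zones - 1 and running >= (zone + 1) * total / zones:
            zone += 1
        sizes[zone] += 1
        running += s
    return sizes

def zone_multipliers(sizes):
    """Ratios between consecutive zone sizes; roughly equal values suggest 1 : n : n**2."""
    return [b / a if a else float("inf") for a, b in zip(sizes, sizes[1:])]

# Hypothetical counts built to follow 1 : 2 : 4 exactly (8 sitations per zone)
example = {"t1": 8, "t2": 4, "t3": 4, "t4": 2, "t5": 2, "t6": 2, "t7": 2}
sizes = bradford_zones(example)
print(sizes, zone_multipliers(sizes))  # [1, 2, 4] [2.0, 2.0]
```

When the multipliers differ markedly from one another, the proportion 1:n:n² does not hold for that distribution.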

Figure 7 shows the same three distributions for the first set, but now counting sitor pages. This means that we are now considering Web pages to be the analogues of citing articles, which seems more natural. The first distribution, corresponding to pages, lies in the upper part of the plot, at first running coincident with the Web space distribution and then rising above it.

The second distribution, grouped by Web space, includes points of inflection but, in synthesis, differs from a Bradford law curve in the following aspects: it begins at a point clearly above the origin; the linear portion does not begin at the first point of inflection but later; and it ends with a strongly increasing section.

The third distribution occupies the lower part of the plot. This means that eliminating the links found in a large percentage of the pages of very large Web spaces eliminates many sitations, but not many Web spaces or linked pages.

Figure 6. Accumulated sitations vs accumulated Web spaces. The data are from the second set. The results are presented for target pages independently, grouped by Web space, and excluding targets that were very frequently linked to in very large Web spaces. Only sitor Web spaces are counted

Figure 7. Accumulated sitations vs accumulated Web spaces. The data are from the first set. The results are presented for target pages independently, grouped by Web space, and excluding targets that were very frequently linked to in very large Web spaces. Sitor pages are counted

Figure 8. Accumulated sitations vs accumulated Web spaces. The data are from the first set. The results are presented for target pages independently, grouped by Web space, and excluding targets that were very frequently linked to in very large Web spaces. Sitor pages are counted, but only targets about Extremadura

Figure 9. Accumulated sitations vs accumulated Web spaces. The data are from the second set. The results are presented for target pages independently, grouped by Web space, and excluding targets that were very frequently linked to in very large Web spaces. Sitor pages are counted

Figure 10. Accumulated sitations vs accumulated Web spaces. The data are from the second set. The results are presented for target pages independently, grouped by Web space, and excluding targets that were very frequently linked to in very large Web spaces. Sitor pages are counted, but only targets about Extremadura

Although, as this distribution is plotted together with the others, its slope is less clearly perceptible, one nevertheless observes that the slope increases continuously, implying an exponential growth of the accumulated sitations here too. Indeed, if a logarithmic scale is also used on the vertical axis, the result is a straight line with no point of inflection, which does not fit Bradford's law. It is interesting that, with the elimination of this set of links, the dependence of the sitations becomes exponential. Again, the verbal statement of the law is not satisfied, since the proportion 1:n:n² does not hold.
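
The observation that the curve becomes a straight line once the vertical axis is also logarithmic can be quantified by the coefficient of determination of a straight-line fit of log R(n) on log n. A minimal sketch, assuming the same ranked sitation counts as in the Bradford plots; an R² close to 1 indicates a near-straight line on doubly logarithmic axes, that is, R(n) roughly proportional to a power of n. What counts as "close to 1" is a judgement call, not a value taken from the paper.

```python
import math

def loglog_r_squared(sitations):
    """R**2 of a straight-line fit of log10 R(n) on log10 n.

    Targets are ranked by decreasing sitations and R(n) is the accumulated
    sitation count of the first n targets. Needs at least two targets, all
    with positive counts.
    """
    ranked = sorted(sitations.values(), reverse=True)
    xs, ys, running = [], [], 0
    for n, s in enumerate(ranked, start=1):
        running += s
        xs.append(math.log10(n))
        ys.append(math.log10(running))
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy * sxy / (sxx * syy)

# Hypothetical data: one heavily sitated target followed by four single sitations
print(round(loglog_r_squared({"a": 10, "b": 1, "c": 1, "d": 1, "e": 1}), 3))
```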

Some authors, Brookes (1969) for instance, consider that Bradford's law is mainly applicable in small, well-defined areas of research, in which there must be a strong thematic relationship between the documents. In Figure 8, we have therefore represented only the pages or Web spaces that are from Extremadura (for the first set of data). The behaviour, however, is quite similar to the previous case: the first two distributions are fairly similar to each other, and the greatest difference is in the third distribution, which is a straight line.

Figures 9 and 10 show the same calculations as Figures 7 and 8, but now for data from the second set. The results are identical.

Conclusions

The present experiment focused on the distribution of sitations in a closed but heterogeneous environment of thematically related Web spaces, using two sets of data. We studied how they fit, on the one hand, a power-law distribution and, on the other, Bradford's law, looking at a great many variants with a total of 384 plots.

Studies in the literature have shown that, on a very large scale, sitations follow a power-law distribution with an exponent of 2.1. The present sitation distributions from both data sets were coherent with those earlier studies, including in the exponent, independently of whether the counts included broken links, total links or only sitor pages, and pages or Web spaces both as origin and as target. We showed, however, that the distributions differed according to which calculation procedure was used and to what filter was applied to the data. In particular, when we considered only Web spaces as origins (instead of pages or links) and pages as destinations, the slope increased to correspond approximately to an exponent of 2.8. When, however, both destinations and origins were Web spaces, the exponent returned to the value 2.1. Also, when the destinations were thematically restricted (in which case they practically coincided with the sitors), the slope was reduced to correspond approximately to an exponent of 1.5. Finally, the tail in the data was reduced by eliminating the targets that were very frequently linked to in very large Web spaces.

We studied the fit to Bradford's law by plotting the accumulated sitations against the accumulated targets. Although we presented many distributions, none of them fitted the typical Bradford case. When we considered Web spaces as the sitors, the resulting distribution was concave from above, with no point of inflection, and passed through the origin. The other distributions, in which pages were considered as the sitors, could in general be characterized as curves that start noticeably above the origin and have several points of inflection ( ). It was also notable, firstly, that taking Web spaces as destinations instead of pages greatly reduced the accumulated sitations and much of the complexity of the curve and, secondly, that eliminating the targets that were very frequently linked to in very large Web spaces sharply reduced the accumulated sitations; the curves then passed through the origin and showed an exponential dependence of the accumulated sitations.

In sum, everything seems to support the observation of Kim (2000) that there are different motivations behind the citations in scientific articles and the links of the World Wide Web. In this sense, we confirmed the immense influence of thematic restriction (for instance, nearly all the Web spaces included links related to the World Wide Web itself and its technology) and of the repetitive links found in most of the very large Web spaces (which are usually generated by means of software tools that allow links to be included automatically on every page).

References

Aguillo, I.F. (2000), "Indicadores hacia una evaluación no objetiva (cuantitativa) de sedes web", in VII Jornadas Españolas de Documentación (Bilbao, 19-21 de octubre de 2000), Servicio Editorial de la Universidad del País Vasco, Bilbao, pp. 233-48.

Almind, T.C. and Ingwersen, P. (1997), "Informetric analyses on the World Wide Web: methodological approaches to 'Webometrics'", Journal of Documentation, Vol. 53 No. 4, pp. 404-26.

Bar-Ilan, J. (1997), "The 'mad cow disease', Usenet newsgroups and bibliometric laws", Scientometrics, Vol. 39 No. 1, pp. 29-55.

Bar-Ilan, J. (2001), "Data collection methods on the Web for informetrics purposes: a review and analysis", Scientometrics, Vol. 50 No. 1, pp. 7-32.

Barabási, A.L. and Albert, R. (1999), "Emergence of scaling in random networks", Science, Vol. 286, pp. 509-12.

Björneborn, L. and Ingwersen, P. (2001), "Perspectives of webometrics", Scientometrics, Vol. 50 No. 1, pp. 65-82.

Bradford, S.C. (1934), "Sources of information on specific subjects", Engineering, Vol. 137, pp. 85-6.

Brin, S. and Page, L. (1998), "The anatomy of a large-scale hypertextual web search engine", Computer Networks and ISDN Systems, Vol. 30, pp. 107-17.

Broder, A. et al. (2000), "Graph structure in the Web", Computer Networks and ISDN Systems, Vol. 33 No. 1-6, pp. 309-20.

Brookes, B.C. (1969), "Bradford's law and the bibliography of science", Nature, Vol. 224, pp. 953-6.

Codina, L. (2000), "Parámetros e indicadores de calidad para la evaluación de recursos digitales", in VII Jornadas Españolas de Documentación (Bilbao, 19-21 de octubre de 2000), Servicio Editorial de la Universidad del País Vasco, Bilbao, pp. 135-44.

Correa-Uribe, G. (1999), Colombia Conectada al Mundo: Sitios Web Colombianos, Universidad de Antioquía, Antioquía.

Cronin, B. (2001), "Bibliometrics and beyond: some thoughts on web-based citation analysis", Journal of Information Science, Vol. 27 No. 1, pp. 1-7.

Cui, L. (1999), "Rating health web sites using the principles of citation analysis: a bibliometric approach", Journal of Medical Internet Research, Vol. 1 No. e4, available at: www.jmir.org/1999/1/e4/index.htm (accessed 18 August 2001).

Cybermetrics: International Journal of Scientometrics, Informetrics and Bibliometrics (2002), available at: www.cindoc.csic.es/cybermetrics (accessed 11 August 2002).

Egghe, L. (2000), "New informetric aspects of the Internet: some reflections – many problems", Journal of Information Science, Vol. 26 No. 5, pp. 329-35.

Ferreiro-Aláez, L. (1981), "Análisis de referencias y características bibliométricas de los conjuntos de revistas nucleares", Revista Española de Documentación Científica, Vol. 4 No. 3, pp. 181-98.

Gorbea-Portal, S. (1996), El Modelo Matemático de Bradford: Su Aplicación a las Revistas Latinoamericanas de las Ciencias Bibliotecológicas y de la Información, UNAM, Centro Universitario de Investigaciones Bibliotecológicas, México.

Gupta, D.K. (1991), "Application of Bradford's law to citation data of Ethiopian medical journals", Annals of Library Science and Documentation, Vol. 38 No. 3, pp. 85-98.

Harter, S.P. and Ford, C.E. (2000), "Web-based analyses of e-journal impact: approaches, problems and issues", Journal of the American Society for Information Science, Vol. 51 No. 13, pp. 1159-76.

Houston, W. (1983), "The application of bibliometrics to veterinary science primary literature", Quarterly Bulletin of the International Association of Agricultural Information Specialists, Vol. 28 No. 1, pp. 6-13.

Internet Software Consortium (2002), "Internet domain survey", available at: www.isc.org (accessed 4 November 2002).

Jiménez-Piano, M. (2001), "Evaluación de sedes web", Revista Española de Documentación Científica, Vol. 24 No. 4, pp. 405-32.

Kim, H.J. (2000), "Motivations for hyperlinking in scholarly electronic articles: a qualitative study", Journal of the American Society for Information Science, Vol. 51 No. 10, pp. 887-99.

Kumar, R. et al. (2001), "Trawling the Web for emerging cyber-communities", in Mendelzon, A. et al. (Eds), Proceedings of the 8th International World Wide Web Conference (Toronto, Canada, May 11-14, 1999), Elsevier, Amsterdam, available at: www8.org/w8-papers/4a-search-mining/trawling/trawling.html (accessed 2 October 2002).

Lal, A. and Panda, K.C. (1999), "Bradford's law and its application to bibliographical data of plant pathology dissertations: an analytical approach", Library Science With a Slant to Documentation and Information Studies, Vol. 36 No. 3, pp. 193-206.

Larson, R.R. (1996), "Bibliometrics of the World Wide Web: an exploratory analysis of the intellectual structure of cyberspace", in Hardin, S. (Ed.), Proceedings of the 59th Annual Meeting of the American Society for Information Science (Baltimore, Maryland, 1996), Information Today, Medford, NJ, pp. 71-8, available at: http://sherlock.berkeley.edu/asis96/asis96.html (accessed 14 October 2000).

McKiernan, G. (1996), "CitedSites(sm): citation indexing of Web resources", available at: www.public.iastate.edu/~CYBERSTACKS/Cited.htm (accessed 24 February 2000).

Mubeen, M.A. (1996), "Citation analysis of doctoral dissertations in chemistry", Annals of Library Science and Documentation, Vol. 43 No. 2, pp. 48-58.

Price, D.J. (1970), "Citation measures of hard science, soft science, technology and non-science", in Nelson, C.C. and Pollock, D.E. (Eds), Communication Among Scientists and Engineers, D.C. Heath and Co., Lexington, MA, pp. 3-22.

Reyes-Barragán, M.J. et al. (2000), "Revistas científicas: determinación de necesidades y usos", Revista Española de Documentación Científica, Vol. 23 No. 4, pp. 417-36.

Rodríguez i Gairín, J.M. (1997), "Valoración del impacto de la información en Internet: Altavista, el 'Citation Index' de la red", Revista Española de Documentación Científica, Vol. 20 No. 2, pp. 175-81.

Rousseau, R. (1997), "Sitations: an exploratory study", Cybermetrics: International Journal of Scientometrics, Informetrics and Bibliometrics, Vol. 1, p. 1, available at: www.cindoc.csic.es/cybermetrics/articles/v1i1p1.html (accessed 5 September 2000).

Smith, A.G. (1996), "Criteria for evaluation of Internet information resources", available at: www.vuw.ac.nz/~agsmith/evaln/index.htm (accessed 25 March 2000).

Smith, A.G. (1997), "Testing the surf: criteria for evaluating Internet information resources", The Public-Access Computer Systems Review, Vol. 8 No. 3, available at: http://info.lib.uh.edu/pr/v8/n3/smit8n3.html (accessed 28 February 2002).

Smith, A.G. (1999), "The impact of web sites: a comparison between Australasia and Latin America", available at: www.vuw.ac.nz/~agsmith/publns/austlat (accessed 12 May 2001).

Smith, A.G. (2001), "Applying evaluation criteria to New Zealand government websites", International Journal of Information Management, Vol. 21 No. 2, pp. 137-49.

Tague-Sutcliffe, J. (1992), "An introduction to informetrics", Information Processing and Management, Vol. 28 No. 1, pp. 1-3.

Tillman, H.N. (2000), "Evaluating quality on the Net", available at: www.tiac.net/users/hope/findqual.html (accessed 5 June 2000).

van Raan, A.F.J. (2001), "Bibliometrics and Internet: some observations and expectations", Scientometrics, Vol. 50 No. 1, pp. 59-63.

Vreeland, R.C. (2000), "Law libraries in hyperspace: a citation analysis of World Wide Web sites", Law Library Journal, Vol. 92 No. 1, pp. 9-25.

Zhang, P. and von Dran, G.M. (2000), "Satisfiers and dissatisfiers: a two-factor model for website design and evaluation", Journal of the American Society for Information Science, Vol. 51 No. 14, pp. 1253-68.

JD595

580

Since this type of distribution has been demonstrated to hold for Web pageswe shall begin with them Figure 1 shows the general regt to an exponent of 21We represent the number of pages that receive a given number of links using alogarithmic scale following Broder et al (2000) although instead ofrepresenting the raw number of pages we use the fraction of pages Theplotted data correspond to the regrst set both including and excluding those thathave unrecoverable errors (broken links) One observes the general goodness ofthe regt to the exponent 21 with there being a tail similar to that obtained byBroder et al (2000) (corresponding to groups of URLs that are cited veryfrequently) although we do not regnd those workersrsquo slight initial fall in the dataA close observation of Figure 1 shows that when there are few inlinks thepoints corresponding to all the links are for ratios slightly above those for thepoints corresponding to unbroken links only This tendency is reversed as thenumber of inlinks rises This was to be expected since it indicates that brokenlinks (the only difference between the pairs of points) are cited less often Thereare no other apparent differences between the data with and without errorsThe behaviour was similar for the second set

In Figure 2 we use the second set, plotting all the links and the origin pages separately. The difference is that, in the second case, if there are several links with the same target in the same page, they count as only one. This will be the case henceforth unless explicitly stated to the contrary. There do not appear to be any major differences between the two sets of points, except for small reductions in the number of inlinks recorded in some cases; in particular, the two variants are similar in shape. The behaviour was similar for the first set.
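The two counting variants compared in Figure 2 differ only in how repeated links are handled. A minimal Python sketch, assuming the links are available as (origin page, target) pairs; the function name and data layout are hypothetical, not taken from the paper.

```python
from collections import defaultdict


def count_inlinks(links, dedupe_origin_pages=True):
    """Count inlinks per target from (origin_page, target) pairs.

    dedupe_origin_pages=False counts every link occurrence; True counts
    several links from the same origin page to the same target only once.
    """
    if dedupe_origin_pages:
        origins = defaultdict(set)
        for origin, target in links:
            origins[target].add(origin)
        return {target: len(pages) for target, pages in origins.items()}
    counts = defaultdict(int)
    for _, target in links:
        counts[target] += 1
    return dict(counts)


links = [("p1", "t1"), ("p1", "t1"), ("p2", "t1"), ("p1", "t2")]
print(count_inlinks(links, dedupe_origin_pages=False))  # {'t1': 3, 't2': 1}
print(count_inlinks(links))                             # {'t1': 2, 't2': 1}
```

Using a set of origin pages per target makes the deduplication explicit and keeps the "all links" variant as a plain tally.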

In Figure 3 we have grouped the origins by Web space, so that several links from within the same space to the same target are counted only once. One result is that the number of links recorded for each destination is far lower than in Figures 1 and 2. The data are from the first set. The tail is smaller and the slope steeper, corresponding to an exponent of 2.8, but the fit to a power-law distribution is just as good. The behaviour was similar for the second set.
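Grouping origins by Web space, as in Figure 3, amounts to counting distinct origin spaces rather than distinct origin pages. The Python sketch below assumes, hypothetically, that a Web space can be identified from the host name plus an optional number of leading directory segments of the URL path; the paper groups by URL path but does not specify the rule, so `web_space_of` and its `depth` parameter are illustrative only.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def web_space_of(url, depth=0):
    """Hypothetical Web-space key: lower-cased host plus the first `depth` directory segments."""
    parts = urlsplit(url)
    # Drop the last path component (the file name), keep non-empty directory names.
    directories = [s for s in parts.path.split("/")[:-1] if s][:depth]
    return "/".join([parts.netloc.lower()] + directories)


def count_origin_spaces(links, depth=0):
    """Number of distinct origin Web spaces linking to each target page."""
    spaces = defaultdict(set)
    for origin, target in links:
        spaces[target].add(web_space_of(origin, depth))
    return {target: len(s) for target, s in spaces.items()}


links = [
    ("http://www.a.es/x.html", "http://www.t.es/"),
    ("http://www.a.es/y.html", "http://www.t.es/"),  # same origin space as above
    ("http://www.b.es/", "http://www.t.es/"),
]
print(count_origin_spaces(links))  # {'http://www.t.es/': 2}
```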

In Figure 4 we have grouped the targets according to the path of the different URLs, just as in Figure 3 we grouped the origins. The data are from the second set. The fit to the ideal distribution is similar to that of the previous cases. The difference between the two variants in the plot is that the second excludes some cases in which all of the pages of a large Web space link to some other Web space (which usually corresponds to the company that designed the first Web space). In this second variant the tail shrinks, as one would expect once the targets that were very frequently linked to by very large Web spaces have been eliminated. The behaviour was similar for the first set.
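The second variant in Figure 4 removes links that a large Web space places on (nearly) all of its pages, such as a link to the company that designed the site. A minimal Python sketch of one way to do this, under stated assumptions: the Web space of a URL is taken to be its host name, the page count of each origin space is known, and the thresholds `min_pages` and `min_share` are hypothetical values, since the paper does not state how “very large” and “very frequently” were defined.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def host_of(url):
    """Assumed Web-space key for this sketch: the lower-cased host name."""
    return urlsplit(url).netloc.lower()


def sitewide_pairs(links, pages_per_space, min_pages=100, min_share=0.9):
    """(origin space, target) pairs linked from at least `min_share` of the pages
    of an origin space with at least `min_pages` pages (hypothetical thresholds)."""
    linking_pages = defaultdict(set)
    for origin, target in links:
        linking_pages[(host_of(origin), target)].add(origin)
    flagged = set()
    for (space, target), pages in linking_pages.items():
        size = pages_per_space.get(space, 0)
        if size >= min_pages and size > 0 and len(pages) / size >= min_share:
            flagged.add((space, target))
    return flagged


def inlinks_by_target_space(links, exclude=frozenset()):
    """Distinct origin Web spaces linking to each target Web space, skipping excluded pairs."""
    origins = defaultdict(set)
    for origin, target in links:
        if (host_of(origin), target) not in exclude:
            origins[host_of(target)].add(host_of(origin))
    return {space: len(o) for space, o in origins.items()}


# A 100-page space links to its designer from every page; a small space links once.
links = [(f"http://big.es/p{i}.html", "http://designer.com/") for i in range(100)]
links.append(("http://small.es/a.html", "http://designer.com/"))
flagged = sitewide_pairs(links, {"big.es": 100, "small.es": 1})
print(inlinks_by_target_space(links))           # {'designer.com': 2}
print(inlinks_by_target_space(links, flagged))  # {'designer.com': 1}
```

Excluding (origin space, target) pairs rather than whole targets keeps links to the same target from other Web spaces; that choice is itself an assumption.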

To conclude with the power-law representations, Figure 5 shows the Web spaces as in the previous case, again eliminating the targets that are very frequently linked to in very large Web spaces, but now restricting the targets to Web spaces that deal with Extremadura. We compare the data from the first set with the data from the second. One observes the expected decrease in the number of elements, but also a decrease in the slope, which is best fitted by an exponent of 1.5.

Figure 1. Fraction of pages of the first set that receive a given number of inlinks, both including and excluding erroneous URLs, compared with an ideal power-law distribution

Figure 2. Fraction of pages of the second set that receive a given number of inlinks, counting on the one hand all the links and on the other only the origin pages for each link (in both cases after having eliminated the erroneous links)

Figure 3. Fraction of pages of the first set that receive a given number of inlinks, counting only the origin Web spaces of each link, both including and excluding the erroneous URLs, compared with the ideal power-law distribution

Figure 4. Fraction of Web spaces of the second set that receive a given number of inlinks, both including and excluding the group of targets that were linked to in a large percentage of the pages of some very large Web spaces

Figure 5. Fraction of Web spaces that receive a given number of inlinks, comparing the first set with the second, including only targets dealing with Extremadura and excluding targets that were very frequently linked to in very large Web spaces

As we mentioned above, Bradford’s law distributions are traditionally represented differently from power-law distributions, so the presentation of Figures 6-10 will be slightly different from that of the previous five: accumulated sitations are plotted against accumulated targets, using a logarithmic scale for the latter, since their number increases exponentially with the accumulated sitations. Bradford’s law is, of course, normally applied to scientific journals, counting the citations received by each journal. These citations point from the citing articles (which in turn belong to journals) towards the journal’s articles. A certain ambiguity arises in the Web analogue, however. When a certain work is cited several times in an article, it is counted only once. Hence, in our case too, when a certain target is linked to several times from the same source, it should also be counted only once. The problem, though, is that it is not clear whether the Web spaces and pages of sitation analysis should be taken as the analogues of journals and articles, respectively, or whether an entire Web space should be taken as the analogue of an article. We shall look at both possibilities.
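The Bradford-style representation used in Figures 6-10 can be sketched as follows. Targets (pages or Web spaces) are ranked by decreasing number of sitations, and R(n), the number of sitations accumulated by the n most-sitated targets, is paired with log n for the horizontal axis. In a classical Bradford bibliograph the central part of this curve is approximately a straight line, R(n) ≈ a + b log n. This minimal Python sketch assumes the sitation counts have already been computed under whichever counting variant is being plotted; the function name is hypothetical.

```python
import math


def bradford_curve(sitations_per_target):
    """Return (n, log10 n, R(n)) for n = 1..N, where R(n) is the number of
    sitations received by the n most-sitated targets."""
    ranked = sorted(sitations_per_target.values(), reverse=True)
    points, accumulated = [], 0
    for n, sitations in enumerate(ranked, start=1):
        accumulated += sitations
        points.append((n, math.log10(n), accumulated))
    return points


for n, log_n, r in bradford_curve({"a": 5, "b": 1, "c": 3}):
    print(n, round(log_n, 3), r)
# 1 0.0 5
# 2 0.301 8
# 3 0.477 9
```

Ties in the ranking do not affect R(n), since targets with equal counts contribute the same amount whichever order they are taken in.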

Figure 6 shows the results for the second set when we count only one sitation per Web space for each target. We present the results for target pages taken independently, after grouping the pages by Web space, and after also eliminating the frequent links, as we did in the power-law case. This therefore treats Web spaces as the analogues of the citing articles, and pages or entire Web spaces (depending on the case) as the analogues of the cited journals. The data do not fit Bradford’s law, since the curve is concave upward. Hence the number of sitations appears to grow exponentially and, indeed, if a logarithmic scale is also used for the vertical axis, the result is practically a straight line. Likewise, the verbal statement of the law is not satisfied, since the proportion 1 : n : n² does not hold.
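The verbal formulation mentioned here divides the ranked targets into three zones that each receive about one third of all sitations; Bradford’s law predicts zone sizes in the proportion 1 : n : n², that is, roughly equal ratios between successive zones. A minimal Python sketch of that check, with hypothetical function names and a simple greedy rule for placing zone boundaries (other boundary rules are possible and can shift the counts for small samples). Note also that a straight line on log-log axes means R(n) ≈ c·n^b, which grows exponentially in log n, the quantity on the horizontal axis of the semi-logarithmic plot; this is consistent with the exponential growth described above.

```python
def bradford_zones(sitations_per_target, zones=3):
    """Split targets, ranked by decreasing sitations, into `zones` consecutive groups
    that each receive roughly 1/zones of all sitations; return the group sizes."""
    ranked = sorted(sitations_per_target.values(), reverse=True)
    total = sum(ranked)
    sizes, accumulated, zone = [0] * zones, 0, 0
    for sitations in ranked:
        # Move to the next zone once the current zone's share of sitations is used up.
        while zone < zones - 1 and accumulated >= total * (zone + 1) / zones:
            zone += 1
        sizes[zone] += 1
        accumulated += sitations
    return sizes


def bradford_multipliers(sizes):
    """Successive ratios of zone sizes; roughly equal ratios indicate 1 : n : n**2."""
    return [b / a for a, b in zip(sizes, sizes[1:])]


# Synthetic example: 1 core target, 3 middle targets, 10 tail targets, 30 sitations per zone.
sitations = {"core": 30}
sitations.update({f"mid{i}": 10 for i in range(3)})
sitations.update({f"tail{i}": 3 for i in range(10)})
sizes = bradford_zones(sitations)
print(sizes)                        # [1, 3, 10]
print(bradford_multipliers(sizes))  # successive ratios: 3.0 and about 3.33
```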

Figure 7 shows the same three distributions for the first set, but now counting sitor pages. This means that we are now considering Web pages to be the analogues of citing articles, which seems more natural. The first distribution, corresponding to pages, lies in the upper part of the plot, at first coinciding with the Web space distribution and then rising above it.

The second distribution, grouped by Web space, includes points of inflection but, in summary, differs from a Bradford law curve in the following respects: it begins at a point clearly above the origin; its linear portion does not begin at the first point of inflection but later; and it ends with a strongly increasing section.

The third distribution occupies the lower part of the plot. This means that eliminating the links found in a large percentage of the pages of very large Web spaces removes many sitations but not many Web spaces or linked pages.

Figure 6. Accumulated sitations vs accumulated Web spaces. The data are from the second set. The results are presented for target pages independently, grouped by Web space, and excluding targets that were very frequently linked to in very large Web spaces. Only sitor Web spaces are counted

Figure 7. Accumulated sitations vs accumulated Web spaces. The data are from the first set. The results are presented for target pages independently, grouped by Web space, and excluding targets that were very frequently linked to in very large Web spaces. Sitor pages are counted

Figure 8. Accumulated sitations vs accumulated Web spaces. The data are from the first set. The results are presented for target pages independently, grouped by Web space, and excluding targets that were very frequently linked to in very large Web spaces. Sitor pages are counted, but only targets about Extremadura

Figure 9. Accumulated sitations vs accumulated Web spaces. The data are from the second set. The results are presented for target pages independently, grouped by Web space, and excluding targets that were very frequently linked to in very large Web spaces. Sitor pages are counted

Figure 10. Accumulated sitations vs accumulated Web spaces. The data are from the second set. The results are presented for target pages independently, grouped by Web space, and excluding targets that were very frequently linked to in very large Web spaces. Sitor pages are counted, but only targets about Extremadura

Although, as the third distribution is plotted together with the others, its slope is less clearly perceptible, one nevertheless observes that the slope increases continuously, implying an exponential growth of the accumulated sitations here as well. Indeed, if a logarithmic scale is also used on the vertical axis, the result is a straight line with no point of inflection, which does not fit Bradford’s law. It is interesting that, with the elimination of this set of links, the dependence of the sitations becomes exponential. Again, the verbal statement of the law is not satisfied, since the proportion 1 : n : n² does not hold.
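The contrast drawn here, between a curve that is straight on semi-logarithmic axes (as Bradford’s law expects) and one that is straight only when both axes are logarithmic, can be quantified roughly by correlating R(n) and log R(n) with log n. This is a minimal Python sketch under that assumption, not the authors’ procedure; a correlation close to 1 indicates near-linearity but is not a formal test, and the function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient (requires non-constant xs and ys)."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def linearity_scores(accumulated):
    """Given R(1), ..., R(N) (all > 0), return the correlation of R(n) with log n
    (semi-log, Bradford-like) and of log R(n) with log n (log-log)."""
    log_n = [math.log10(n) for n in range(1, len(accumulated) + 1)]
    semi_log = pearson(log_n, accumulated)
    log_log = pearson(log_n, [math.log10(r) for r in accumulated])
    return semi_log, log_log


# A synthetic curve R(n) = 2 * n**1.5 is exactly straight on log-log axes,
# so its log-log score is 1 and its semi-log score is lower.
print(linearity_scores([2 * n ** 1.5 for n in range(1, 21)]))
```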

Some authors (Brookes, 1969, for instance) consider that Bradford’s law is mainly applicable in small and well-defined areas of research, in which there must exist a strong thematic relationship between the documents. In Figure 8 we have therefore represented only the pages or Web spaces that are from Extremadura (for the first set of data). The behaviour, however, is quite similar to the previous case: the first two distributions are fairly similar to each other, and the greatest difference lies in the third distribution, which is a straight line.

Figures 9 and 10 show the same calculations as Figures 7 and 8, but now for data from the second set. The results are identical.

Conclusions

The present experiment focused on the distribution of sitations in a closed but heterogeneous environment of thematically related Web spaces, using two sets of data. We studied how well they fit, on the one hand, a power-law distribution and, on the other, Bradford’s law, looking at a great many variants, with a total of 384 plots.

Studies in the literature have shown that, on a very large scale, sitations follow a power-law distribution with an exponent of 2.1. The present sitation distributions from both data sets were coherent with those earlier studies, including in the exponent, independently of whether the counts included broken links, total links or only sitor pages, and of whether pages or Web spaces were taken both as origin and as target. We showed, however, that the distributions differed according to which calculation procedure was used and which filter was applied to the data. In particular, when we considered only Web spaces as origins (instead of pages or links) and pages as destinations, the slope increased, corresponding approximately to an exponent of 2.8. When, however, both destinations and origins were Web spaces, the exponent returned to the value 2.1. Also, when the destinations were thematically restricted (in which case they practically coincided with the citors), the slope was reduced, corresponding approximately to an exponent of 1.5. Finally, the tail in the data was reduced by eliminating the targets that were very frequently linked to in very large Web spaces.

We studied the fit to Bradford’s law by plotting the accumulated sitations against the accumulated targets. Although we presented many distributions, none of them fitted the typical Bradford case. When we considered Web spaces as the sitors, the resulting distribution was concave upward, with no point of inflection, and passed through the origin. The other distributions, in which pages were considered as the sitors, could in general be characterized as curves which start noticeably above the origin and have several points of inflection ( ). It was also notable, firstly, that taking Web spaces as destinations instead of pages greatly reduced the accumulated sitations and much of the complexity of the curve and, secondly, that eliminating the targets that were very frequently linked to in very large Web spaces sharply reduced the accumulated sitations, with the curves then passing through the origin and showing an exponential growth of the accumulated sitations.

In sum, everything seems to support the observation of Kim (2000) that there are different motivations behind the citations in scientific articles and the links of the World Wide Web. In this sense, we confirmed the immense influence of thematic restriction (for instance, nearly all the Web spaces included links related to the World Wide Web itself and its technology) and of the repetitive links found in most of the very large Web spaces (which are usually generated by means of software tools that allow links to be automatically included on every page).

References

Aguillo, I.F. (2000), “Indicadores hacia una evaluación no objetiva (cuantitativa) de sedes web”, in VII Jornadas Españolas de Documentación (Bilbao, 19-21 de octubre de 2000), Servicio Editorial de la Universidad del País Vasco, Bilbao, pp. 233-48.

Almind, T.C. and Ingwersen, P. (1997), “Informetric analyses on the World Wide Web: methodological approaches to ‘Webometrics’”, Journal of Documentation, Vol. 53 No. 4, pp. 404-26.

Bar-Ilan, J. (1997), “The ‘mad cow disease’, Usenet newsgroups and bibliometric laws”, Scientometrics, Vol. 39 No. 1, pp. 29-55.

Bar-Ilan, J. (2001), “Data collection methods on the Web for informetrics purposes: a review and analysis”, Scientometrics, Vol. 50 No. 1, pp. 7-32.

Barabási, A.L. and Albert, R. (1999), “Emergence of scaling in random networks”, Science, Vol. 286, pp. 509-12.

Björneborn, L. and Ingwersen, P. (2001), “Perspectives of webometrics”, Scientometrics, Vol. 50 No. 1, pp. 65-82.

Bradford, S.C. (1934), “Sources of information on specific subjects”, Engineering, Vol. 137, pp. 85-6.

Brin, S. and Page, L. (1998), “The anatomy of a large-scale hypertextual web search engine”, Computer Networks and ISDN Systems, Vol. 30, pp. 107-17.

Broder, A. et al. (2000), “Graph structure in the Web”, Computer Networks and ISDN Systems, Vol. 33 No. 1-6, pp. 309-20.

Brookes, B.C. (1969), “Bradford’s law and the bibliography of science”, Nature, Vol. 224, pp. 953-6.

Codina, L. (2000), “Parámetros e indicadores de calidad para la evaluación de recursos digitales”, in VII Jornadas Españolas de Documentación (Bilbao, 19-21 de octubre de 2000), Servicio Editorial de la Universidad del País Vasco, Bilbao, pp. 135-44.

Correa-Uribe, G. (1999), Colombia Conectada al Mundo: Sitios Web Colombianos, Universidad de Antioquia, Antioquia.

Cronin, B. (2001), “Bibliometrics and beyond: some thoughts on web-based citation analysis”, Journal of Information Science, Vol. 27 No. 1, pp. 1-7.

Cui, L. (1999), “Rating health web sites using the principles of citation analysis: a bibliometric approach”, Journal of Medical Internet Research, Vol. 1 No. e4, available at: www.jmir.org/1999/1/e4/index.htm (accessed 18 August 2001).

Cybermetrics: International Journal of Scientometrics, Informetrics and Bibliometrics (2002), available at: www.cindoc.csic.es/cybermetrics (accessed 11 August 2002).

Egghe, L. (2000), “New informetric aspects of the Internet: some reflections - many problems”, Journal of Information Science, Vol. 26 No. 5, pp. 329-35.

Ferreiro-Aláez, L. (1981), “Análisis de referencias y características bibliométricas de los conjuntos de revistas nucleares”, Revista Española de Documentación Científica, Vol. 4 No. 3, pp. 181-98.

Gorbea-Portal, S. (1996), El Modelo Matemático de Bradford: Su Aplicación a las Revistas Latinoamericanas de las Ciencias Bibliotecológicas y de la Información, UNAM, Centro Universitario de Investigaciones Bibliotecológicas, México.

Gupta, D.K. (1991), “Application of Bradford’s law to citation data of Ethiopian medical journals”, Annals of Library Science and Documentation, Vol. 38 No. 3, pp. 85-98.

Harter, S.P. and Ford, C.E. (2000), “Web-based analyses of e-journal impact: approaches, problems and issues”, Library Science With a Slant to Documentation and Information Studies, Vol. 51 No. 13, pp. 1159-76.

Houston, W. (1983), “The application of bibliometrics to veterinary science primary literature”, Quarterly Bulletin of International Association of Agricultural Information Specialists, Vol. 28 No. 1, pp. 6-13.

Internet Software Consortium (2002), “Internet domain survey”, available at: www.isc.org (accessed 4 November 2002).

Jiménez-Piano, M. (2001), “Evaluación de sedes web”, Revista Española de Documentación Científica, Vol. 24 No. 4, pp. 405-32.

Kim, H.J. (2000), “Motivations for hyperlinking in scholarly electronic articles: a qualitative study”, Journal of the American Society for Information Science, Vol. 51 No. 10, pp. 887-99.

Kumar, R. et al. (2001), “Trawling the Web for emerging cyber-communities”, in Mendelzon, A. et al. (Eds), Proceedings of the 8th International World Wide Web Conference (Toronto, Canada, May 11-14, 1999), Elsevier, Amsterdam, available at: www8.org/w8-papers/4a-search-mining/trawling/trawling.html (accessed 2 October 2002).

Lal, A. and Panda, K.C. (1999), “Bradford’s law and its application to bibliographical data of plant pathology dissertations: an analytical approach”, Library Science With a Slant to Documentation and Information Studies, Vol. 36 No. 3, pp. 193-206.

Larson, R.R. (1996), “Bibliometrics of the World Wide Web: an exploratory analysis of the intellectual structure of cyberspace”, in Hardin, S. (Ed.), Proceedings of the 59th Annual Meeting of the American Society for Information Science (Baltimore, Maryland, 1996), Information Today, Medford, NJ, pp. 71-8, available at: http://sherlock.berkeley.edu/asis96/asis96.html (accessed 14 October 2000).

McKiernan, G. (1996), “CitedSites(sm): citation indexing of Web resources”, available at: www.public.iastate.edu/~CYBERSTACKS/Cited.htm (accessed 24 February 2000).

Mubeen, M.A. (1996), “Citation analysis of doctoral dissertations in chemistry”, Annals of Library Science and Documentation, Vol. 43 No. 2, pp. 48-58.

Price, D.J. (1970), “Citation measures of hard science, soft science, technology and non-science”, in Nelson, C.C. and Pollock, D.E. (Eds), Communication Among Scientists and Engineers, D.C. Heath and Co., Lexington, MA, pp. 3-22.

Reyes-Barragán, M.J. et al. (2000), “Revistas científicas: determinación de necesidades y usos”, Revista Española de Documentación Científica, Vol. 23 No. 4, pp. 417-36.

Rodríguez i Gairín, J.M. (1997), “Valoración del impacto de la información en Internet: AltaVista, el ‘Citation Index’ de la red”, Revista Española de Documentación Científica, Vol. 20 No. 2, pp. 175-81.

Rousseau, R. (1997), “Sitations: an exploratory study”, Cybermetrics: International Journal of Scientometrics, Informetrics and Bibliometrics, Vol. 1, p. 1, available at: www.cindoc.csic.es/cybermetrics/articles/v1i1p1.html (accessed 5 September 2000).

Smith, A.G. (1996), “Criteria for evaluation of Internet information resources”, available at: www.vuw.ac.nz/~agsmith/evaln/index.htm (accessed 25 March 2000).

Smith, A.G. (1997), “Testing the surf: criteria for evaluating Internet information resources”, The Public-Access Computer Systems Review, Vol. 8 No. 3, available at: http://info.lib.uh.edu/pr/v8/n3/smit8n3.html (accessed 28 February 2002).

Smith, A.G. (1999), “The impact of web sites: a comparison between Australasia and Latin America”, available at: www.vuw.ac.nz/~agsmith/publns/austlat (accessed 12 May 2001).

Smith, A.G. (2001), “Applying evaluation criteria to New Zealand government websites”, International Journal of Information Management, Vol. 21 No. 2, pp. 137-49.

Tague-Sutcliffe, J. (1992), “An introduction to informetrics”, Information Processing and Management, Vol. 28 No. 1, pp. 1-3.

Tillman, H.N. (2000), “Evaluating quality on the Net”, available at: www.tiac.net/users/hope/findqual.html (accessed 5 June 2000).

van Raan, A.F.J. (2001), “Bibliometrics and Internet: some observations and expectations”, Scientometrics, Vol. 50 No. 1, pp. 59-63.

Vreeland, R.C. (2000), “Law libraries in hyperspace: a citation analysis of World Wide Web sites”, Law Library Journal, Vol. 92 No. 1, pp. 9-25.

Zhang, P. and von Dran, G.M. (2000), “Satisfiers and dissatisfiers: a two-factor model for website design and evaluation”, Journal of the American Society for Information Science, Vol. 51 No. 14, pp. 1253-68.


Lal A and Panda KC (1999) ordfBradfordrsquos law and its application to bibliographical data ofplant pathology dissertations an analytical approachordm Library Science With a Slant toDocumentation and Information Studies Vol 36 No 3 pp 193-206

Larson RR (1996) ordfBibliometrics of the World Wide Web and exploratory analysis of theintellectual structure of cyberspaceordm in Hardin S (Ed) Proceedings of the 59th AnnualMeeting of the American Society for Information Science (Baltimore Maryland 1996)Information Today Medford NJ pp 71-8 available at httpsherlockberkeleyeduasis96asis96html (accessed 14 October 2000)

McKiernan G (1996) ordfCitedSites(sm) citation indexing of Web resourcesordm available atwwwpubliciastateedu CYBERSTACKSCitedhtm (accessed 24 February 2000)

Mubeen MA (1996) ordfCitation analysis of doctoral dissertations in chemistryordm Annals of LibraryScience and Documentation Vol 43 No 2 pp 48-58

ordfSitationordmdistributions

579

Price DJ (1970) ordfCitation measures of hard science soft science technology and non-scienceordm inNelson CC and Pollock DE (Eds) Communication Among Scientists and Engineers DCHealth and Co Lexington MA pp 3-22

Reyes-BarragaAcircn MJ et al (2000) ordfRevistas cientotildeAcircregcas determinacioAcircn de necesidades y usosordmRevista EspanAuml ola de DocumentacioAcircn CientotildeAcircregca Vol 23 No 4 pp 417-36

RodrotildeAcircguez i GairotildeAcircn JM (1997) ordfValoracioAcircn del impacto de la informacioAcircn en Internet Altavistael `Citation Indexrsquo de la redordm Revista EspanAuml ola de DocumentacioAcircn CientotildeAcircregca Vol 20 No 2pp 175-81

Rousseau R (1997) ordfCitations an exploratory studyordm Cybermetrics International Journal ofScientometrics Informetrics and Bibliometrics Vol 1 p 1 available atwwwcindoccsicescybermetricsarticlesv1i1p1html (accessed 5 September 2000)

Smith AG (1996) ordfCriteria for evaluation of Internet information resourcesordm available atwwwvuwacnz agsmithevalnindexhtm (accessed 25 March 2000)

Smith AG (1997) ordfTesting the surf criteria for evaluating Internet information resourcesordm ThePublic-Access Computer Systems Review Vol 8 No 3 available at httpinfolibuheduprv8n3smit8n3html (accessed 28 February 2002)

Smith AG (1999) ordfThe impact of web sites a comparison betwen Australasia and LatinAmericaordm available at wwwvuwacnz agsmithpublnsaustlat (accessed 12 May2001)

Smith AG (2001) ordfApplying evaluation criteria to New Zealand government websitesordmInternational Journal of Information Management Vol 21 No 2 pp 137-49

Tague-Sutcliffe J (1992) ordfAn introduction to informetricsordm Information Processing andManagement Vol 28 No 1 pp 1-3

Tillman HN (2000) ordfEvaluating quality on the Netordm available atwwwtiacnetusershoperegndqualhtml (accessed 5 June 2000)

van Raan AFJ (2001) ordfBibliometrics and Internet some observations and expectationsordmScientometrics Vol 50 No 1 pp 59-63

Vreeland RC (2000) ordfLaw libraries in hyperspacea citation analysis of World Wide Web sitesordmLaw Library Journal Vol 92 No 1 pp 9-25

Zhang P and von Dran GM (2000) ordfSatisregers and dissatisregers a two-factor model for websitedesign and evaluationordm Journal of the American Society for Information Science Vol 51No 14 pp 1253-68

JD595

580

Figure 2. Fraction of pages of the second set that receive a given number of inlinks, counting on the one hand all the links and on the other only the origin pages for each link (in both cases after having eliminated the erroneous links)


Figure 3. Fraction of pages of the first set that receive a given number of inlinks, counting only the origin Web spaces of each link, both including and excluding the erroneous URLs, compared with the ideal power law distribution


Figure 4. Fraction of Web spaces of the second set that receive a given number of inlinks, both including and excluding the group of targets that were linked to in a large percentage of the pages of some very large Web spaces


Figure 5. Fraction of Web spaces that receive a given number of inlinks, comparing the first set with the second, including only targets dealing with Extremadura and excluding targets that were very frequently linked to in very large Web spaces


being Web spaces which deal with Extremadura. We compare the data from the first set with the data from the second. One observes the logical decrease in the number of elements, but also a decrease in the slope, which is best fitted by an exponent of 1.5.
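Comparing slopes on log-log axes amounts to estimating the exponent α in f(k) ∝ k^(−α), where f(k) is the fraction of targets receiving k inlinks. The fitting procedure behind the values quoted in this paper is not stated in this section, so the following Python sketch simply assumes an ordinary least-squares line through the points (log k, log f(k)); the function name `fit_power_law_exponent` is hypothetical. Least squares on log-log axes is a quick but rough estimator, and maximum-likelihood fitting is usually preferred for heavy-tailed data.

```python
import math
from collections import Counter


def fit_power_law_exponent(inlink_counts):
    """Estimate alpha in f(k) ~ k**(-alpha) by least squares on log-log axes.

    inlink_counts holds one value per target (page or Web space): the number
    of inlinks (sitations) it receives. Targets with zero inlinks are skipped,
    since log(0) is undefined.
    """
    freq = Counter(k for k in inlink_counts if k > 0)
    if len(freq) < 2:
        raise ValueError("need at least two distinct inlink counts")
    total = sum(freq.values())
    points = sorted(freq.items())
    # x = log(number of inlinks), y = log(fraction of targets with that many)
    xs = [math.log(k) for k, _ in points]
    ys = [math.log(n / total) for _, n in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return -slope  # the plotted slope is -alpha


# Synthetic check: 3600 / k**2 targets receive k inlinks (an exact power law).
sample = [k for k in (1, 2, 3, 4, 5, 6, 10, 12, 15, 20, 30, 60)
          for _ in range(3600 // k ** 2)]
print(round(fit_power_law_exponent(sample), 3))  # 2.0
```

The synthetic sample is an exact power law with α = 2, so the fit recovers it; real sitation data are noisier, especially in the tail, where few targets share each inlink count.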

As we mentioned above, Bradford’s law distributions are traditionally represented differently from power law distributions, so the presentation of Figures 6-10 will be slightly different from that of the previous five: accumulated sitations are plotted against accumulated sitors, using a logarithmic scale for the latter, since their number increases exponentially with the accumulated sitations. Bradford’s law is, of course, normally applied to scientific journals, counting the citations received by each journal. These citations point from the citing articles (which in turn belong to journals) towards the journal’s articles. A certain ambiguity arises in the Web analogue, however. When a certain work is cited several times in an article, it is only counted once. Hence, in our case too, when a certain target is linked to several times from the same origin, it should also only be counted once. The problem, though, is that it is not clear whether the Web spaces and pages of sitation analysis should be taken as the analogues of journals and articles, respectively, or whether an entire Web space should be taken as the analogue of an article. We shall look at both possibilities.
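The counting rule described above (a target linked several times from the same origin counts only once) can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: links are assumed to be available as (source URL, target URL) pairs, a Web space is approximated by the URL's host name through the hypothetical helper `web_space`, and `count_sitations` is likewise a hypothetical name. The `origin` and `target` switches correspond to the two analogies discussed, with pages or whole Web spaces as the citing and cited units. Self-links within one Web space are not filtered out here.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def web_space(url):
    # Assumption: a Web space is approximated by the host name of the URL.
    return urlsplit(url).netloc.lower()


def count_sitations(links, origin="page", target="page"):
    """Count sitations per target, counting each origin unit only once.

    links: iterable of (source_url, target_url) pairs; repeats are allowed.
    origin/target: "page" keeps the full URL, "space" maps it to its host.
    """
    def unit(url, level):
        return url if level == "page" else web_space(url)

    origins_per_target = defaultdict(set)
    for src, dst in links:
        # A set discards repeated links from the same origin unit.
        origins_per_target[unit(dst, target)].add(unit(src, origin))
    return {t: len(o) for t, o in origins_per_target.items()}


links = [
    ("http://a.es/1", "http://t.es/x"),
    ("http://a.es/1", "http://t.es/x"),  # repeated link, counted once
    ("http://a.es/2", "http://t.es/x"),
    ("http://b.es/1", "http://t.es/y"),
]
print(count_sitations(links))                    # {'http://t.es/x': 2, 'http://t.es/y': 1}
print(count_sitations(links, origin="space"))    # {'http://t.es/x': 1, 'http://t.es/y': 1}
print(count_sitations(links, "space", "space"))  # {'t.es': 2}
```

Switching `origin` from pages to Web spaces collapses the two pages of a.es into a single sitor, which is exactly the difference between counting sitor pages and counting sitor Web spaces.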

Figure 6 shows the results for the second set when we count only one sitation per Web space for each target. We present the results for target pages independently, after grouping the pages by Web space, and after also eliminating the frequent links, as we did in the power law case. This therefore takes Web spaces as the analogues of the citor articles, and pages or entire Web spaces (depending on the case) as the analogues of the cited journals. The data do not fit Bradford’s law, since the curve is concave from above. Hence the number of sitations appears to grow exponentially and, indeed, if a logarithmic scale is also used for the vertical axis, the result is practically a straight line. Likewise, the verbal statement of the law is not satisfied, since the proportion 1:n:n² does not hold.
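The verbal statement of Bradford’s law referred to here says that, if the sources are ranked by decreasing productivity and split into three zones yielding roughly equal numbers of sitations, the numbers of sources in the zones stand approximately in the proportion 1:n:n². A minimal sketch of that check follows, assuming the sitation count of each target is already known. The zone-assignment rule (a source belongs to the zone in which its cumulative count starts) is one simple choice among several, and `bradford_zones` is a hypothetical name.

```python
def bradford_zones(sitations_per_source, zones=3):
    """Split ranked sources into zones holding roughly equal sitations.

    Sources are ranked by decreasing sitations; each source is assigned to
    the zone in which its cumulative count starts. Returns the number of
    sources per zone. Bradford's verbal law predicts ratios near 1 : n : n**2.
    """
    ranked = sorted(sitations_per_source, reverse=True)
    total = sum(ranked)
    if total == 0:
        raise ValueError("no sitations to partition")
    counts = [0] * zones
    cumulative = 0
    for s in ranked:
        # Integer arithmetic avoids rounding issues at zone boundaries.
        zone = min(zones - 1, cumulative * zones // total)
        counts[zone] += 1
        cumulative += s
    return counts


sizes = bradford_zones([15, 60, 30, 15, 30, 15, 15])
print(sizes)  # [1, 2, 4], i.e. 1 : n : n**2 with n = 2
ratios = [b / a for a, b in zip(sizes, sizes[1:])]
print(ratios)  # [2.0, 2.0]
```

Roughly equal adjacent ratios are consistent with the verbal law. With very few sources a zone can end up empty, in which case the ratio calculation fails and the check is not meaningful.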

Figure 7 shows the same three distributions for the first set, but now counting sitor pages. This means that we are now considering Web pages to be the analogues of citor articles, which seems more natural. One sees that the first distribution, corresponding to pages, lies in the upper part of the plot, at first running coincident with the Web space distribution and then rising above it.

The second distribution, grouped by Web space, includes points of inflection but, overall, differs from a Bradford law curve in the following aspects: it begins at a point clearly above the origin; its linear portion begins not at the first point of inflection but later; and it ends with a strongly increasing section.

The third distribution occupies the lower part of the plot. This means that eliminating the links found in a large percentage of the pages of very large Web spaces eliminates many sitations, but not many Web spaces or linked pages.
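The filter applied for the third distribution, which removes targets linked from a large percentage of the pages of very large Web spaces, might be implemented along the following lines. The thresholds for "very large" and "large percentage" are not given in this section, so `min_space_pages` and `min_share` are hypothetical parameters, as is the function `frequent_targets`. The size of a Web space is also assumed to be the number of its pages that appear as link sources, since pages without outlinks are not visible in a list of links.

```python
from collections import defaultdict


def frequent_targets(links, space_of, min_space_pages, min_share):
    """Targets linked from at least min_share of the pages of a large space.

    links: (source_page, target) pairs. space_of: function mapping a page to
    its Web space. A space counts as large if it has at least
    min_space_pages distinct source pages among the links (an assumption).
    """
    pages_in_space = defaultdict(set)
    linking_pages = defaultdict(set)  # (space, target) -> linking source pages
    for src, dst in links:
        space = space_of(src)
        pages_in_space[space].add(src)
        linking_pages[(space, dst)].add(src)
    excluded = set()
    for (space, dst), pages in linking_pages.items():
        n_pages = len(pages_in_space[space])
        if n_pages >= min_space_pages and len(pages) / n_pages >= min_share:
            excluded.add(dst)
    return excluded


links = [("A/1", "tool"), ("A/2", "tool"), ("A/3", "tool"), ("A/4", "tool"),
         ("A/1", "x"), ("B/1", "x"), ("B/1", "tool")]
drop = frequent_targets(links, lambda p: p.split("/")[0],
                        min_space_pages=3, min_share=0.75)
# The excluded targets are removed everywhere, not only within space A.
kept = [(s, t) for s, t in links if t not in drop]
print(drop)  # {'tool'}
print(kept)  # [('A/1', 'x'), ('B/1', 'x')]
```

In the example, the target linked from every page of space A (typical of navigation links inserted on every page by a site template) is dropped, while the ordinary target x survives, so many sitations disappear but few targets do.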


Figure 6. Accumulated sitations vs accumulated Web spaces. The data are from the second set. The results are presented for target pages independently, grouped by Web space, and excluding targets that were very frequently linked to in very large Web spaces. Only sitor Web spaces are counted


Figure 7. Accumulated sitations vs accumulated Web spaces. The data are from the first set. The results are presented for target pages independently, grouped by Web space, and excluding targets that were very frequently linked to in very large Web spaces. Sitor pages are counted


Figure 8. Accumulated sitations vs accumulated Web spaces. The data are from the first set. The results are presented for target pages independently, grouped by Web space, and excluding targets that were very frequently linked to in very large Web spaces. Sitor pages are counted, but only targets about Extremadura


Figure 9. Accumulated sitations vs accumulated Web spaces. The data are from the second set. The results are presented for target pages independently, grouped by Web space, and excluding targets that were very frequently linked to in very large Web spaces. Sitor pages are counted


Figure 10. Accumulated sitations vs accumulated Web spaces. The data are from the second set. The results are presented for target pages independently, grouped by Web space, and excluding targets that were very frequently linked to in very large Web spaces. Sitor pages are counted, but only targets about Extremadura


Although, as this distribution is plotted together with the other distributions, its slope is less clearly perceptible, one nevertheless observes that the slope increases continuously, implying that here too the accumulated sitations grow exponentially. Indeed, if a logarithmic scale is also used on the vertical axis, the result is a straight line with no point of inflection, which does not fit Bradford’s law. It is interesting that, with the elimination of this set of links, the dependence of the sitations becomes exponential. Again, the verbal statement of the law is not satisfied, since the proportion 1:n:n² does not hold.
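The test used here, checking whether the Bradford plot becomes a straight line once the vertical axis is also logarithmic, can be made quantitative with a least-squares fit on log-log axes. The sketch below assumes sources are ranked by decreasing sitations; `bradford_curve` and `loglog_fit` are hypothetical names, and the coefficient of determination is only one convenient measure of straightness.

```python
import math


def bradford_curve(sitations_per_source):
    """Return (rank, accumulated sitations) points for a Bradford plot.

    Sources are ranked by decreasing number of sitations.
    """
    ranked = sorted(sitations_per_source, reverse=True)
    points, cumulative = [], 0
    for rank, s in enumerate(ranked, start=1):
        cumulative += s
        points.append((rank, cumulative))
    return points


def loglog_fit(points):
    """Least-squares line through (log x, log y); returns (slope, r_squared).

    Requires at least two points with positive x and y. An r_squared close
    to 1 means the curve is straight when both axes are logarithmic.
    """
    xs = [math.log(x) for x, _ in points]
    ys = [math.log(y) for _, y in points]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sxx, sxy ** 2 / (sxx * syy)


# Synthetic data whose accumulated sitations grow as sqrt(rank).
counts = [math.sqrt(r) - math.sqrt(r - 1) for r in range(1, 51)]
slope, r2 = loglog_fit(bradford_curve(counts))
print(round(slope, 3), round(r2, 6))  # 0.5 1.0
```

A straight line with slope b on these axes means the accumulated sitations grow as the b-th power of the accumulated sources, that is, as exp(b · ln x): an exponential function of the logarithmic horizontal coordinate used in Figures 6-10.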

Some authors (Brookes, 1969, for instance) consider that Bradford’s law is mainly applicable in small and well-defined areas of research, in which there must exist a strong thematic relationship between the documents. In Figure 8 we have therefore represented only the pages or Web spaces that are from Extremadura (for the first set of data). The behaviour, however, is quite similar to the previous case: the first two distributions are fairly similar to each other, with the greatest difference being in the third distribution, which is a straight line.

Figures 9 and 10 show the same calculations as Figures 7 and 8, but now for data from the second set. The results are identical.

Conclusions

The present experiment focused on the distribution of sitations in a closed but heterogeneous environment of thematically related Web spaces, using two sets of data. We studied how well they fit, on the one hand, a power law distribution and, on the other, Bradford’s law, looking at a great many variants, with a total of 384 plots.

Studies in the literature have shown that, on a very large scale, sitations follow a power law distribution with an exponent of 2.1. The present sitation distributions from both data sets were coherent with those earlier studies, including in the exponent, independently of whether or not the counts included broken links, total links or only sitor pages, and pages or Web spaces both as origin and as target. We showed, however, that the distributions differed according to which calculation procedure was used and to what filter was applied to the data. In particular, when we considered only Web spaces as origins (instead of pages or links) and pages as destinations, the slope increased to correspond approximately to an exponent of 2.8. When, however, both destinations and origins were Web spaces, the exponent returned to the value 2.1. Also, when the destinations were thematically restricted (in which case they practically coincided with the citors), the slope was reduced to correspond approximately to an exponent of 1.5. Finally, the tail in the data was reduced by eliminating the targets that were very frequently linked to in very large Web spaces.

We studied the fit to Bradford’s law by plotting the accumulated sitations against the accumulated targets. Although we presented many distributions, none of them fitted the typical Bradford case. When we considered Web spaces as the sitors, the resulting distribution was concave from above, with no point of inflection, and passed through the origin. The other distributions, in which pages were considered as the sitors, could in general be characterized as curves that start noticeably above the origin and have several points of inflection ( ). It was also notable, firstly, that taking Web spaces as destinations instead of pages greatly reduced the accumulated sitations and much of the complexity of the curve and, secondly, that eliminating the targets that were very frequently linked to in very large Web spaces sharply reduced the accumulated sitations, with the curves now passing through the origin and showing exponential growth of the accumulated sitations.

In sum, everything seems to support the observation of Kim (2000) that there are different motivations behind the citations in scientific articles and the links of the World Wide Web. In this sense, we confirmed the immense influence of thematic restriction (for instance, nearly all the Web spaces included links related to the World Wide Web itself and its technology) and of the repetitive links found in most of the very large Web spaces (which are usually generated by means of software tools that allow links to be automatically included on every page).

References

Aguillo, I.F. (2000), “Indicadores hacia una evaluación no objetiva (cuantitativa) de sedes web”, in VII Jornadas Españolas de Documentación (Bilbao, 19-21 de octubre de 2000), Servicio Editorial de la Universidad del País Vasco, Bilbao, pp. 233-48.

Almind, T.C. and Ingwersen, P. (1997), “Informetric analyses on the World Wide Web: methodological approaches to ‘Webometrics’”, Journal of Documentation, Vol. 53 No. 4, pp. 404-26.

Bar-Ilan, J. (1997), “The ‘mad cow disease’, Usenet newsgroups and bibliometric laws”, Scientometrics, Vol. 39 No. 1, pp. 29-55.

Bar-Ilan, J. (2001), “Data collection methods on the Web for informetrics purposes: a review and analysis”, Scientometrics, Vol. 50 No. 1, pp. 7-32.

Barabási, A.L. and Albert, R. (1999), “Emergence of scaling in random networks”, Science, Vol. 286, pp. 509-12.

Björneborn, L. and Ingwersen, P. (2001), “Perspectives of webometrics”, Scientometrics, Vol. 50 No. 1, pp. 65-82.

Bradford, S.C. (1934), “Sources of information of specific subjects”, Engineering, Vol. 137, pp. 85-6.

Brin, S. and Page, L. (1998), “The anatomy of a large-scale hypertextual web search engine”, Computer Networks and ISDN Systems, Vol. 30, pp. 107-17.

Broder, A. et al. (2000), “Graph structure in the Web”, Computer Networks and ISDN Systems, Vol. 33 No. 1-6, pp. 309-20.

Brookes, B.C. (1969), “Bradford’s law and the bibliography of science”, Nature, Vol. 224, pp. 953-6.

Codina, L. (2000), “Parámetros e indicadores de calidad para la evaluación de recursos digitales”, in VII Jornadas Españolas de Documentación (Bilbao, 19-21 de octubre de 2000), Servicio Editorial de la Universidad del País Vasco, Bilbao, pp. 135-44.


Correa-Uribe, G. (1999), Colombia Conectada al Mundo: Sitios Web Colombianos, Universidad de Antioquia, Antioquia.

Cronin, B. (2001), “Bibliometrics and beyond: some thoughts on web-based citation analysis”, Journal of Information Science, Vol. 27 No. 1, pp. 1-7.

Cui, L. (1999), “Rating health web sites using the principles of citation analysis: a bibliometric approach”, Journal of Medical Internet Research, Vol. 1 No. e4, available at: www.jmir.org/1999/1/e4/index.htm (accessed 18 August 2001).

Cybermetrics: International Journal of Scientometrics, Informetrics and Bibliometrics (2002), available at: www.cindoc.csic.es/cybermetrics (accessed 11 August 2002).

Egghe, L. (2000), “New informetric aspects of the Internet: some reflections – many problems”, Journal of Information Science, Vol. 26 No. 5, pp. 329-35.

Ferreiro-Aláez, L. (1981), “Análisis de referencias y características bibliométricas de los conjuntos de revistas nucleares”, Revista Española de Documentación Científica, Vol. 4 No. 3, pp. 181-98.

Gorbea-Portal, S. (1996), El Modelo Matemático de Bradford: Su Aplicación a las Revistas Latinoamericanas de las Ciencias Bibliotecológicas y de la Información, UNAM, Centro Universitario de Investigaciones Bibliotecológicas, México.

Gupta, D.K. (1991), “Application of Bradford’s law to citation data of Ethiopian medical journals”, Annals of Library Science and Documentation, Vol. 38 No. 3, pp. 85-98.

Harter, S.P. and Ford, C.E. (2000), “Web-based analyses of e-journal impact: approaches, problems, and issues”, Library Science With a Slant to Documentation and Information Studies, Vol. 51 No. 13, pp. 1159-76.

Houston, W. (1983), “The application of bibliometrics to veterinary science primary literature”, Quarterly Bulletin of International Association of Agricultural Information Specialists, Vol. 28 No. 1, pp. 6-13.

Internet Software Consortium (2002), “Internet domain survey”, available at: www.isc.org (accessed 4 November 2002).

Jiménez-Piano, M. (2001), “Evaluación de sedes web”, Revista Española de Documentación Científica, Vol. 24 No. 4, pp. 405-32.

Kim, H.J. (2000), “Motivations for hyperlinking in scholarly electronic articles: a qualitative study”, Journal of the American Society for Information Science, Vol. 51 No. 10, pp. 887-99.

Kumar, R. et al. (2001), “Trawling the Web for emerging cyber-communities”, in Mendelzon, A. et al. (Eds), Proceedings of the 8th International World Wide Web Conference (Toronto, Canada, May 11-14, 1999), available at: www8.org/w8-papers/4a-search-mining/trawling/trawling.html (accessed 2 October 2002), Elsevier, Amsterdam.

Lal, A. and Panda, K.C. (1999), “Bradford’s law and its application to bibliographical data of plant pathology dissertations: an analytical approach”, Library Science With a Slant to Documentation and Information Studies, Vol. 36 No. 3, pp. 193-206.

Larson, R.R. (1996), “Bibliometrics of the World Wide Web: an exploratory analysis of the intellectual structure of cyberspace”, in Hardin, S. (Ed.), Proceedings of the 59th Annual Meeting of the American Society for Information Science (Baltimore, Maryland, 1996), Information Today, Medford, NJ, pp. 71-8, available at: http://sherlock.berkeley.edu/asis96/asis96.html (accessed 14 October 2000).

McKiernan, G. (1996), “CitedSites(sm): citation indexing of Web resources”, available at: www.public.iastate.edu/~CYBERSTACKS/Cited.htm (accessed 24 February 2000).

Mubeen, M.A. (1996), “Citation analysis of doctoral dissertations in chemistry”, Annals of Library Science and Documentation, Vol. 43 No. 2, pp. 48-58.


Price, D.J. (1970), “Citation measures of hard science, soft science, technology and non-science”, in Nelson, C.C. and Pollock, D.E. (Eds), Communication Among Scientists and Engineers, D.C. Heath and Co., Lexington, MA, pp. 3-22.

Reyes-Barragán, M.J. et al. (2000), “Revistas científicas: determinación de necesidades y usos”, Revista Española de Documentación Científica, Vol. 23 No. 4, pp. 417-36.

Rodríguez i Gairín, J.M. (1997), “Valoración del impacto de la información en Internet: Altavista, el ‘Citation Index’ de la red”, Revista Española de Documentación Científica, Vol. 20 No. 2, pp. 175-81.

Rousseau, R. (1997), “Citations: an exploratory study”, Cybermetrics: International Journal of Scientometrics, Informetrics and Bibliometrics, Vol. 1, p. 1, available at: www.cindoc.csic.es/cybermetrics/articles/v1i1p1.html (accessed 5 September 2000).

Smith, A.G. (1996), “Criteria for evaluation of Internet information resources”, available at: www.vuw.ac.nz/~agsmith/evaln/index.htm (accessed 25 March 2000).

Smith, A.G. (1997), “Testing the surf: criteria for evaluating Internet information resources”, The Public-Access Computer Systems Review, Vol. 8 No. 3, available at: http://info.lib.uh.edu/pr/v8/n3/smit8n3.html (accessed 28 February 2002).

Smith, A.G. (1999), “The impact of web sites: a comparison between Australasia and Latin America”, available at: www.vuw.ac.nz/~agsmith/publns/austlat (accessed 12 May 2001).

Smith, A.G. (2001), “Applying evaluation criteria to New Zealand government websites”, International Journal of Information Management, Vol. 21 No. 2, pp. 137-49.

Tague-Sutcliffe, J. (1992), “An introduction to informetrics”, Information Processing and Management, Vol. 28 No. 1, pp. 1-3.

Tillman, H.N. (2000), “Evaluating quality on the Net”, available at: www.tiac.net/users/hope/findqual.html (accessed 5 June 2000).

van Raan, A.F.J. (2001), “Bibliometrics and Internet: some observations and expectations”, Scientometrics, Vol. 50 No. 1, pp. 59-63.

Vreeland, R.C. (2000), “Law libraries in hyperspace: a citation analysis of World Wide Web sites”, Law Library Journal, Vol. 92 No. 1, pp. 9-25.

Zhang, P. and von Dran, G.M. (2000), “Satisfiers and dissatisfiers: a two-factor model for website design and evaluation”, Journal of the American Society for Information Science, Vol. 51 No. 14, pp. 1253-68.

JD595

580

Figure 3Fraction of pages of theregrst set that receive agiven number of inlinkscounting only the originWeb spaces of each linkboth including andexcluding the erroneousURLs compared with theideal power lawdistribution

JD595

568

Figure 4Fraction of Web spaces

of the second set thatreceive a given number

of inlinks both includingand excluding the group

of targets that werelinked to in a large

percentage of the pagesof some very large Web

spaces

ordfSitationordmdistributions

569

Figure 5Fraction of Web spacesthat receive a givennumber of inlinkscomparing the regrst setwith the secondincluding only targetsdealing withExtremadura andexcluding targets thatwere very frequentlylinked to in very largeWeb spaces

JD595

570

being Web spaces which deal with Extremadura We compare the data fromthe regrst set with the data from the second One observes the logical decrease inthe number of elements but also a decrease in the slope that regts best anexponent of 15

As we mentioned above Bradfordrsquos law distributions are traditionallyrepresented differently from power law distributions so the presentation ofFigures 6-10 will be slightly different from the previous regve with accumulatedsitations plotted against sitors using a logarithmic scale for the latter sincetheir number increases exponentially with the accumulated sitationsBradfordrsquos law is of course normally applied to scientiregc journals countingthe citations received by each journal These citations point from the citingarticles (which in turn belong to journals) towards the journalrsquos articles Acertain ambiguity arises in the Web analogue however When a certain work iscited several times in an article it is only counted once Hence in our case toowhen a certain target is linked to several times it should also only be countedonce The problem is though that it is not clear that the Web spaces and pagesof sitation analysis should be taken as the analogues of journals and articlesrespectively or rather that an entire Web space should be taken as theanalogue of an article We shall look at both possibilities

Figure 6 shows the results for the second set when we count only onesitation per Web space for each target We present the results for target pagesindependently after grouping the pages by Web space and after alsoeliminating the frequent links as we did in the power law case This thereforeregards the analogues of the citor articles to be Web spaces the analogues ofthe cited journals to be pages or entire Web spaces (depending on the case) Thedata do not regt Bradfordrsquos law since the curve is concave from above Hence thenumber of sitations appears to grow exponentially and indeed if a logarithmicscale is also used for the vertical axis the result is practically a straight lineLikewise the verbal statement of the law is not satisreged since the proportion1nn2 does not hold

Figure 7 shows the same three distributions for the regrst set but nowcounting sitor pages This means that we are now considering Web pages to bethe analogues of citor articles which seems more natural One sees that the regrstdistribution corresponding to pages lies in the upper part of the plot at regrstrunning coincident with the Web space distribution and then rising above it

The second distribution grouped by Web space includes points ofinmacrection but in synthesis differs from a Bradford law curve in the followingaspects it begins at a point clearly above the origin the linear portion does notbegin at the regrst point of inmacrection but later and it ends with a stronglyincreasing section

The third distribution occupies the lower part of the plot This means thateliminating the links found in a large percentage of the pages of very large Webspaces eliminates many sitations but not many Web spaces or linked pages

ordfSitationordmdistributions

571

Figure 6Accumulated sitations vsaccumulated Webspaces The data arefrom the second set Theresults are presented fortarget pagesindependently groupedby Web space andexcluding targets thatwere very frequentlylinked to in very largeWeb spaces Only sitorWeb spaces are counted

JD595

572

Figure 7Accumulated sitations vs

accumulated Webspaces The data are

from the regrst set Theresults are presented for

target pagesindependently grouped

by Web space andexcluding targets thatwere very frequently

linked to in very largeWeb spaces Sitor pages

are counted

ordfSitationordmdistributions

573

Figure 8Accumulated sitations vsaccumulated Webspaces The data arefrom the regrst set Theresults are presented fortarget pagesindependently groupedby Web space andexcluding targets thatwere very frequentlylinked to in very largeWeb spaces Sitor pagesare counted but onlytargets aboutExtremadura

JD595

574

Figure 9Accumulated sitations vs

accumulated Webspaces The data are

from the second set Theresults are presented for

target pagesindependently grouped

by Web space andexcluding targets thatwere very frequently

linked to in very largeWeb spaces Sitor pages

are counted

ordfSitationordmdistributions

575

Figure 10Accumulated sitations vsaccumulated Webspaces The data arefrom the second set Theresults are presented fortarget pagesindependently groupedby Web space andexcluding targets thatwere very frequentlylinked to in very largeWeb spaces Sitor pagesare counted but onlytargets aboutExtremadura

JD595

576

Although as this distribution is plotted together with the other distributionsits slope is less clearly perceptible one nevertheless observes that the slopeincreases continuously implying an exponential growth of the accumulatedsitations also Indeed if a logarithmic scale is also used on the vertical axis theresult is a straight line with no point of inmacrection and not regtting Bradfordrsquoslaw It is interesting that with the elimination of this set of links the dependenceof the sitations becomes exponential Again the verbal statement of the law isnot satisreged since the proportion 1nn2 does not hold

Some authors Brookes (1969) for instance consider that Bradfordrsquos law ismainly applicable in small and well-deregned areas of research in which theremust exist a strong thematic relationship between the documents In Figure 8we have therefore represented only the pages or Web spaces that are fromExtremadura (for the regrst set of data) The behaviour however is quite similarto the previous case the regrst two distributions are fairly similar to each otherwith the greatest difference being in the third distribution which is a straightline

Figures 9 and 10 show the same calculations as Figures 7 and 8 but now fordata from the second set The results are identical

ConclusionsThe present experiment focused on the distribution of citations in a closed butheterogeneous environment of thematically related Web spaces using two setsof data We studied how they regt on the one hand a power law distribution andon the other Bradfordrsquos law looking at a great many variants with a total of 384plots



Figure 4. Fraction of Web spaces of the second set that receive a given number of inlinks, both including and excluding the group of targets that were linked to in a large percentage of the pages of some very large Web spaces


Figure 5. Fraction of Web spaces that receive a given number of inlinks, comparing the first set with the second, including only targets dealing with Extremadura and excluding targets that were very frequently linked to in very large Web spaces


being Web spaces which deal with Extremadura. We compare the data from the first set with the data from the second. One observes the expected decrease in the number of elements, but also a decrease in the slope, which is best fitted by an exponent of 1.5.
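The exponents quoted in this study (2.1, 2.8, 1.5) are slopes read from log-log plots of the fraction of Web spaces that receive a given number of inlinks. The sketch below shows one way to estimate such an exponent. It assumes an ordinary least-squares fit of log f(k) against log k. The text does not state which fitting procedure was used, and the function names are hypothetical.

```python
import math
from collections import Counter


def inlink_distribution(inlinks_per_space):
    """Map k -> fraction of Web spaces that receive exactly k inlinks (k >= 1).

    `inlinks_per_space` holds one inlink count per Web space; spaces with
    zero inlinks are left out because log(0) is undefined.
    """
    counts = Counter(k for k in inlinks_per_space if k > 0)
    total = sum(counts.values())
    return {k: n / total for k, n in sorted(counts.items())}


def fit_power_law_exponent(distribution):
    """Least-squares slope of log f(k) against log k, returned as alpha = -slope.

    Needs at least two distinct k values.
    """
    xs = [math.log(k) for k in distribution]
    ys = [math.log(f) for f in distribution.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


if __name__ == "__main__":
    # Synthetic distribution that follows f(k) = 0.5 * k**-2.1 exactly.
    ideal = {k: 0.5 * k ** -2.1 for k in range(1, 51)}
    print(round(fit_power_law_exponent(ideal), 3))  # 2.1
```

A least-squares fit on raw log-log frequencies is simple but sensitive to the sparse tail. Maximum-likelihood estimators are often preferred for power laws, so read this only as an illustration of what "slope" means in these plots.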

As we mentioned above, Bradford’s law distributions are traditionally represented differently from power law distributions. The presentation of Figures 6-10 will therefore differ slightly from the previous five: accumulated sitations are plotted against the accumulated targets, using a logarithmic scale for the latter, since their number increases exponentially with the accumulated sitations. Bradford’s law is, of course, normally applied to scientific journals, counting the citations received by each journal. These citations point from the citing articles (which in turn belong to journals) towards the journal’s articles. A certain ambiguity arises in the Web analogue, however. When a certain work is cited several times in an article, it is only counted once. Hence, in our case too, when a certain target is linked to several times, it should also only be counted once. The problem, though, is that it is not clear whether the Web spaces and pages of sitation analysis should be taken as the analogues of journals and articles, respectively, or whether an entire Web space should be taken as the analogue of an article. We shall look at both possibilities.
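The counting rule above (a target linked several times from the same citing unit counts once) can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure:

- Links are hypothetical (source URL, target URL) pairs.
- A Web space is approximated by the URL host name, which may not match the definition used in the study.

Setting `sitor_unit` to `"page"` or `"space"` corresponds to the two analogues discussed: a page as the article, or a whole Web space as the article.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def web_space(url):
    # Assumption for illustration: a Web space is identified by the URL host name.
    return urlsplit(url).hostname


def count_sitations(links, sitor_unit="page", target_unit="page"):
    """Count sitations per target, counting each target at most once per citing unit.

    links:       iterable of (source_url, target_url) pairs, one per link occurrence
    sitor_unit:  "page"  -> each linking page is one citing unit;
                 "space" -> each linking Web space is one citing unit
    target_unit: "page"  -> targets are individual pages;
                 "space" -> target pages are grouped by Web space
    """
    citing_units = defaultdict(set)
    for source, target in links:
        sitor = source if sitor_unit == "page" else web_space(source)
        tgt = target if target_unit == "page" else web_space(target)
        citing_units[tgt].add(sitor)  # the set ignores repeated links from one unit
    return {tgt: len(units) for tgt, units in citing_units.items()}


if __name__ == "__main__":
    links = [
        ("http://a.es/1", "http://x.es/p"),
        ("http://a.es/1", "http://x.es/p"),  # same link repeated on one page
        ("http://a.es/2", "http://x.es/p"),
        ("http://b.es/1", "http://x.es/q"),
    ]
    print(count_sitations(links))                      # {'http://x.es/p': 2, 'http://x.es/q': 1}
    print(count_sitations(links, sitor_unit="space"))  # {'http://x.es/p': 1, 'http://x.es/q': 1}
```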

Figure 6 shows the results for the second set when we count only one sitation per Web space for each target. We present the results for target pages independently, after grouping the pages by Web space, and after also eliminating the frequent links, as we did in the power law case. This therefore regards Web spaces as the analogues of the citing articles, and pages or entire Web spaces (depending on the case) as the analogues of the cited journals. The data do not fit Bradford’s law, since the curve is concave from above. Hence the number of sitations appears to grow exponentially, and indeed, if a logarithmic scale is also used for the vertical axis, the result is practically a straight line. Likewise, the verbal statement of the law is not satisfied, since the proportion 1:n:n² does not hold.
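To check the two shapes described here, one can build the accumulated curve and compare how straight it is on the two kinds of axes. The sketch below assumes that sitation counts per target are already available; the names are hypothetical.

- R(n) is the accumulated number of sitations of the n most-sited targets.
- A Bradford-type linear portion is straight when only n is logarithmic: R(n) = k log(n/s).
- A curve that becomes straight once the vertical axis is also logarithmic corresponds to R(n) = c n^b, that is, exponential growth in the plotted log n.

The coefficient of determination serves only as a rough straightness score.

```python
import math


def bradford_curve(sitation_counts):
    """Return [(n, R(n))] with targets ranked by sitations, most-sited first.

    R(n) is the accumulated number of sitations of the n most-sited targets.
    Counts must be positive and there must be at least two targets.
    """
    curve, total = [], 0
    for n, count in enumerate(sorted(sitation_counts, reverse=True), start=1):
        total += count
        curve.append((n, total))
    return curve


def r_squared(xs, ys):
    """Coefficient of determination of the least-squares line y = a + b*x."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy * sxy / (sxx * syy)


def straightness_scores(curve):
    """Compare how straight R(n) is on semi-log versus log-log axes."""
    log_n = [math.log(n) for n, _ in curve]
    r = [float(total) for _, total in curve]
    log_r = [math.log(total) for _, total in curve]
    return {
        # Bradford/Brookes linear portion: R(n) = k * log(n / s)
        "semi_log": r_squared(log_n, r),
        # Straight line with both axes logarithmic: R(n) = c * n**b
        "log_log": r_squared(log_n, log_r),
    }


if __name__ == "__main__":
    zipf_like = [1 / n for n in range(1, 1001)]  # R(n) grows roughly like log n
    power_like = [n ** 0.5 - (n - 1) ** 0.5 for n in range(1, 1001)]  # R(n) = sqrt(n)
    print(straightness_scores(bradford_curve(zipf_like)))
    print(straightness_scores(bradford_curve(power_like)))
```

For Zipf-like counts the semi-log score is the higher of the two, as expected for a Bradford curve. For counts whose accumulation follows a power of n, the log-log score is close to 1, which matches the behaviour reported for Figure 6.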

Figure 7 shows the same three distributions for the first set, but now counting sitor pages. This means that we are now considering Web pages to be the analogues of citing articles, which seems more natural. One sees that the first distribution, corresponding to pages, lies in the upper part of the plot. At first it runs coincident with the Web space distribution and then rises above it.

The second distribution, grouped by Web space, includes points of inflection. In summary, however, it differs from a Bradford law curve in the following aspects: it begins at a point clearly above the origin; the linear portion does not begin at the first point of inflection but later; and it ends with a strongly increasing section.

The third distribution occupies the lower part of the plot. This means that eliminating the links found in a large percentage of the pages of very large Web spaces eliminates many sitations, but not many Web spaces or linked pages.


Figure 6. Accumulated sitations vs accumulated Web spaces. The data are from the second set. The results are presented for target pages independently, grouped by Web space, and excluding targets that were very frequently linked to in very large Web spaces. Only sitor Web spaces are counted


Figure 7. Accumulated sitations vs accumulated Web spaces. The data are from the first set. The results are presented for target pages independently, grouped by Web space, and excluding targets that were very frequently linked to in very large Web spaces. Sitor pages are counted


Figure 8. Accumulated sitations vs accumulated Web spaces. The data are from the first set. The results are presented for target pages independently, grouped by Web space, and excluding targets that were very frequently linked to in very large Web spaces. Sitor pages are counted, but only targets about Extremadura


Figure 9. Accumulated sitations vs accumulated Web spaces. The data are from the second set. The results are presented for target pages independently, grouped by Web space, and excluding targets that were very frequently linked to in very large Web spaces. Sitor pages are counted


Figure 10. Accumulated sitations vs accumulated Web spaces. The data are from the second set. The results are presented for target pages independently, grouped by Web space, and excluding targets that were very frequently linked to in very large Web spaces. Sitor pages are counted, but only targets about Extremadura


Because this distribution is plotted together with the others, its slope is less clearly perceptible. One nevertheless observes that the slope increases continuously, implying an exponential growth of the accumulated sitations here too. Indeed, if a logarithmic scale is also used on the vertical axis, the result is a straight line with no point of inflection, which does not fit Bradford’s law. It is interesting that, with the elimination of this set of links, the dependence of the sitations becomes exponential. Again, the verbal statement of the law is not satisfied, since the proportion 1:n:n² does not hold.
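The verbal statement mentioned here can be tested as follows:

1. Split the ranked targets into three zones that each attract roughly one third of all sitations.
2. Compare the number of targets per zone.

Bradford’s law predicts zone sizes in the proportion 1:n:n², so the successive ratios should be roughly equal. The sketch below uses hypothetical names. It places each target in the zone where its first sitation falls; this boundary rule is a choice made here, not something the text specifies.

```python
import math


def bradford_zones(sitation_counts, zones=3):
    """Number of targets in each of `zones` groups that share the sitations equally.

    Targets are ranked by sitations (most-sited first) and each one is placed
    according to the share of all sitations accumulated before it.
    """
    ranked = sorted(sitation_counts, reverse=True)
    total = sum(ranked)
    sizes = [0] * zones
    accumulated = 0
    for count in ranked:
        zone = min(int(accumulated * zones / total), zones - 1)
        sizes[zone] += 1
        accumulated += count
    return sizes


def zone_multipliers(sizes):
    """Ratios between successive zone sizes; 1:n:n^2 implies they are all about n.

    An empty earlier zone is reported as infinity.
    """
    return [later / earlier if earlier else math.inf
            for earlier, later in zip(sizes, sizes[1:])]


if __name__ == "__main__":
    # Textbook-like example: 27 sitations split 9/9/9 over 1, 3 and 9 targets.
    counts = [9, 3, 3, 3] + [1] * 9
    sizes = bradford_zones(counts)
    print(sizes, zone_multipliers(sizes))  # [1, 3, 9] [3.0, 3.0]
```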

Some authors, Brookes (1969) for instance, consider that Bradford’s law is mainly applicable in small, well-defined areas of research, in which there must exist a strong thematic relationship between the documents. In Figure 8 we have therefore represented only the pages or Web spaces that are from Extremadura (for the first set of data). The behaviour, however, is quite similar to the previous case. The first two distributions are fairly similar to each other, and the greatest difference lies in the third distribution, which is a straight line.
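For reference, the canonical curve against which Figures 6-10 are compared is often written in Brookes’ (1969) two-part form. It is reproduced here as a sketch in one common notation; the symbols are ours, not the paper’s.

- R(n) is the accumulated number of sitations of the n most-sited targets.
- N is the total number of targets.
- c marks the end of the initial curved section.
- α, β, k and s are constants.

```latex
R(n) =
\begin{cases}
  \alpha\, n^{\beta}, & 1 \le n \le c,\\
  k \log\!\left(\dfrac{n}{s}\right), & c \le n \le N.
\end{cases}
```

With n on a logarithmic axis, the second branch is the straight "linear portion" referred to in the discussion of Figure 7. Nothing in this form produces the strongly increasing final section observed there.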

Figures 9 and 10 show the same calculations as Figures 7 and 8, but now for data from the second set. The results are identical.
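The filter used throughout Figures 4-10 removes targets that were linked to in a large percentage of the pages of some very large Web space. These are typically site-wide links inserted on every page by publishing tools. The sketch below rests on several assumptions:

- Links are (source URL, target URL) pairs.
- The number of pages collected per Web space is known.
- A Web space is identified by its host name.

The thresholds `min_space_pages` and `min_fraction` are hypothetical, since the values used are not given in this section.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def frequent_targets(links, pages_per_space, min_space_pages=1000, min_fraction=0.5):
    """Targets linked from at least `min_fraction` of the pages of some Web space
    that has at least `min_space_pages` pages (both thresholds are hypothetical).

    links:           iterable of (source_url, target_url) pairs
    pages_per_space: dict mapping a Web space (here: host name) to its page count
    """
    linking_pages = defaultdict(set)  # (space, target) -> pages of that space linking to it
    for source, target in links:
        linking_pages[(urlsplit(source).hostname, target)].add(source)
    flagged = set()
    for (space, target), pages in linking_pages.items():
        size = pages_per_space.get(space, 0)
        if size > 0 and size >= min_space_pages and len(pages) / size >= min_fraction:
            flagged.add(target)
    return flagged


def drop_targets(links, flagged):
    """Remove every link that points to a flagged target, wherever it appears."""
    return [(source, target) for source, target in links if target not in flagged]


if __name__ == "__main__":
    pages = {"big.es": 4, "small.es": 2}
    links = [(f"http://big.es/{i}", "http://w3.org/") for i in range(1, 5)]
    links += [("http://big.es/1", "http://x.es/a"),
              ("http://small.es/1", "http://x.es/b"),
              ("http://small.es/2", "http://x.es/b")]
    flagged = frequent_targets(links, pages, min_space_pages=4, min_fraction=0.75)
    print(flagged, len(drop_targets(links, flagged)))  # {'http://w3.org/'} 3
```

The flagged target is dropped from the whole data set, not only from the large Web space. This matches the wording "excluding targets", but it is an interpretation.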

Conclusions

The present experiment focused on the distribution of sitations in a closed but heterogeneous environment of thematically related Web spaces, using two sets of data. We studied how well they fit, on the one hand, a power law distribution and, on the other, Bradford’s law, looking at a great many variants with a total of 384 plots.

Studies in the literature have shown that, on a very large scale, sitations follow a power law distribution with an exponent of 2.1. The present sitation distributions from both data sets were coherent with those earlier studies, including in the exponent. This held whether the counts included broken links, total links or only sitor pages, and whether pages or Web spaces were taken as origin and as target. We showed, however, that the distributions differed according to which calculation procedure was used and which filter was applied to the data:

- When we considered only Web spaces as origins (instead of pages or links) and pages as destinations, the slope increased to correspond approximately to an exponent of 2.8.
- When both destinations and origins were Web spaces, the exponent returned to 2.1.
- When the destinations were thematically restricted (in which case they practically coincided with the sitors), the slope was reduced to correspond approximately to an exponent of 1.5.
- Eliminating the targets that were very frequently linked to in very large Web spaces reduced the tail in the data.

We studied the fit to Bradford’s law by plotting the accumulated sitations against the accumulated targets. Although we presented many distributions, none of them fitted the typical Bradford case. When we considered Web spaces as the sitors, the resulting distribution was concave from above, with no point of inflection, and passed through the origin. The other distributions, in which pages were considered as the sitors, could in general be characterized as curves which start noticeably above the origin and have several points of inflection ( ). Two further effects were notable:

- Taking Web spaces as destinations instead of pages greatly reduced the accumulated sitations and much of the complexity of the curve.
- Eliminating the targets that were very frequently linked to in very large Web spaces sharply reduced the accumulated sitations. The curves then passed through the origin and presented an exponential dependence on the accumulated sitations.

In sum, everything seems to support the observation of Kim (2000) that there are different motivations behind the citations in scientific articles and the links of the World Wide Web. In this sense, we confirmed the immense influence of thematic restriction; for instance, nearly all the Web spaces included links related to the World Wide Web itself and its technology. We also confirmed the influence of the repetitive links found in most of the very large Web spaces, which are usually generated by software tools that automatically include links on every page.

References

Aguillo, I.F. (2000), “Indicadores hacia una evaluación no objetiva (cuantitativa) de sedes web”, in VII Jornadas Españolas de Documentación (Bilbao, 19-21 de octubre de 2000), Servicio Editorial de la Universidad del País Vasco, Bilbao, pp. 233-48.

Almind, T.C. and Ingwersen, P. (1997), “Informetric analyses on the World Wide Web: methodological approaches to ‘Webometrics’”, Journal of Documentation, Vol. 53 No. 4, pp. 404-26.

Bar-Ilan, J. (1997), “The ‘mad cow disease’, Usenet newsgroups and bibliometric laws”, Scientometrics, Vol. 39 No. 1, pp. 29-55.

Bar-Ilan, J. (2001), “Data collection methods on the Web for informetrics purposes: a review and analysis”, Scientometrics, Vol. 50 No. 1, pp. 7-32.

Barabási, A.-L. and Albert, R. (1999), “Emergence of scaling in random networks”, Science, Vol. 286, pp. 509-12.

Björneborn, L. and Ingwersen, P. (2001), “Perspectives of webometrics”, Scientometrics, Vol. 50 No. 1, pp. 65-82.

Bradford, S.C. (1934), “Sources of information on specific subjects”, Engineering, Vol. 137, pp. 85-6.

Brin, S. and Page, L. (1998), “The anatomy of a large-scale hypertextual web search engine”, Computer Networks and ISDN Systems, Vol. 30, pp. 107-17.

Broder, A. et al. (2000), “Graph structure in the Web”, Computer Networks and ISDN Systems, Vol. 33 No. 1-6, pp. 309-20.

Brookes, B.C. (1969), “Bradford’s law and the bibliography of science”, Nature, Vol. 224, pp. 953-6.

Codina, L. (2000), “Parámetros e indicadores de calidad para la evaluación de recursos digitales”, in VII Jornadas Españolas de Documentación (Bilbao, 19-21 de octubre de 2000), Servicio Editorial de la Universidad del País Vasco, Bilbao, pp. 135-44.


Correa-Uribe, G. (1999), Colombia Conectada al Mundo: Sitios Web Colombianos, Universidad de Antioquía, Antioquía.

Cronin, B. (2001), “Bibliometrics and beyond: some thoughts on web-based citation analysis”, Journal of Information Science, Vol. 27 No. 1, pp. 1-7.

Cui, L. (1999), “Rating health web sites using the principles of citation analysis: a bibliometric approach”, Journal of Medical Internet Research, Vol. 1 No. e4, available at: www.jmir.org/1999/1/e4/index.htm (accessed 18 August 2001).

Cybermetrics: International Journal of Scientometrics, Informetrics and Bibliometrics (2002), available at: www.cindoc.csic.es/cybermetrics (accessed 11 August 2002).

Egghe, L. (2000), “New informetric aspects of the Internet: some reflections – many problems”, Journal of Information Science, Vol. 26 No. 5, pp. 329-35.

Ferreiro-Aláez, L. (1981), “Análisis de referencias y características bibliométricas de los conjuntos de revistas nucleares”, Revista Española de Documentación Científica, Vol. 4 No. 3, pp. 181-98.

Gorbea-Portal, S. (1996), El Modelo Matemático de Bradford: Su Aplicación a las Revistas Latinoamericanas de las Ciencias Bibliotecológicas y de la Información, UNAM, Centro Universitario de Investigaciones Bibliotecológicas, México.

Gupta, D.K. (1991), “Application of Bradford’s law to citation data of Ethiopian medical journals”, Annals of Library Science and Documentation, Vol. 38 No. 3, pp. 85-98.

Harter, S.P. and Ford, C.E. (2000), “Web-based analyses of e-journal impact: approaches, problems and issues”, Journal of the American Society for Information Science, Vol. 51 No. 13, pp. 1159-76.

Houston, W. (1983), “The application of bibliometrics to veterinary science primary literature”, Quarterly Bulletin of International Association of Agricultural Information Specialists, Vol. 28 No. 1, pp. 6-13.

Internet Software Consortium (2002), “Internet domain survey”, available at: www.isc.org (accessed 4 November 2002).

Jiménez-Piano, M. (2001), “Evaluación de sedes web”, Revista Española de Documentación Científica, Vol. 24 No. 4, pp. 405-32.

Kim, H.J. (2000), “Motivations for hyperlinking in scholarly electronic articles: a qualitative study”, Journal of the American Society for Information Science, Vol. 51 No. 10, pp. 887-99.

Kumar, R. et al. (2001), “Trawling the Web for emerging cyber-communities”, in Mendelzon, A. et al. (Eds), Proceedings of the 8th International World Wide Web Conference (Toronto, Canada, May 11-14, 1999), Elsevier, Amsterdam, available at: www8.org/w8-papers/4a-search-mining/trawling/trawling.html (accessed 2 October 2002).

Lal, A. and Panda, K.C. (1999), “Bradford’s law and its application to bibliographical data of plant pathology dissertations: an analytical approach”, Library Science With a Slant to Documentation and Information Studies, Vol. 36 No. 3, pp. 193-206.

Larson, R.R. (1996), “Bibliometrics of the World Wide Web: an exploratory analysis of the intellectual structure of cyberspace”, in Hardin, S. (Ed.), Proceedings of the 59th Annual Meeting of the American Society for Information Science (Baltimore, Maryland, 1996), Information Today, Medford, NJ, pp. 71-8, available at: http://sherlock.berkeley.edu/asis96/asis96.html (accessed 14 October 2000).

McKiernan, G. (1996), “CitedSites(sm): citation indexing of Web resources”, available at: www.public.iastate.edu/~CYBERSTACKS/Cited.htm (accessed 24 February 2000).

Mubeen, M.A. (1996), “Citation analysis of doctoral dissertations in chemistry”, Annals of Library Science and Documentation, Vol. 43 No. 2, pp. 48-58.


Price, D.J. (1970), “Citation measures of hard science, soft science, technology and non-science”, in Nelson, C.C. and Pollock, D.E. (Eds), Communication Among Scientists and Engineers, D.C. Heath and Co., Lexington, MA, pp. 3-22.

Reyes-Barragán, M.J. et al. (2000), “Revistas científicas: determinación de necesidades y usos”, Revista Española de Documentación Científica, Vol. 23 No. 4, pp. 417-36.

Rodríguez i Gairín, J.M. (1997), “Valoración del impacto de la información en Internet: AltaVista, el ‘Citation Index’ de la red”, Revista Española de Documentación Científica, Vol. 20 No. 2, pp. 175-81.

Rousseau, R. (1997), “Sitations: an exploratory study”, Cybermetrics: International Journal of Scientometrics, Informetrics and Bibliometrics, Vol. 1, p. 1, available at: www.cindoc.csic.es/cybermetrics/articles/v1i1p1.html (accessed 5 September 2000).

Smith, A.G. (1996), “Criteria for evaluation of Internet information resources”, available at: www.vuw.ac.nz/~agsmith/evaln/index.htm (accessed 25 March 2000).

Smith, A.G. (1997), “Testing the surf: criteria for evaluating Internet information resources”, The Public-Access Computer Systems Review, Vol. 8 No. 3, available at: http://info.lib.uh.edu/pr/v8/n3/smit8n3.html (accessed 28 February 2002).

Smith, A.G. (1999), “The impact of web sites: a comparison between Australasia and Latin America”, available at: www.vuw.ac.nz/~agsmith/publns/austlat (accessed 12 May 2001).

Smith, A.G. (2001), “Applying evaluation criteria to New Zealand government websites”, International Journal of Information Management, Vol. 21 No. 2, pp. 137-49.

Tague-Sutcliffe, J. (1992), “An introduction to informetrics”, Information Processing and Management, Vol. 28 No. 1, pp. 1-3.

Tillman, H.N. (2000), “Evaluating quality on the Net”, available at: www.tiac.net/users/hope/findqual.html (accessed 5 June 2000).

van Raan, A.F.J. (2001), “Bibliometrics and Internet: some observations and expectations”, Scientometrics, Vol. 50 No. 1, pp. 59-63.

Vreeland, R.C. (2000), “Law libraries in hyperspace: a citation analysis of World Wide Web sites”, Law Library Journal, Vol. 92 No. 1, pp. 9-25.

Zhang, P. and von Dran, G.M. (2000), “Satisfiers and dissatisfiers: a two-factor model for website design and evaluation”, Journal of the American Society for Information Science, Vol. 51 No. 14, pp. 1253-68.

JD595

580

Figure 5Fraction of Web spacesthat receive a givennumber of inlinkscomparing the regrst setwith the secondincluding only targetsdealing withExtremadura andexcluding targets thatwere very frequentlylinked to in very largeWeb spaces

JD595

570

being Web spaces which deal with Extremadura We compare the data fromthe regrst set with the data from the second One observes the logical decrease inthe number of elements but also a decrease in the slope that regts best anexponent of 15

As we mentioned above Bradfordrsquos law distributions are traditionallyrepresented differently from power law distributions so the presentation ofFigures 6-10 will be slightly different from the previous regve with accumulatedsitations plotted against sitors using a logarithmic scale for the latter sincetheir number increases exponentially with the accumulated sitationsBradfordrsquos law is of course normally applied to scientiregc journals countingthe citations received by each journal These citations point from the citingarticles (which in turn belong to journals) towards the journalrsquos articles Acertain ambiguity arises in the Web analogue however When a certain work iscited several times in an article it is only counted once Hence in our case toowhen a certain target is linked to several times it should also only be countedonce The problem is though that it is not clear that the Web spaces and pagesof sitation analysis should be taken as the analogues of journals and articlesrespectively or rather that an entire Web space should be taken as theanalogue of an article We shall look at both possibilities

Figure 6 shows the results for the second set when we count only onesitation per Web space for each target We present the results for target pagesindependently after grouping the pages by Web space and after alsoeliminating the frequent links as we did in the power law case This thereforeregards the analogues of the citor articles to be Web spaces the analogues ofthe cited journals to be pages or entire Web spaces (depending on the case) Thedata do not regt Bradfordrsquos law since the curve is concave from above Hence thenumber of sitations appears to grow exponentially and indeed if a logarithmicscale is also used for the vertical axis the result is practically a straight lineLikewise the verbal statement of the law is not satisreged since the proportion1nn2 does not hold

Figure 7 shows the same three distributions for the regrst set but nowcounting sitor pages This means that we are now considering Web pages to bethe analogues of citor articles which seems more natural One sees that the regrstdistribution corresponding to pages lies in the upper part of the plot at regrstrunning coincident with the Web space distribution and then rising above it

The second distribution grouped by Web space includes points ofinmacrection but in synthesis differs from a Bradford law curve in the followingaspects it begins at a point clearly above the origin the linear portion does notbegin at the regrst point of inmacrection but later and it ends with a stronglyincreasing section

The third distribution occupies the lower part of the plot This means thateliminating the links found in a large percentage of the pages of very large Webspaces eliminates many sitations but not many Web spaces or linked pages

ordfSitationordmdistributions

571

Figure 6Accumulated sitations vsaccumulated Webspaces The data arefrom the second set Theresults are presented fortarget pagesindependently groupedby Web space andexcluding targets thatwere very frequentlylinked to in very largeWeb spaces Only sitorWeb spaces are counted

JD595

572

Figure 7Accumulated sitations vs

accumulated Webspaces The data are

from the regrst set Theresults are presented for

target pagesindependently grouped

by Web space andexcluding targets thatwere very frequently

linked to in very largeWeb spaces Sitor pages

are counted

ordfSitationordmdistributions

573

Figure 8Accumulated sitations vsaccumulated Webspaces The data arefrom the regrst set Theresults are presented fortarget pagesindependently groupedby Web space andexcluding targets thatwere very frequentlylinked to in very largeWeb spaces Sitor pagesare counted but onlytargets aboutExtremadura

JD595

574

Figure 9Accumulated sitations vs

accumulated Webspaces The data are

from the second set Theresults are presented for

target pagesindependently grouped

by Web space andexcluding targets thatwere very frequently

linked to in very largeWeb spaces Sitor pages

are counted

ordfSitationordmdistributions

575

Figure 10Accumulated sitations vsaccumulated Webspaces The data arefrom the second set Theresults are presented fortarget pagesindependently groupedby Web space andexcluding targets thatwere very frequentlylinked to in very largeWeb spaces Sitor pagesare counted but onlytargets aboutExtremadura

JD595

576

Although as this distribution is plotted together with the other distributionsits slope is less clearly perceptible one nevertheless observes that the slopeincreases continuously implying an exponential growth of the accumulatedsitations also Indeed if a logarithmic scale is also used on the vertical axis theresult is a straight line with no point of inmacrection and not regtting Bradfordrsquoslaw It is interesting that with the elimination of this set of links the dependenceof the sitations becomes exponential Again the verbal statement of the law isnot satisreged since the proportion 1nn2 does not hold

Some authors Brookes (1969) for instance consider that Bradfordrsquos law ismainly applicable in small and well-deregned areas of research in which theremust exist a strong thematic relationship between the documents In Figure 8we have therefore represented only the pages or Web spaces that are fromExtremadura (for the regrst set of data) The behaviour however is quite similarto the previous case the regrst two distributions are fairly similar to each otherwith the greatest difference being in the third distribution which is a straightline

Figures 9 and 10 show the same calculations as Figures 7 and 8 but now fordata from the second set The results are identical

ConclusionsThe present experiment focused on the distribution of citations in a closed butheterogeneous environment of thematically related Web spaces using two setsof data We studied how they regt on the one hand a power law distribution andon the other Bradfordrsquos law looking at a great many variants with a total of 384plots

Studies in the literature have shown that on a very large scale sitationsfollow a power law distribution with an exponent of 21 The present sitationdistributions from both data sets were coherent with those earlier studiesincluding in the exponent independently of whether or not the counts includedbroken links total links or only sitor pages and pages or Web spaces both asorigin and as target We showed that the distributions differed howeveraccording to which calculation procedure was used and to what reglter wasapplied to the data In particular when we considered only Web spaces asorigins instead of pages or links and pages as destinations the slope increasedto correspond approximately to an exponent of 28 When however bothdestinations and origins were Web spaces the exponent returned to the value21 Also when the destinations were thematically restricted (in which casethey practically coincided with the citors) the slope was reduced to correspondapproximately to an exponent of 15 Finally the tail in the data was reduced byeliminating the targets that were very frequently linked to in very large Webspaces

We studied the fit to Bradford’s law by plotting the accumulated sitations against the accumulated targets. Although we presented many distributions, none of them fitted the typical Bradford case. When we considered Web spaces as the sitors, the resulting distribution was concave from above, with no point of inflection, and passed through the origin. The other distributions, in which pages were considered as the sitors, could in general be characterized as curves which start noticeably above the origin and have several points of inflection ( ). It was also notable, firstly, that taking Web spaces as destinations instead of pages greatly reduced the accumulated sitations and much of the complexity of the curve and, secondly, that eliminating the targets that were very frequently linked to in very large Web spaces sharply reduced the accumulated sitations, with the curves now passing through the origin and showing exponential growth of the accumulated sitations.
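The two ways of testing Bradford’s law discussed above can be sketched as follows, assuming each target is represented only by its sitation count. `bradford_curve` ranks targets from most to least sitated and returns the points (log r, R(r)), where R(r) is the accumulated sitations of the r top-ranked targets; in a typical Bradford case these points show a curved initial section followed by a linear stretch. `bradford_zones` applies the verbal formulation: split the ranked targets into zones holding roughly equal shares of all sitations and check whether the zone sizes grow as 1 : n : n². Both function names and the boundary rule are illustrative assumptions, not the authors’ code.

```python
import math


def _ranked(sitations_per_target):
    # Most-sitated targets first; targets with no sitations carry no weight.
    return sorted((k for k in sitations_per_target if k > 0), reverse=True)


def bradford_curve(sitations_per_target):
    """Return (log r, R(r)) pairs for r = 1, 2, ... over the ranked targets."""
    points = []
    accumulated = 0
    for r, k in enumerate(_ranked(sitations_per_target), start=1):
        accumulated += k
        points.append((math.log(r), accumulated))
    return points


def bradford_zones(sitations_per_target, zones=3):
    """Count the targets falling in each of `zones` equal-sitation zones.

    A target is assigned to the zone in which the sitations accumulated
    before it fall, so a target on a zone boundary is never split.
    """
    ranked = _ranked(sitations_per_target)
    grand_total = sum(ranked)
    sizes = [0] * zones
    if grand_total == 0:
        return sizes
    running = 0
    for k in ranked:
        # Integer arithmetic avoids floating-point drift at zone boundaries.
        zone = min(running * zones // grand_total, zones - 1)
        sizes[zone] += 1
        running += k
    return sizes


counts = [12, 6, 6, 3, 3, 3, 3]
print(bradford_zones(counts))  # [1, 2, 4]: a 1 : n : n**2 pattern with n = 2
print([acc for _, acc in bradford_curve(counts)])  # [12, 18, 24, 27, 30, 33, 36]
```

With real sitation data the zone sizes rarely form an exact geometric series; the ratios between successive zones give an empirical estimate of n, and strong disagreement between those ratios is one way of seeing that the verbal statement of the law does not hold.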

In sum, everything seems to support the observation of Kim (2000) that there are different motivations behind the citations in scientific articles and the links of the World Wide Web. In this sense, we confirmed the immense influence of thematic restriction (for instance, nearly all the Web spaces included links related to the World Wide Web itself and its technology) and of the repetitive links found in most of the very large Web spaces (which are usually generated by means of software tools that allow links to be automatically included on every page).
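The effect of such repetitive links can be illustrated with a small filter. The sketch below assumes links are stored as (sitor page, sitor Web space, target) tuples and reads “very frequently linked to in very large Web spaces” as: within a Web space of at least `min_pages` pages, drop the links to any target that is linked from more than a fraction `share` of that space’s pages. Both thresholds, the data layout and the function names are hypothetical illustrations, since the exact criteria are not restated here. A companion function counts sitations per target in the ways compared in the conclusions: every link, distinct sitor pages, or distinct sitor Web spaces.

```python
from collections import Counter, defaultdict


def drop_template_links(links, pages_per_space, min_pages=1000, share=0.5):
    """Remove links that look template-generated in very large Web spaces.

    links: list of (sitor_page, sitor_space, target) tuples.
    pages_per_space: dict mapping each Web space to its page count.
    min_pages and share are hypothetical thresholds for illustration only.
    """
    linking_pages = defaultdict(set)  # (space, target) -> pages of that space linking to target
    for page, space, target in links:
        linking_pages[(space, target)].add(page)

    def repetitive(space, target):
        n_pages = pages_per_space.get(space, 0)
        return n_pages >= min_pages and len(linking_pages[(space, target)]) > share * n_pages

    return [link for link in links if not repetitive(link[1], link[2])]


def sitation_counts(links, mode="pages"):
    """Sitations per target, counting every link, distinct sitor pages or distinct sitor Web spaces."""
    if mode == "links":
        return dict(Counter(target for _, _, target in links))
    if mode not in ("pages", "spaces"):
        raise ValueError("mode must be 'links', 'pages' or 'spaces'")
    position = 0 if mode == "pages" else 1  # tuple field holding the sitor page or Web space
    sitors = defaultdict(set)
    for link in links:
        sitors[link[2]].add(link[position])
    return {target: len(s) for target, s in sitors.items()}


links = [
    ("a/1", "A", "x"), ("a/1", "A", "x"), ("a/2", "A", "x"),
    ("b/1", "B", "x"), ("b/1", "B", "y"),
]
print(sitation_counts(links, "links"))   # {'x': 4, 'y': 1}
print(sitation_counts(links, "pages"))   # {'x': 3, 'y': 1}
print(sitation_counts(links, "spaces"))  # {'x': 2, 'y': 1}
# Treating Web space A (3 pages, 2 of which link to x) as "very large" drops its links to x.
print(drop_template_links(links, {"A": 3, "B": 100}, min_pages=3, share=0.5))
# [('b/1', 'B', 'x'), ('b/1', 'B', 'y')]
```

Dropping the repetitive links, rather than the targets themselves, removes many sitations while leaving a target in the data whenever it is also linked from elsewhere.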

References

Aguillo, I.F. (2000), "Indicadores hacia una evaluación no objetiva (cuantitativa) de sedes web", in VII Jornadas Españolas de Documentación (Bilbao, 19-21 de octubre de 2000), Servicio Editorial de la Universidad del País Vasco, Bilbao, pp. 233-48.

Almind, T.C. and Ingwersen, P. (1997), "Informetric analyses on the World Wide Web: methodological approaches to 'Webometrics'", Journal of Documentation, Vol. 53 No. 4, pp. 404-26.

Bar-Ilan, J. (1997), "The 'mad cow disease', Usenet newsgroups and bibliometric laws", Scientometrics, Vol. 39 No. 1, pp. 29-55.

Bar-Ilan, J. (2001), "Data collection methods on the Web for informetrics purposes: a review and analysis", Scientometrics, Vol. 50 No. 1, pp. 7-32.

Barabási, A.L. and Albert, R. (1999), "Emergence of scaling in random networks", Science, Vol. 286, pp. 509-12.

Björneborn, L. and Ingwersen, P. (2001), "Perspectives of webometrics", Scientometrics, Vol. 50 No. 1, pp. 65-82.

Bradford, S.C. (1934), "Sources of information on specific subjects", Engineering, Vol. 137, pp. 85-6.

Brin, S. and Page, L. (1998), "The anatomy of a large-scale hypertextual web search engine", Computer Networks and ISDN Systems, Vol. 30, pp. 107-17.

Broder, A. et al. (2000), "Graph structure in the Web", Computer Networks and ISDN Systems, Vol. 33 No. 1-6, pp. 309-20.

Brookes, B.C. (1969), "Bradford’s law and the bibliography of science", Nature, Vol. 224, pp. 953-6.

Codina, L. (2000), "Parámetros e indicadores de calidad para la evaluación de recursos digitales", in VII Jornadas Españolas de Documentación (Bilbao, 19-21 de octubre de 2000), Servicio Editorial de la Universidad del País Vasco, Bilbao, pp. 135-44.


Correa-Uribe, G. (1999), Colombia Conectada al Mundo: Sitios Web Colombianos, Universidad de Antioquía, Antioquía.

Cronin, B. (2001), "Bibliometrics and beyond: some thoughts on web-based citation analysis", Journal of Information Science, Vol. 27 No. 1, pp. 1-7.

Cui, L. (1999), "Rating health web sites using the principles of citation analysis: a bibliometric approach", Journal of Medical Internet Research, Vol. 1 No. e4, available at: www.jmir.org/1999/1/e4/index.htm (accessed 18 August 2001).

Cybermetrics: International Journal of Scientometrics, Informetrics and Bibliometrics (2002), available at: www.cindoc.csic.es/cybermetrics (accessed 11 August 2002).

Egghe, L. (2000), "New informetric aspects of the Internet: some reflections – many problems", Journal of Information Science, Vol. 26 No. 5, pp. 329-35.

Ferreiro-Aláez, L. (1981), "Análisis de referencias y características bibliométricas de los conjuntos de revistas nucleares", Revista Española de Documentación Científica, Vol. 4 No. 3, pp. 181-98.

Gorbea-Portal, S. (1996), El Modelo Matemático de Bradford: Su Aplicación a las Revistas Latinoamericanas de las Ciencias Bibliotecológicas y de la Información, UNAM, Centro Universitario de Investigaciones Bibliotecológicas, México.

Gupta, D.K. (1991), "Application of Bradford’s law to citation data of Ethiopian medical journals", Annals of Library Science and Documentation, Vol. 38 No. 3, pp. 85-98.

Harter, S.P. and Ford, C.E. (2000), "Web-based analyses of e-journal impact: approaches, problems, and issues", Journal of the American Society for Information Science, Vol. 51 No. 13, pp. 1159-76.

Houston, W. (1983), "The application of bibliometrics to veterinary science primary literature", Quarterly Bulletin of the International Association of Agricultural Information Specialists, Vol. 28 No. 1, pp. 6-13.

Internet Software Consortium (2002), "Internet domain survey", available at: www.isc.org (accessed 4 November 2002).

Jiménez-Piano, M. (2001), "Evaluación de sedes web", Revista Española de Documentación Científica, Vol. 24 No. 4, pp. 405-32.

Kim, H.J. (2000), "Motivations for hyperlinking in scholarly electronic articles: a qualitative study", Journal of the American Society for Information Science, Vol. 51 No. 10, pp. 887-99.

Kumar, R. et al. (2001), "Trawling the Web for emerging cyber-communities", in Mendelzon, A. et al. (Eds), Proceedings of the 8th International World Wide Web Conference (Toronto, Canada, May 11-14, 1999), available at: www8.org/w8-papers/4a-search-mining/trawling/trawling.html (accessed 2 October 2002), Elsevier, Amsterdam.

Lal, A. and Panda, K.C. (1999), "Bradford’s law and its application to bibliographical data of plant pathology dissertations: an analytical approach", Library Science With a Slant to Documentation and Information Studies, Vol. 36 No. 3, pp. 193-206.

Larson, R.R. (1996), "Bibliometrics of the World Wide Web: an exploratory analysis of the intellectual structure of cyberspace", in Hardin, S. (Ed.), Proceedings of the 59th Annual Meeting of the American Society for Information Science (Baltimore, Maryland, 1996), Information Today, Medford, NJ, pp. 71-8, available at: http://sherlock.berkeley.edu/asis96/asis96.html (accessed 14 October 2000).

McKiernan, G. (1996), "CitedSites(sm): citation indexing of Web resources", available at: www.public.iastate.edu/~CYBERSTACKS/Cited.htm (accessed 24 February 2000).

Mubeen, M.A. (1996), "Citation analysis of doctoral dissertations in chemistry", Annals of Library Science and Documentation, Vol. 43 No. 2, pp. 48-58.


Price, D.J. (1970), "Citation measures of hard science, soft science, technology and non-science", in Nelson, C.C. and Pollock, D.E. (Eds), Communication Among Scientists and Engineers, D.C. Heath and Co., Lexington, MA, pp. 3-22.

Reyes-Barragán, M.J. et al. (2000), "Revistas científicas: determinación de necesidades y usos", Revista Española de Documentación Científica, Vol. 23 No. 4, pp. 417-36.

Rodríguez i Gairín, J.M. (1997), "Valoración del impacto de la información en Internet: Altavista, el 'Citation Index' de la red", Revista Española de Documentación Científica, Vol. 20 No. 2, pp. 175-81.

Rousseau, R. (1997), "Sitations: an exploratory study", Cybermetrics: International Journal of Scientometrics, Informetrics and Bibliometrics, Vol. 1, p. 1, available at: www.cindoc.csic.es/cybermetrics/articles/v1i1p1.html (accessed 5 September 2000).

Smith, A.G. (1996), "Criteria for evaluation of Internet information resources", available at: www.vuw.ac.nz/~agsmith/evaln/index.htm (accessed 25 March 2000).

Smith, A.G. (1997), "Testing the surf: criteria for evaluating Internet information resources", The Public-Access Computer Systems Review, Vol. 8 No. 3, available at: http://info.lib.uh.edu/pr/v8/n3/smit8n3.html (accessed 28 February 2002).

Smith, A.G. (1999), "The impact of web sites: a comparison between Australasia and Latin America", available at: www.vuw.ac.nz/~agsmith/publns/austlat (accessed 12 May 2001).

Smith, A.G. (2001), "Applying evaluation criteria to New Zealand government websites", International Journal of Information Management, Vol. 21 No. 2, pp. 137-49.

Tague-Sutcliffe, J. (1992), "An introduction to informetrics", Information Processing and Management, Vol. 28 No. 1, pp. 1-3.

Tillman, H.N. (2000), "Evaluating quality on the Net", available at: www.tiac.net/users/hope/findqual.html (accessed 5 June 2000).

van Raan, A.F.J. (2001), "Bibliometrics and Internet: some observations and expectations", Scientometrics, Vol. 50 No. 1, pp. 59-63.

Vreeland, R.C. (2000), "Law libraries in hyperspace: a citation analysis of World Wide Web sites", Law Library Journal, Vol. 92 No. 1, pp. 9-25.

Zhang, P. and von Dran, G.M. (2000), "Satisfiers and dissatisfiers: a two-factor model for website design and evaluation", Journal of the American Society for Information Science, Vol. 51 No. 14, pp. 1253-68.




Ferreiro-AlaAcircez L (1981) ordfAnaAcirclisis de referencias y caracterotildeAcircsticas bibliomeAcirctricas de losconjuntos de revistas nuclearesordm Revista EspanAuml ola de DocumentacioAcircn CientotildeAcircregca Vol 4No 3 pp 181-98

Gorbea-Portal S (1996) El Modelo MatemaAcirctico de Bradford Su AplicacioAcircn a las RevistasLatinoamericanas de las Ciencias BibliotecoloAcircgicas y de la InformacioAcircn UNAM CentroUniversitario de Investigaciones BibliotecoloAcircgicas MeAcircxico

Gupta DK (1991) ordfApplication of Bradfordrsquos law to citation data of Ethiopian medicaljournalsordm Annals of Library Science and Documentation Vol 38 No 3 pp 85-98

Harter SP and Ford CE (2000) ordfWeb-based analyses of e-journal impact approachesproblems and issuesordm Library Science With a Slant to Documentation and InformationStudies Vol 51 No 13 pp 1159-76

Houston W (1983) ordfThe application of bibliometrics to veterinary science primary literatureordmQuarterly Bulletin of International Association of Agricultural Information SpecialistsVol 28 No 1 pp 6-13

Internet Software Consortium (2002) ordfInternet domain surveyordm available at wwwiscorg(accessed 4 November 2002)

JimeAcircnez-Piano M (2001) ordfEvaluacioAcircn de sedes webordm Revista EspanAuml ola de DocumentacioAcircnCientotildeAcircregca Vol 24 No 4 pp 405-32

Kim HJ (2000) ordfMotivations for hyperlinking in scholarly electronic articles a qualitativestudyordm Journal of the American Society for Information Science Vol 51 No 10 pp 887-99

Kumar R et al (2001) ordfTrawling the Web for emerging cyber-communitiesordm in Mendelzon A etal (Eds) Proceedings of the 8th International World Wide Web Conference (TorontoCanadaAcirc May 11-14 1999) available at www8orgw8-papers4a-search-miningtrawlingtrawlinghtml (accessed 2 October 2002) Elsevier Amsterdam

Lal A and Panda KC (1999) ordfBradfordrsquos law and its application to bibliographical data ofplant pathology dissertations an analytical approachordm Library Science With a Slant toDocumentation and Information Studies Vol 36 No 3 pp 193-206

Larson RR (1996) ordfBibliometrics of the World Wide Web and exploratory analysis of theintellectual structure of cyberspaceordm in Hardin S (Ed) Proceedings of the 59th AnnualMeeting of the American Society for Information Science (Baltimore Maryland 1996)Information Today Medford NJ pp 71-8 available at httpsherlockberkeleyeduasis96asis96html (accessed 14 October 2000)

McKiernan G (1996) ordfCitedSites(sm) citation indexing of Web resourcesordm available atwwwpubliciastateedu CYBERSTACKSCitedhtm (accessed 24 February 2000)

Mubeen MA (1996) ordfCitation analysis of doctoral dissertations in chemistryordm Annals of LibraryScience and Documentation Vol 43 No 2 pp 48-58

ordfSitationordmdistributions

579

Price DJ (1970) ordfCitation measures of hard science soft science technology and non-scienceordm inNelson CC and Pollock DE (Eds) Communication Among Scientists and Engineers DCHealth and Co Lexington MA pp 3-22

Reyes-BarragaAcircn MJ et al (2000) ordfRevistas cientotildeAcircregcas determinacioAcircn de necesidades y usosordmRevista EspanAuml ola de DocumentacioAcircn CientotildeAcircregca Vol 23 No 4 pp 417-36

RodrotildeAcircguez i GairotildeAcircn JM (1997) ordfValoracioAcircn del impacto de la informacioAcircn en Internet Altavistael `Citation Indexrsquo de la redordm Revista EspanAuml ola de DocumentacioAcircn CientotildeAcircregca Vol 20 No 2pp 175-81

Rousseau R (1997) ordfCitations an exploratory studyordm Cybermetrics International Journal ofScientometrics Informetrics and Bibliometrics Vol 1 p 1 available atwwwcindoccsicescybermetricsarticlesv1i1p1html (accessed 5 September 2000)

Smith AG (1996) ordfCriteria for evaluation of Internet information resourcesordm available atwwwvuwacnz agsmithevalnindexhtm (accessed 25 March 2000)

Smith AG (1997) ordfTesting the surf criteria for evaluating Internet information resourcesordm ThePublic-Access Computer Systems Review Vol 8 No 3 available at httpinfolibuheduprv8n3smit8n3html (accessed 28 February 2002)

Smith AG (1999) ordfThe impact of web sites a comparison betwen Australasia and LatinAmericaordm available at wwwvuwacnz agsmithpublnsaustlat (accessed 12 May2001)

Smith AG (2001) ordfApplying evaluation criteria to New Zealand government websitesordmInternational Journal of Information Management Vol 21 No 2 pp 137-49

Tague-Sutcliffe J (1992) ordfAn introduction to informetricsordm Information Processing andManagement Vol 28 No 1 pp 1-3

Tillman HN (2000) ordfEvaluating quality on the Netordm available atwwwtiacnetusershoperegndqualhtml (accessed 5 June 2000)

van Raan AFJ (2001) ordfBibliometrics and Internet some observations and expectationsordmScientometrics Vol 50 No 1 pp 59-63

Vreeland RC (2000) ordfLaw libraries in hyperspacea citation analysis of World Wide Web sitesordmLaw Library Journal Vol 92 No 1 pp 9-25

Zhang P and von Dran GM (2000) ordfSatisregers and dissatisregers a two-factor model for websitedesign and evaluationordm Journal of the American Society for Information Science Vol 51No 14 pp 1253-68

JD595

580

Figure 7. Accumulated sitations vs accumulated Web spaces. The data are from the first set. The results are presented for target pages independently grouped by Web space and excluding targets that were very frequently linked to in very large Web spaces. Sitor pages are counted.


Figure 8. Accumulated sitations vs accumulated Web spaces. The data are from the first set. The results are presented for target pages independently grouped by Web space and excluding targets that were very frequently linked to in very large Web spaces. Sitor pages are counted, but only targets about Extremadura.


Figure 9. Accumulated sitations vs accumulated Web spaces. The data are from the second set. The results are presented for target pages independently grouped by Web space and excluding targets that were very frequently linked to in very large Web spaces. Sitor pages are counted.


Figure 10. Accumulated sitations vs accumulated Web spaces. The data are from the second set. The results are presented for target pages independently grouped by Web space and excluding targets that were very frequently linked to in very large Web spaces. Sitor pages are counted, but only targets about Extremadura.


Although its slope is less clearly perceptible because this distribution is plotted together with the others, one nevertheless observes that the slope increases continuously, implying that the accumulated sitations also grow exponentially. Indeed, if a logarithmic scale is also used on the vertical axis, the result is a straight line with no point of inflection, which does not fit Bradford's law. It is interesting that, once this set of links is eliminated, the dependence of the sitations becomes exponential. Again, the verbal statement of the law is not satisfied, since the proportion 1:n:n² does not hold.
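Bradford's verbal statement can be checked directly on a list of sitation counts. As a minimal sketch, assuming the counts per target are already available as a plain list, the targets are ranked by decreasing sitations, split into three zones that each hold about a third of all sitations, and the zone sizes are compared; the law predicts sizes close to 1:n:n². The function names `bradford_zones` and `zone_multipliers` are hypothetical, and the cut-off rule (close a zone as soon as its cumulative share is reached) is one simple choice among several, not the procedure used in this study.

```python
def bradford_zones(counts, zones=3):
    """Rank targets by decreasing sitations and split them into `zones`
    groups holding roughly equal shares of all sitations.
    Returns the number of targets in each zone."""
    ranked = sorted(counts, reverse=True)
    total = sum(ranked)
    sizes, in_zone, accumulated, zone = [], 0, 0, 1
    for c in ranked:
        accumulated += c
        in_zone += 1
        # Close the current zone once its cumulative share has been reached.
        if zone < zones and accumulated >= total * zone / zones:
            sizes.append(in_zone)
            in_zone, zone = 0, zone + 1
    sizes.append(in_zone)  # the last zone takes all remaining targets
    return sizes


def zone_multipliers(sizes):
    """Ratios between consecutive zone sizes; Bradford's law expects them
    to be roughly equal (the Bradford multiplier n)."""
    return [b / a for a, b in zip(sizes, sizes[1:])]


sizes = bradford_zones([8, 4, 4, 2, 2, 2, 2])
print(sizes)                    # [1, 2, 4]
print(zone_multipliers(sizes))  # [2.0, 2.0]
```

With these toy counts the zones contain 1, 2 and 4 targets, i.e. the proportion 1:2:4 with n = 2. Multipliers that differ widely from each other indicate that the verbal statement of the law does not hold for the data.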

Some authors, Brookes (1969) for instance, consider that Bradford's law is mainly applicable in small and well-defined areas of research, in which there must be a strong thematic relationship between the documents. In Figure 8 we have therefore represented only the pages or Web spaces that are from Extremadura (for the first set of data). The behaviour, however, is quite similar to the previous case: the first two distributions are fairly similar to each other, with the greatest difference being in the third distribution, which is a straight line.

Figures 9 and 10 show the same calculations as Figures 7 and 8, but now for data from the second set. The results are identical.

Conclusions

The present experiment focused on the distribution of sitations in a closed but heterogeneous environment of thematically related Web spaces, using two sets of data. We studied how these distributions fit, on the one hand, a power law and, on the other, Bradford's law, looking at a great many variants with a total of 384 plots.

Studies in the literature have shown that, on a very large scale, sitations follow a power law distribution with an exponent of 2.1. The present sitation distributions from both data sets were coherent with those earlier studies, including in the exponent, independently of whether the counts included broken links, total links or only sitor pages, and of whether pages or Web spaces were taken as origin and as target. We showed, however, that the distributions differed according to the calculation procedure used and the filter applied to the data. In particular, when we considered only Web spaces as origins (instead of pages or links) and pages as destinations, the slope increased to correspond approximately to an exponent of 2.8. When, however, both destinations and origins were Web spaces, the exponent returned to the value 2.1. Also, when the destinations were thematically restricted (in which case they practically coincided with the sitors), the slope was reduced to correspond approximately to an exponent of 1.5. Finally, the tail in the data was reduced by eliminating the targets that were very frequently linked to in very large Web spaces.
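The exponents quoted above were read from the slopes of the plotted distributions. Purely as an illustration, the sketch below shows two common ways of estimating such an exponent from a list of sitation counts per target: a least-squares fit on the log-log frequency plot, and the approximate discrete maximum-likelihood estimator α ≈ 1 + N / Σ ln(x_i / (x_min - 1/2)). The second is a swapped-in technique and not necessarily what the authors used; the function names and the default `xmin = 1` are assumptions.

```python
import math
from collections import Counter


def loglog_exponent(sitations):
    """Exponent estimated as minus the least-squares slope of
    log(number of targets receiving k sitations) against log(k)."""
    freq = sorted(Counter(k for k in sitations if k > 0).items())
    if len(freq) < 2:
        raise ValueError("need at least two distinct sitation counts")
    xs = [math.log(k) for k, _ in freq]
    ys = [math.log(f) for _, f in freq]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return -sxy / sxx


def mle_exponent(sitations, xmin=1):
    """Approximate discrete maximum-likelihood estimate
    alpha = 1 + N / sum(ln(x / (xmin - 0.5))) over all counts x >= xmin."""
    tail = [x for x in sitations if x >= xmin]
    return 1 + len(tail) / sum(math.log(x / (xmin - 0.5)) for x in tail)


# Toy data whose frequencies lie exactly on f(k) = 1000 * k**-2.
toy = [1] * 1000 + [2] * 250 + [5] * 40 + [10] * 10
print(round(loglog_exponent(toy), 3))  # 2.0
print(round(mle_exponent(toy), 3))
```

The two estimators need not agree on real data. Least-squares fits on log-log plots are known to be sensitive to how the sparse tail is handled, which is one reason maximum-likelihood estimates are often preferred.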

We studied the fit to Bradford's law by plotting the accumulated sitations against the accumulated targets. Although we presented many distributions, none of them fitted the typical Bradford case. When we considered Web spaces as the sitors, the resulting distribution was concave from above, with no point of inflection, and passed through the origin. The other distributions, in which pages were considered as the sitors, could in general be characterized as curves which start noticeably above the origin and have several points of inflection ( ). It was also notable, firstly, that taking Web spaces as destinations instead of pages greatly reduced the accumulated sitations and much of the complexity of the curve, and, secondly, that eliminating the targets that were very frequently linked to in very large Web spaces sharply reduced the accumulated sitations; the curves then passed through the origin and showed an exponential dependence of the accumulated sitations.
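The graphical form of Bradford's law plots the accumulated sitations R(r) received by the r most-linked targets against log r; in the typical Bradford case the curve shows an initial curved rise followed by a straight-line segment. A minimal sketch, assuming the counts per target are given as a list, builds these points and counts sign changes in the discrete curvature as a rough proxy for points of inflection. The names `bradford_curve` and `count_inflections` are hypothetical, and with real, noisy data the curve would normally need smoothing before inflections are counted.

```python
import math


def bradford_curve(counts):
    """Points (log r, R(r)), where R(r) is the accumulated number of
    sitations received by the r most-sitated targets."""
    ranked = sorted(counts, reverse=True)
    points, accumulated = [], 0
    for r, c in enumerate(ranked, start=1):
        accumulated += c
        points.append((math.log(r), accumulated))
    return points


def count_inflections(points):
    """Rough count of inflection points: sign changes in the change of
    slope between consecutive segments of the curve."""
    slopes = [(y2 - y1) / (x2 - x1)
              for (x1, y1), (x2, y2) in zip(points, points[1:])]
    changes = [s2 - s1 for s1, s2 in zip(slopes, slopes[1:])]
    signs = [c > 0 for c in changes if c != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


curve = bradford_curve([8, 4, 4, 2])
print([y for _, y in curve])     # [8, 12, 16, 18]
print(count_inflections(curve))  # 1
```

Because log 1 = 0, the first point is (0, R(1)); a curve that "starts noticeably above the origin" simply means that the single most-linked target already collects many sitations.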

In sum, everything seems to support the observation of Kim (2000) that there are different motivations behind the citations in scientific articles and the links of the World Wide Web. In this sense, we confirmed the immense influence of thematic restriction (for instance, nearly all the Web spaces included links related to the World Wide Web itself and its technology) and of the repetitive links found in most of the very large Web spaces (which are usually generated by means of software tools that allow links to be automatically included on every page).
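The effect of counting sitor Web spaces instead of sitor pages, and of discarding targets that are linked repetitively from very large Web spaces, can be illustrated on a toy link list of (sitor URL, target URL) pairs. In this sketch the Web space of a URL is approximated by its host name, and the threshold for a "very frequently linked" target is a hypothetical parameter; neither is taken from the study.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def web_space(url):
    # Assumption: a Web space is approximated by the URL's host name.
    return urlsplit(url).hostname


def sitation_counts(links, level="page"):
    """Sitations per target page, counting distinct sitor pages
    (level="page") or distinct sitor Web spaces (level="space")."""
    sitors = defaultdict(set)
    for source, target in links:
        sitors[target].add(source if level == "page" else web_space(source))
    return {target: len(s) for target, s in sitors.items()}


def exclude_repetitive_targets(links, threshold):
    """Remove all links to any target that a single Web space links to
    from more than `threshold` distinct pages (hypothetical cut-off)."""
    pages_per_space = defaultdict(set)
    for source, target in links:
        pages_per_space[(web_space(source), target)].add(source)
    flagged = {t for (_, t), pages in pages_per_space.items()
               if len(pages) > threshold}
    return [(s, t) for s, t in links if t not in flagged]


# Toy data: a large site whose template repeats one link on all 50 pages,
# plus two independent sites linking to the same regional page.
links = [(f"http://big.example.org/p{i}.html", "http://tool.example.com/")
         for i in range(50)]
links += [("http://a.example.org/", "http://info.example.net/"),
          ("http://b.example.org/x.html", "http://info.example.net/")]

print(sitation_counts(links))
# {'http://tool.example.com/': 50, 'http://info.example.net/': 2}
print(sitation_counts(links, "space"))
# {'http://tool.example.com/': 1, 'http://info.example.net/': 2}
print(len(exclude_repetitive_targets(links, threshold=10)))  # 2
```

At page level the template-generated link dominates the ranking; counting sitor Web spaces, or filtering such targets out, reverses the order of the two targets in this toy example.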

References

Aguillo, I.F. (2000), "Indicadores hacia una evaluación no objetiva (cuantitativa) de sedes web", in VII Jornadas Españolas de Documentación (Bilbao, 19-21 de octubre de 2000), Servicio Editorial de la Universidad del País Vasco, Bilbao, pp. 233-48.

Almind, T.C. and Ingwersen, P. (1997), "Informetric analyses on the World Wide Web: methodological approaches to 'Webometrics'", Journal of Documentation, Vol. 53 No. 4, pp. 404-26.

Bar-Ilan, J. (1997), "The 'mad cow disease', Usenet newsgroups and bibliometric laws", Scientometrics, Vol. 39 No. 1, pp. 29-55.

Bar-Ilan, J. (2001), "Data collection methods on the Web for informetrics purposes: a review and analysis", Scientometrics, Vol. 50 No. 1, pp. 7-32.

Barabási, A.L. and Albert, R. (1999), "Emergence of scaling in random networks", Science, Vol. 286, pp. 509-12.

Björneborn, L. and Ingwersen, P. (2001), "Perspectives of webometrics", Scientometrics, Vol. 50 No. 1, pp. 65-82.

Bradford, S.C. (1934), "Sources of information of specific subjects", Engineering, Vol. 137, pp. 85-6.

Brin, S. and Page, L. (1998), "The anatomy of a large-scale hypertextual web search engine", Computer Networks and ISDN Systems, Vol. 30, pp. 107-17.

Broder, A. et al. (2000), "Graph structure in the Web", Computer Networks and ISDN Systems, Vol. 33 No. 1-6, pp. 309-20.

Brookes, B.C. (1969), "Bradford's law and the bibliography of science", Nature, Vol. 224, pp. 953-6.

Codina, L. (2000), "Parámetros e indicadores de calidad para la evaluación de recursos digitales", in VII Jornadas Españolas de Documentación (Bilbao, 19-21 de octubre de 2000), Servicio Editorial de la Universidad del País Vasco, Bilbao, pp. 135-44.


Correa-Uribe, G. (1999), Colombia Conectada al Mundo: Sitios Web Colombianos, Universidad de Antioquia, Antioquia.

Cronin, B. (2001), "Bibliometrics and beyond: some thoughts on web-based citation analysis", Journal of Information Science, Vol. 27 No. 1, pp. 1-7.

Cui, L. (1999), "Rating health web sites using the principles of citation analysis: a bibliometric approach", Journal of Medical Internet Research, Vol. 1 No. e4, available at: www.jmir.org/1999/1/e4/index.htm (accessed 18 August 2001).

Cybermetrics: International Journal of Scientometrics, Informetrics and Bibliometrics (2002), available at: www.cindoc.csic.es/cybermetrics (accessed 11 August 2002).

Egghe, L. (2000), "New informetric aspects of the Internet: some reflections - many problems", Journal of Information Science, Vol. 26 No. 5, pp. 329-35.

Ferreiro-Aláez, L. (1981), "Análisis de referencias y características bibliométricas de los conjuntos de revistas nucleares", Revista Española de Documentación Científica, Vol. 4 No. 3, pp. 181-98.

Gorbea-Portal, S. (1996), El Modelo Matemático de Bradford: Su Aplicación a las Revistas Latinoamericanas de las Ciencias Bibliotecológicas y de la Información, UNAM, Centro Universitario de Investigaciones Bibliotecológicas, México.

Gupta, D.K. (1991), "Application of Bradford's law to citation data of Ethiopian medical journals", Annals of Library Science and Documentation, Vol. 38 No. 3, pp. 85-98.

Harter, S.P. and Ford, C.E. (2000), "Web-based analyses of e-journal impact: approaches, problems, and issues", Library Science With a Slant to Documentation and Information Studies, Vol. 51 No. 13, pp. 1159-76.

Houston, W. (1983), "The application of bibliometrics to veterinary science primary literature", Quarterly Bulletin of International Association of Agricultural Information Specialists, Vol. 28 No. 1, pp. 6-13.

Internet Software Consortium (2002), "Internet domain survey", available at: www.isc.org (accessed 4 November 2002).

Jiménez-Piano, M. (2001), "Evaluación de sedes web", Revista Española de Documentación Científica, Vol. 24 No. 4, pp. 405-32.

Kim, H.J. (2000), "Motivations for hyperlinking in scholarly electronic articles: a qualitative study", Journal of the American Society for Information Science, Vol. 51 No. 10, pp. 887-99.

Kumar, R. et al. (2001), "Trawling the Web for emerging cyber-communities", in Mendelzon, A. et al. (Eds), Proceedings of the 8th International World Wide Web Conference (Toronto, Canada, May 11-14, 1999), available at: www8.org/w8-papers/4a-search-mining/trawling/trawling.html (accessed 2 October 2002), Elsevier, Amsterdam.

Lal, A. and Panda, K.C. (1999), "Bradford's law and its application to bibliographical data of plant pathology dissertations: an analytical approach", Library Science With a Slant to Documentation and Information Studies, Vol. 36 No. 3, pp. 193-206.

Larson, R.R. (1996), "Bibliometrics of the World Wide Web and exploratory analysis of the intellectual structure of cyberspace", in Hardin, S. (Ed.), Proceedings of the 59th Annual Meeting of the American Society for Information Science (Baltimore, Maryland, 1996), Information Today, Medford, NJ, pp. 71-8, available at: http://sherlock.berkeley.edu/asis96/asis96.html (accessed 14 October 2000).

McKiernan, G. (1996), "CitedSites(sm): citation indexing of Web resources", available at: www.public.iastate.edu/~CYBERSTACKS/Cited.htm (accessed 24 February 2000).

Mubeen, M.A. (1996), "Citation analysis of doctoral dissertations in chemistry", Annals of Library Science and Documentation, Vol. 43 No. 2, pp. 48-58.


Price, D.J. (1970), "Citation measures of hard science, soft science, technology and non-science", in Nelson, C.C. and Pollock, D.E. (Eds), Communication Among Scientists and Engineers, D.C. Heath and Co., Lexington, MA, pp. 3-22.

Reyes-Barragán, M.J. et al. (2000), "Revistas científicas: determinación de necesidades y usos", Revista Española de Documentación Científica, Vol. 23 No. 4, pp. 417-36.

Rodríguez i Gairín, J.M. (1997), "Valoración del impacto de la información en Internet: Altavista, el 'Citation Index' de la red", Revista Española de Documentación Científica, Vol. 20 No. 2, pp. 175-81.

Rousseau, R. (1997), "Sitations: an exploratory study", Cybermetrics: International Journal of Scientometrics, Informetrics and Bibliometrics, Vol. 1, p. 1, available at: www.cindoc.csic.es/cybermetrics/articles/v1i1p1.html (accessed 5 September 2000).

Smith, A.G. (1996), "Criteria for evaluation of Internet information resources", available at: www.vuw.ac.nz/~agsmith/evaln/index.htm (accessed 25 March 2000).

Smith, A.G. (1997), "Testing the surf: criteria for evaluating Internet information resources", The Public-Access Computer Systems Review, Vol. 8 No. 3, available at: http://info.lib.uh.edu/pr/v8/n3/smit8n3.html (accessed 28 February 2002).

Smith, A.G. (1999), "The impact of web sites: a comparison between Australasia and Latin America", available at: www.vuw.ac.nz/~agsmith/publns/austlat (accessed 12 May 2001).

Smith, A.G. (2001), "Applying evaluation criteria to New Zealand government websites", International Journal of Information Management, Vol. 21 No. 2, pp. 137-49.

Tague-Sutcliffe, J. (1992), "An introduction to informetrics", Information Processing and Management, Vol. 28 No. 1, pp. 1-3.

Tillman, H.N. (2000), "Evaluating quality on the Net", available at: www.tiac.net/users/hope/findqual.html (accessed 5 June 2000).

van Raan, A.F.J. (2001), "Bibliometrics and Internet: some observations and expectations", Scientometrics, Vol. 50 No. 1, pp. 59-63.

Vreeland, R.C. (2000), "Law libraries in hyperspace: a citation analysis of World Wide Web sites", Law Library Journal, Vol. 92 No. 1, pp. 9-25.

Zhang, P. and von Dran, G.M. (2000), "Satisfiers and dissatisfiers: a two-factor model for website design and evaluation", Journal of the American Society for Information Science, Vol. 51 No. 14, pp. 1253-68.

JD595

580

Figure 8Accumulated sitations vsaccumulated Webspaces The data arefrom the regrst set Theresults are presented fortarget pagesindependently groupedby Web space andexcluding targets thatwere very frequentlylinked to in very largeWeb spaces Sitor pagesare counted but onlytargets aboutExtremadura

JD595

574

Figure 9Accumulated sitations vs

accumulated Webspaces The data are

from the second set Theresults are presented for

target pagesindependently grouped

by Web space andexcluding targets thatwere very frequently

linked to in very largeWeb spaces Sitor pages

are counted

ordfSitationordmdistributions

575

Figure 10Accumulated sitations vsaccumulated Webspaces The data arefrom the second set Theresults are presented fortarget pagesindependently groupedby Web space andexcluding targets thatwere very frequentlylinked to in very largeWeb spaces Sitor pagesare counted but onlytargets aboutExtremadura

JD595

576

Although as this distribution is plotted together with the other distributionsits slope is less clearly perceptible one nevertheless observes that the slopeincreases continuously implying an exponential growth of the accumulatedsitations also Indeed if a logarithmic scale is also used on the vertical axis theresult is a straight line with no point of inmacrection and not regtting Bradfordrsquoslaw It is interesting that with the elimination of this set of links the dependenceof the sitations becomes exponential Again the verbal statement of the law isnot satisreged since the proportion 1nn2 does not hold

Some authors Brookes (1969) for instance consider that Bradfordrsquos law ismainly applicable in small and well-deregned areas of research in which theremust exist a strong thematic relationship between the documents In Figure 8we have therefore represented only the pages or Web spaces that are fromExtremadura (for the regrst set of data) The behaviour however is quite similarto the previous case the regrst two distributions are fairly similar to each otherwith the greatest difference being in the third distribution which is a straightline

Figures 9 and 10 show the same calculations as Figures 7 and 8 but now fordata from the second set The results are identical

ConclusionsThe present experiment focused on the distribution of citations in a closed butheterogeneous environment of thematically related Web spaces using two setsof data We studied how they regt on the one hand a power law distribution andon the other Bradfordrsquos law looking at a great many variants with a total of 384plots

Studies in the literature have shown that on a very large scale sitationsfollow a power law distribution with an exponent of 21 The present sitationdistributions from both data sets were coherent with those earlier studiesincluding in the exponent independently of whether or not the counts includedbroken links total links or only sitor pages and pages or Web spaces both asorigin and as target We showed that the distributions differed howeveraccording to which calculation procedure was used and to what reglter wasapplied to the data In particular when we considered only Web spaces asorigins instead of pages or links and pages as destinations the slope increasedto correspond approximately to an exponent of 28 When however bothdestinations and origins were Web spaces the exponent returned to the value21 Also when the destinations were thematically restricted (in which casethey practically coincided with the citors) the slope was reduced to correspondapproximately to an exponent of 15 Finally the tail in the data was reduced byeliminating the targets that were very frequently linked to in very large Webspaces

We studied the regt to Bradfordrsquos law by plotting the accumulated sitationsagainst the accumulated targets Although we presented many distributions

ordfSitationordmdistributions

577

none of them regtted the typical Bradford case When we considered Web spacesas the sitors the resulting distribution was concave from above with no pointof inmacrection and passing through the origin The other distributions in whichpages were considered as the sitors could in general be characterized as beingcurves which start noticeably above the origin and have several points ofinmacrection ( ) It was also notable regrstly that taking Web spaces asdestinations instead of pages greatly reduced the accumulated sitations andmuch of the complexity of the curve and secondly that eliminating the targetsthat were very frequently linked to in very large Web spaces sharply reducedthe accumulated sitations and the curves changed now to pass through theorigin and presented an exponential dependence on the accumulated sitations

In sum everything seems to support the observation of Kim (2000) that thereare different motivations behind the citations in scientiregc articles and the linksof the World Wide Web In this sense we conregrmed the immense inmacruence ofthematic restriction (for instance nearly all the Web spaces included linksrelated to the World Wide Web itself and its technology) and of the repetitivelinks found in most of the very large Web spaces (which are usually generatedby means of software tools that allow links to be automatically included onevery page)

References

Aguillo IF (2000) ordfIndicadores hacia una evaluacioAcircn no objetiva (cuantitativa) de sedes webordm inVII Jornadas EspanAuml olas de DocumentacioAcircn (Bilbao 19-21 de octubre de 2000) ServicioEditorial de la Universidad del PaotildeAcircs Vasco Bilbao pp 233-48

Almind TC and Ingwersen P (1997) ordfInformetric analyses on the World Wide Webmethodological approaches to `Webometricsrsquoordm Journal of Documentation Vol 53 No 4pp 404-26

Bar-Ilan J (1997) ordfThe `mad cow diseasersquo Usenet newsgroups and bibliometric lawsordmScientometrics Vol 39 No 1 pp 29-55

Bar-Ilan J (2001) ordfData collection methods on the Web for informetrics purposes a review andanalysisordm Scientometrics Vol 50 No 1 pp 7-32

BarabaAcircsi AL and Albert R (1999) ordfEmergence of scaling in random networksordm ScienceVol 286 pp 509-12

BjoEgraverneborn L and Ingwersen P (2001) ordfPerspectives of webometricsordm Scientometrics Vol 50No 1 pp 65-82

Bradford SC (1934) ordfSourcesof information of speciregc subjectsordm Engineering Vol 137 pp 85-6

Brin S and Page L (1998) ordfThe anatomy of a large-scale hypertextual web search engineordmComputer Networks and ISDN Systems Vol 30 pp 107-17

Broder A et al (2000) ordfGraph structure in the Webordm Computer Networks and ISDN SystemsVol 33 No 1-6 pp 309-20

BrookesBC (1969) ordfBradfordrsquos law and the bibliography of scienceordm Nature Vol 224 pp 953-6

Codina L (2000) ordfParaAcircmetros e indicadores de calidad para la evaluacioAcircn de recursos digitalesordmin VII Jornadas EspanAuml olas de DocumentacioAcircn (Bilbao 19-21 de octubre de 2000) ServicioEditorial de la Universidad del PaotildeAcircs Vasco Bilbao pp 135-44

JD595

578

Correa-Uribe G (1999) Colombia Conectada al Mundo Sitios Web Colombianos Universidad deAntioquotildeAcirca AntioquotildeAcirca

Cronin B (2001) ordfBibliometrics and beyond some thoughts on web-based citation analysisordmJournal of Information Science Vol 27 No 1 pp 1-7

Cui L (1999) ordfRating health web sites using the principles of citation analysis a bibliometricapproachordm Journal of Medical Internet Research Vol 1 No e4 avalaible atwwwjmirorg19991e4indexhtm (accessed 18 August 2001)

Cybermetrics International Journal of Scientometrics Informetrics and Bibliometrics (2002)available at wwwcindoccsicescybermetrics (accessed 11 August 2002)

Egghe L (2000) ordfNew informetric aspects of the Internet some remacrections plusmn many problemsordmJournal of Information Science Vol 26 No 5 pp 329-35

Ferreiro-AlaAcircez L (1981) ordfAnaAcirclisis de referencias y caracterotildeAcircsticas bibliomeAcirctricas de losconjuntos de revistas nuclearesordm Revista EspanAuml ola de DocumentacioAcircn CientotildeAcircregca Vol 4No 3 pp 181-98

Gorbea-Portal S (1996) El Modelo MatemaAcirctico de Bradford Su AplicacioAcircn a las RevistasLatinoamericanas de las Ciencias BibliotecoloAcircgicas y de la InformacioAcircn UNAM CentroUniversitario de Investigaciones BibliotecoloAcircgicas MeAcircxico

Gupta DK (1991) ordfApplication of Bradfordrsquos law to citation data of Ethiopian medicaljournalsordm Annals of Library Science and Documentation Vol 38 No 3 pp 85-98

Harter SP and Ford CE (2000) ordfWeb-based analyses of e-journal impact approachesproblems and issuesordm Library Science With a Slant to Documentation and InformationStudies Vol 51 No 13 pp 1159-76

Houston W (1983) ordfThe application of bibliometrics to veterinary science primary literatureordmQuarterly Bulletin of International Association of Agricultural Information SpecialistsVol 28 No 1 pp 6-13

Internet Software Consortium (2002) ordfInternet domain surveyordm available at wwwiscorg(accessed 4 November 2002)

JimeAcircnez-Piano M (2001) ordfEvaluacioAcircn de sedes webordm Revista EspanAuml ola de DocumentacioAcircnCientotildeAcircregca Vol 24 No 4 pp 405-32

Kim HJ (2000) ordfMotivations for hyperlinking in scholarly electronic articles a qualitativestudyordm Journal of the American Society for Information Science Vol 51 No 10 pp 887-99

Kumar R et al (2001) ordfTrawling the Web for emerging cyber-communitiesordm in Mendelzon A etal (Eds) Proceedings of the 8th International World Wide Web Conference (TorontoCanadaAcirc May 11-14 1999) available at www8orgw8-papers4a-search-miningtrawlingtrawlinghtml (accessed 2 October 2002) Elsevier Amsterdam

Figure 9. Accumulated sitations vs accumulated Web spaces. The data are from the second set. The results are presented for target pages independently grouped by Web space and excluding targets that were very frequently linked to in very large Web spaces. Sitor pages are counted.


Figure 10. Accumulated sitations vs accumulated Web spaces. The data are from the second set. The results are presented for target pages independently grouped by Web space and excluding targets that were very frequently linked to in very large Web spaces. Sitor pages are counted, but only targets about Extremadura.


Although the slope of this distribution is less clearly perceptible because it is plotted together with the other distributions, one nevertheless observes that the slope increases continuously, implying that the accumulated sitations also grow exponentially. Indeed, if a logarithmic scale is also used on the vertical axis, the result is a straight line with no point of inflection, which does not fit Bradford’s law. It is interesting that, once this set of links is eliminated, the dependence of the sitations becomes exponential. Again, the verbal statement of the law is not satisfied, since the proportion 1 : n : n² does not hold.
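Bradford’s verbal formulation can be checked mechanically. Rank the targets by decreasing sitations, cut the ranking into three zones that each hold about a third of all sitations, and compare the number of targets per zone, which the law expects to grow as 1 : n : n². The sketch below is a minimal illustration under stated assumptions, not the procedure used in this study. The function names `bradford_zones` and `zone_multipliers` are hypothetical, and the rule that a target belongs to the zone in which its first sitation falls is only one of several possible conventions.

```python
from typing import List


def bradford_zones(sitations: List[int], zones: int = 3) -> List[int]:
    """Rank targets by decreasing sitations and split them into `zones`
    groups holding roughly equal shares of all sitations; return the
    number of targets that falls into each group."""
    ranked = sorted(sitations, reverse=True)
    total = sum(ranked)
    if total <= 0:
        raise ValueError("need at least one sitation")
    sizes = [0] * zones
    accumulated = 0
    for count in ranked:
        # Assumed convention: a target goes to the zone in which its first
        # sitation falls, judged by the share accumulated before it.
        zone = min(accumulated * zones // total, zones - 1)
        sizes[zone] += 1
        accumulated += count
    return sizes


def zone_multipliers(sizes: List[int]) -> List[float]:
    """Ratios between consecutive zone sizes. The verbal law (1 : n : n**2
    for three zones) predicts that these ratios are roughly equal."""
    return [
        later / earlier if earlier else float("inf")
        for earlier, later in zip(sizes, sizes[1:])
    ]


if __name__ == "__main__":
    sizes = bradford_zones([8, 4, 4, 2, 2, 2, 2])
    print(sizes)                    # [1, 2, 4]
    print(zone_multipliers(sizes))  # [2.0, 2.0], i.e. 1 : n : n**2 with n = 2
```

A distribution dominated by one heavily sitated target, such as `[100, 1, 1]`, yields zone sizes `[1, 0, 2]` and unequal multipliers, which is the kind of failure of the 1 : n : n² proportion described above.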

Some authors, Brookes (1969) for instance, consider that Bradford’s law is mainly applicable in small and well-defined areas of research, in which there must exist a strong thematic relationship between the documents. In Figure 8 we have therefore represented only the pages or Web spaces that are from Extremadura (for the first set of data). The behaviour, however, is quite similar to the previous case: the first two distributions are fairly similar to each other, with the greatest difference being in the third distribution, which is a straight line.

Figures 9 and 10 show the same calculations as Figures 7 and 8, but now for data from the second set. The results are identical.

Conclusions

The present experiment focused on the distribution of sitations in a closed but heterogeneous environment of thematically related Web spaces, using two sets of data. We studied how well they fit, on the one hand, a power law distribution and, on the other, Bradford’s law, looking at a great many variants with a total of 384 plots.

Studies in the literature have shown that, on a very large scale, sitations follow a power law distribution with an exponent of 2.1. The present sitation distributions from both data sets were coherent with those earlier studies, including in the exponent, independently of whether the counts included broken links, total links or only sitor pages, and of whether pages or Web spaces were taken as origin and as target. We showed, however, that the distributions differed according to which calculation procedure was used and which filter was applied to the data. In particular, when we considered only Web spaces as origins (instead of pages or links) and pages as destinations, the slope increased to correspond approximately to an exponent of 2.8. When, however, both destinations and origins were Web spaces, the exponent returned to the value 2.1. Also, when the destinations were thematically restricted (in which case they practically coincided with the sitors), the slope was reduced to correspond approximately to an exponent of 1.5. Finally, the tail in the data was reduced by eliminating the targets that were very frequently linked to in very large Web spaces.
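The exponents quoted here (2.1, 2.8 and 1.5) are slopes of the sitation distribution on log-log axes. This section does not say how they were fitted, so the following is only a sketch under one assumption: that f(k), the number of targets receiving exactly k sitations, is fitted by an ordinary least-squares line through the points (log k, log f(k)), and the exponent is taken as minus the slope. The function name `loglog_exponent` is hypothetical. This simple estimator is sensitive to the sparse tail of the distribution; maximum-likelihood estimation is a common alternative.

```python
import math
from collections import Counter
from typing import Iterable


def loglog_exponent(sitations: Iterable[int]) -> float:
    """Estimate gamma in f(k) ~ k**(-gamma), where f(k) is the number of
    targets that receive exactly k sitations, by ordinary least squares
    on the points (log k, log f(k)). Targets with no sitations are skipped."""
    freq = Counter(k for k in sitations if k > 0)
    if len(freq) < 2:
        raise ValueError("need at least two distinct sitation counts")
    points = [(math.log(k), math.log(f)) for k, f in freq.items()]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    # The fitted slope is -gamma, so the exponent is its negation.
    return -sxy / sxx


if __name__ == "__main__":
    # Synthetic counts: 16 targets with 1 sitation, 4 with 2, 1 with 4,
    # so f(k) = 16 * k**-2 exactly.
    counts = [1] * 16 + [2] * 4 + [4]
    print(round(loglog_exponent(counts), 6))  # 2.0
```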

We studied the fit to Bradford’s law by plotting the accumulated sitations against the accumulated targets. Although we presented many distributions, none of them fitted the typical Bradford case. When we considered Web spaces as the sitors, the resulting distribution was concave from above, with no point of inflection, and passed through the origin. The other distributions, in which pages were considered as the sitors, could in general be characterized as curves which start noticeably above the origin and have several points of inflection ( ). It was also notable, firstly, that taking Web spaces as destinations instead of pages greatly reduced the accumulated sitations and much of the complexity of the curve and, secondly, that eliminating the targets that were very frequently linked to in very large Web spaces sharply reduced the accumulated sitations; the curves then passed through the origin and the accumulated sitations showed an exponential dependence.
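The Bradford plots discussed here place R(n), the sitations accumulated by the n most-sitated targets, on the vertical axis against log n. The sketch below, whose function names `bradford_curve` and `loglog_fit` are hypothetical, computes those coordinates and adds a check for the behaviour reported for the filtered data. If log R is also a straight line in log n, then log R = a + b log n, so R(n) = e^a n^b: R grows exponentially in log n, the quantity on the horizontal axis, instead of showing the linear section typical of Bradford’s law. Natural logarithms are assumed; another base only rescales the axes.

```python
import math
from typing import List, Sequence, Tuple


def bradford_curve(sitations: List[int]) -> List[Tuple[float, int]]:
    """Points (log n, R(n)) of a Bradford plot, where R(n) is the number
    of sitations accumulated by the n most-sitated targets."""
    ranked = sorted(sitations, reverse=True)
    points = []
    accumulated = 0
    for n, count in enumerate(ranked, start=1):
        accumulated += count
        points.append((math.log(n), accumulated))
    return points


def loglog_fit(points: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Fit log R = a + b * log n by least squares over the points with R > 0
    and return (b, r_squared). An r_squared near 1 means R(n) ~ c * n**b,
    i.e. exponential growth in log n rather than a Bradford-type line."""
    data = [(x, math.log(r)) for x, r in points if r > 0]
    if len(data) < 2:
        raise ValueError("need at least two points with R > 0")
    m = len(data)
    mean_x = sum(x for x, _ in data) / m
    mean_y = sum(y for _, y in data) / m
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    syy = sum((y - mean_y) ** 2 for _, y in data)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    if sxx == 0:
        raise ValueError("all points share the same log n")
    slope = sxy / sxx
    r_squared = 1.0 if syy == 0 else sxy * sxy / (sxx * syy)
    return slope, r_squared


if __name__ == "__main__":
    print([r for _, r in bradford_curve([1, 4, 2, 1])])  # [4, 6, 7, 8]
    # Synthetic curve R(n) = 3 * n**0.5, a straight line on log-log axes.
    slope, r2 = loglog_fit([(math.log(n), 3 * n ** 0.5) for n in range(1, 6)])
    print(round(slope, 6), round(r2, 6))  # 0.5 1.0
```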

In sum, everything seems to support the observation of Kim (2000) that there are different motivations behind the citations in scientific articles and the links of the World Wide Web. In this sense, we confirmed the immense influence of thematic restriction (for instance, nearly all the Web spaces included links related to the World Wide Web itself and its technology) and of the repetitive links found in most of the very large Web spaces (which are usually generated by means of software tools that allow links to be automatically included on every page).
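How much repetitive, template-generated links weigh depends on the counting unit. The sketch below counts sitations per target in three ways: as raw links, as distinct sitor pages and as distinct sitor Web spaces. It assumes the links are available as (source URL, target URL) pairs and, as a hypothetical simplification, identifies a Web space with the host name of a URL; the study’s own definition of a Web space may differ. The function names and the `example` hosts are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple
from urllib.parse import urlsplit


def web_space(url: str) -> str:
    """Hypothetical simplification: a Web space is the URL's host name
    (urlsplit already returns it in lower case)."""
    return urlsplit(url).hostname or ""


def sitation_counts(links: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, int]]:
    """For each target URL, count sitations in three ways: every link,
    distinct sitor pages, and distinct sitor Web spaces."""
    n_links: Dict[str, int] = defaultdict(int)
    sitor_pages: Dict[str, Set[str]] = defaultdict(set)
    sitor_spaces: Dict[str, Set[str]] = defaultdict(set)
    for source, target in links:
        n_links[target] += 1
        sitor_pages[target].add(source)
        sitor_spaces[target].add(web_space(source))
    return {
        target: {
            "links": n_links[target],
            "sitor_pages": len(sitor_pages[target]),
            "sitor_spaces": len(sitor_spaces[target]),
        }
        for target in n_links
    }


if __name__ == "__main__":
    # A template link repeated on all 50 pages of one large site, plus one
    # page of a small site that links to the same target twice.
    links = [(f"http://big.example/p{i}.html", "http://tools.example/") for i in range(50)]
    links += [("http://small.example/a.html", "http://tools.example/")] * 2
    print(sitation_counts(links)["http://tools.example/"])
    # {'links': 52, 'sitor_pages': 51, 'sitor_spaces': 2}
```

Counting by sitor Web space collapses the template link of the large site to a single sitation, whereas counting links or sitor pages lets it dominate the distribution.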

References

Aguillo, I.F. (2000), “Indicadores hacia una evaluación no objetiva (cuantitativa) de sedes web”, in VII Jornadas Españolas de Documentación (Bilbao, 19-21 de octubre de 2000), Servicio Editorial de la Universidad del País Vasco, Bilbao, pp. 233-48.

Almind, T.C. and Ingwersen, P. (1997), “Informetric analyses on the World Wide Web: methodological approaches to ‘Webometrics’”, Journal of Documentation, Vol. 53 No. 4, pp. 404-26.

Bar-Ilan, J. (1997), “The ‘mad cow disease’, Usenet newsgroups and bibliometric laws”, Scientometrics, Vol. 39 No. 1, pp. 29-55.

Bar-Ilan, J. (2001), “Data collection methods on the Web for informetrics purposes: a review and analysis”, Scientometrics, Vol. 50 No. 1, pp. 7-32.

Barabási, A.L. and Albert, R. (1999), “Emergence of scaling in random networks”, Science, Vol. 286, pp. 509-12.

Björneborn, L. and Ingwersen, P. (2001), “Perspectives of webometrics”, Scientometrics, Vol. 50 No. 1, pp. 65-82.

Bradford, S.C. (1934), “Sources of information of specific subjects”, Engineering, Vol. 137, pp. 85-6.

Brin, S. and Page, L. (1998), “The anatomy of a large-scale hypertextual web search engine”, Computer Networks and ISDN Systems, Vol. 30, pp. 107-17.

Broder, A. et al. (2000), “Graph structure in the Web”, Computer Networks and ISDN Systems, Vol. 33 No. 1-6, pp. 309-20.

Brookes, B.C. (1969), “Bradford’s law and the bibliography of science”, Nature, Vol. 224, pp. 953-6.

Codina, L. (2000), “Parámetros e indicadores de calidad para la evaluación de recursos digitales”, in VII Jornadas Españolas de Documentación (Bilbao, 19-21 de octubre de 2000), Servicio Editorial de la Universidad del País Vasco, Bilbao, pp. 135-44.


Correa-Uribe, G. (1999), Colombia Conectada al Mundo: Sitios Web Colombianos, Universidad de Antioquia, Antioquia.

Cronin, B. (2001), “Bibliometrics and beyond: some thoughts on web-based citation analysis”, Journal of Information Science, Vol. 27 No. 1, pp. 1-7.

Cui, L. (1999), “Rating health web sites using the principles of citation analysis: a bibliometric approach”, Journal of Medical Internet Research, Vol. 1 No. e4, available at: www.jmir.org/1999/1/e4/index.htm (accessed 18 August 2001).

Cybermetrics: International Journal of Scientometrics, Informetrics and Bibliometrics (2002), available at: www.cindoc.csic.es/cybermetrics (accessed 11 August 2002).

Egghe, L. (2000), “New informetric aspects of the Internet: some reflections – many problems”, Journal of Information Science, Vol. 26 No. 5, pp. 329-35.

Ferreiro-Aláez, L. (1981), “Análisis de referencias y características bibliométricas de los conjuntos de revistas nucleares”, Revista Española de Documentación Científica, Vol. 4 No. 3, pp. 181-98.

Gorbea-Portal, S. (1996), El Modelo Matemático de Bradford: Su Aplicación a las Revistas Latinoamericanas de las Ciencias Bibliotecológicas y de la Información, UNAM, Centro Universitario de Investigaciones Bibliotecológicas, México.

Gupta, D.K. (1991), “Application of Bradford’s law to citation data of Ethiopian medical journals”, Annals of Library Science and Documentation, Vol. 38 No. 3, pp. 85-98.

Harter, S.P. and Ford, C.E. (2000), “Web-based analyses of e-journal impact: approaches, problems and issues”, Library Science With a Slant to Documentation and Information Studies, Vol. 51 No. 13, pp. 1159-76.

Houston, W. (1983), “The application of bibliometrics to veterinary science primary literature”, Quarterly Bulletin of International Association of Agricultural Information Specialists, Vol. 28 No. 1, pp. 6-13.

Internet Software Consortium (2002), “Internet domain survey”, available at: www.isc.org (accessed 4 November 2002).

Jiménez-Piano, M. (2001), “Evaluación de sedes web”, Revista Española de Documentación Científica, Vol. 24 No. 4, pp. 405-32.

Kim, H.J. (2000), “Motivations for hyperlinking in scholarly electronic articles: a qualitative study”, Journal of the American Society for Information Science, Vol. 51 No. 10, pp. 887-99.

Kumar, R. et al. (2001), “Trawling the Web for emerging cyber-communities”, in Mendelzon, A. et al. (Eds), Proceedings of the 8th International World Wide Web Conference (Toronto, Canada, May 11-14, 1999), available at: www8.org/w8-papers/4a-search-mining/trawling/trawling.html (accessed 2 October 2002), Elsevier, Amsterdam.

Lal, A. and Panda, K.C. (1999), “Bradford’s law and its application to bibliographical data of plant pathology dissertations: an analytical approach”, Library Science With a Slant to Documentation and Information Studies, Vol. 36 No. 3, pp. 193-206.

Larson, R.R. (1996), “Bibliometrics of the World Wide Web: an exploratory analysis of the intellectual structure of cyberspace”, in Hardin, S. (Ed.), Proceedings of the 59th Annual Meeting of the American Society for Information Science (Baltimore, Maryland, 1996), Information Today, Medford, NJ, pp. 71-8, available at: http://sherlock.berkeley.edu/asis96/asis96.html (accessed 14 October 2000).

McKiernan, G. (1996), “CitedSites(sm): citation indexing of Web resources”, available at: www.public.iastate.edu/~CYBERSTACKS/Cited.htm (accessed 24 February 2000).

Mubeen, M.A. (1996), “Citation analysis of doctoral dissertations in chemistry”, Annals of Library Science and Documentation, Vol. 43 No. 2, pp. 48-58.


Price, D.J. (1970), “Citation measures of hard science, soft science, technology and non-science”, in Nelson, C.C. and Pollock, D.E. (Eds), Communication Among Scientists and Engineers, D.C. Heath and Co., Lexington, MA, pp. 3-22.

Reyes-Barragán, M.J. et al. (2000), “Revistas científicas: determinación de necesidades y usos”, Revista Española de Documentación Científica, Vol. 23 No. 4, pp. 417-36.

Rodríguez i Gairín, J.M. (1997), “Valoración del impacto de la información en Internet: Altavista, el ‘Citation Index’ de la red”, Revista Española de Documentación Científica, Vol. 20 No. 2, pp. 175-81.

Rousseau, R. (1997), “Sitations: an exploratory study”, Cybermetrics: International Journal of Scientometrics, Informetrics and Bibliometrics, Vol. 1, p. 1, available at: www.cindoc.csic.es/cybermetrics/articles/v1i1p1.html (accessed 5 September 2000).

Smith, A.G. (1996), “Criteria for evaluation of Internet information resources”, available at: www.vuw.ac.nz/~agsmith/evaln/index.htm (accessed 25 March 2000).

Smith, A.G. (1997), “Testing the surf: criteria for evaluating Internet information resources”, The Public-Access Computer Systems Review, Vol. 8 No. 3, available at: http://info.lib.uh.edu/pr/v8/n3/smit8n3.html (accessed 28 February 2002).

Smith, A.G. (1999), “The impact of web sites: a comparison between Australasia and Latin America”, available at: www.vuw.ac.nz/~agsmith/publns/austlat (accessed 12 May 2001).

Smith, A.G. (2001), “Applying evaluation criteria to New Zealand government websites”, International Journal of Information Management, Vol. 21 No. 2, pp. 137-49.

Tague-Sutcliffe, J. (1992), “An introduction to informetrics”, Information Processing and Management, Vol. 28 No. 1, pp. 1-3.

Tillman, H.N. (2000), “Evaluating quality on the Net”, available at: www.tiac.net/users/hope/findqual.html (accessed 5 June 2000).

van Raan, A.F.J. (2001), “Bibliometrics and Internet: some observations and expectations”, Scientometrics, Vol. 50 No. 1, pp. 59-63.

Vreeland, R.C. (2000), “Law libraries in hyperspace: a citation analysis of World Wide Web sites”, Law Library Journal, Vol. 92 No. 1, pp. 9-25.

Zhang, P. and von Dran, G.M. (2000), “Satisfiers and dissatisfiers: a two-factor model for website design and evaluation”, Journal of the American Society for Information Science, Vol. 51 No. 14, pp. 1253-68.


JimeAcircnez-Piano M (2001) ordfEvaluacioAcircn de sedes webordm Revista EspanAuml ola de DocumentacioAcircnCientotildeAcircregca Vol 24 No 4 pp 405-32

Kim HJ (2000) ordfMotivations for hyperlinking in scholarly electronic articles a qualitativestudyordm Journal of the American Society for Information Science Vol 51 No 10 pp 887-99

Kumar R et al (2001) ordfTrawling the Web for emerging cyber-communitiesordm in Mendelzon A etal (Eds) Proceedings of the 8th International World Wide Web Conference (TorontoCanadaAcirc May 11-14 1999) available at www8orgw8-papers4a-search-miningtrawlingtrawlinghtml (accessed 2 October 2002) Elsevier Amsterdam

Lal A and Panda KC (1999) ordfBradfordrsquos law and its application to bibliographical data ofplant pathology dissertations an analytical approachordm Library Science With a Slant toDocumentation and Information Studies Vol 36 No 3 pp 193-206

Larson RR (1996) ordfBibliometrics of the World Wide Web and exploratory analysis of theintellectual structure of cyberspaceordm in Hardin S (Ed) Proceedings of the 59th AnnualMeeting of the American Society for Information Science (Baltimore Maryland 1996)Information Today Medford NJ pp 71-8 available at httpsherlockberkeleyeduasis96asis96html (accessed 14 October 2000)

McKiernan G (1996) ordfCitedSites(sm) citation indexing of Web resourcesordm available atwwwpubliciastateedu CYBERSTACKSCitedhtm (accessed 24 February 2000)

Mubeen MA (1996) ordfCitation analysis of doctoral dissertations in chemistryordm Annals of LibraryScience and Documentation Vol 43 No 2 pp 48-58

ordfSitationordmdistributions

579

Price DJ (1970) ordfCitation measures of hard science soft science technology and non-scienceordm inNelson CC and Pollock DE (Eds) Communication Among Scientists and Engineers DCHealth and Co Lexington MA pp 3-22

Reyes-BarragaAcircn MJ et al (2000) ordfRevistas cientotildeAcircregcas determinacioAcircn de necesidades y usosordmRevista EspanAuml ola de DocumentacioAcircn CientotildeAcircregca Vol 23 No 4 pp 417-36

RodrotildeAcircguez i GairotildeAcircn JM (1997) ordfValoracioAcircn del impacto de la informacioAcircn en Internet Altavistael `Citation Indexrsquo de la redordm Revista EspanAuml ola de DocumentacioAcircn CientotildeAcircregca Vol 20 No 2pp 175-81

Rousseau R (1997) ordfCitations an exploratory studyordm Cybermetrics International Journal ofScientometrics Informetrics and Bibliometrics Vol 1 p 1 available atwwwcindoccsicescybermetricsarticlesv1i1p1html (accessed 5 September 2000)

Smith AG (1996) ordfCriteria for evaluation of Internet information resourcesordm available atwwwvuwacnz agsmithevalnindexhtm (accessed 25 March 2000)

Smith AG (1997) ordfTesting the surf criteria for evaluating Internet information resourcesordm ThePublic-Access Computer Systems Review Vol 8 No 3 available at httpinfolibuheduprv8n3smit8n3html (accessed 28 February 2002)

Smith AG (1999) ordfThe impact of web sites a comparison betwen Australasia and LatinAmericaordm available at wwwvuwacnz agsmithpublnsaustlat (accessed 12 May2001)

Smith AG (2001) ordfApplying evaluation criteria to New Zealand government websitesordmInternational Journal of Information Management Vol 21 No 2 pp 137-49

Tague-Sutcliffe J (1992) ordfAn introduction to informetricsordm Information Processing andManagement Vol 28 No 1 pp 1-3

Tillman HN (2000) ordfEvaluating quality on the Netordm available atwwwtiacnetusershoperegndqualhtml (accessed 5 June 2000)

van Raan AFJ (2001) ordfBibliometrics and Internet some observations and expectationsordmScientometrics Vol 50 No 1 pp 59-63

Vreeland RC (2000) ordfLaw libraries in hyperspacea citation analysis of World Wide Web sitesordmLaw Library Journal Vol 92 No 1 pp 9-25

Zhang P and von Dran GM (2000) ordfSatisregers and dissatisregers a two-factor model for websitedesign and evaluationordm Journal of the American Society for Information Science Vol 51No 14 pp 1253-68

JD595

580

none of them fitted the typical Bradford case. When we considered Web spaces as the sitors, the resulting distribution was concave from above, with no point of inflection, and passed through the origin. The other distributions, in which pages were considered as the sitors, could in general be characterized as curves that start noticeably above the origin and have several points of inflection ( ). It was also notable, firstly, that taking Web spaces as destinations instead of pages greatly reduced the accumulated sitations and much of the complexity of the curve, and secondly, that eliminating the targets that were very frequently linked to in very large Web spaces sharply reduced the accumulated sitations; the curves then passed through the origin and showed an exponential dependence on the accumulated sitations.
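The Bradford plots discussed above can be computed directly from raw sitation counts. The sketch below is a minimal illustration, not the authors' procedure. It assumes the graphical form of Bradford's law described by Brookes (1969), in which the accumulated sitations R(n) are plotted against ln n. Here n is the accumulated number of targets, ranked by decreasing sitations and grouped into clusters of equal count. The function names `bradford_points` and `inflection_count` and the sample counts are hypothetical. Counting sign changes in the discrete change of slope is only a crude proxy for points of inflection.

```python
import math
from collections import Counter


def bradford_points(sitations_per_target):
    """Return the (ln n, R(n)) points of a Bradford plot.

    Targets are ranked by decreasing number of sitations and grouped into
    clusters of targets with equal counts. At the end of each cluster, n is
    the accumulated number of targets and R(n) the accumulated sitations.
    """
    clusters = Counter(sitations_per_target)  # sitation count -> number of targets
    points = []
    n = 0  # accumulated targets
    r = 0  # accumulated sitations
    for count in sorted(clusters, reverse=True):
        n += clusters[count]
        r += count * clusters[count]
        points.append((math.log(n), r))
    return points


def inflection_count(points):
    """Count sign changes in the change of slope between consecutive points.

    A crude, discrete proxy for points of inflection. It assumes strictly
    increasing x values, which the clustering above guarantees, and it
    ignores places where the slope does not change.
    """
    slopes = [
        (r2 - r1) / (x2 - x1)
        for (x1, r1), (x2, r2) in zip(points, points[1:])
    ]
    changes = [s2 - s1 for s1, s2 in zip(slopes, slopes[1:])]
    signs = [c > 0 for c in changes if c != 0]
    return sum(a != b for a, b in zip(signs, signs[1:]))


if __name__ == "__main__":
    # Hypothetical sitation counts for 17 targets (not data from the study).
    counts = [40, 25, 12, 12, 6, 6, 6, 3, 3, 3, 3, 1, 1, 1, 1, 1, 1]
    pts = bradford_points(counts)
    print([r for _, r in pts])    # [40, 65, 89, 107, 119, 125]
    print(inflection_count(pts))  # 0: the slope decreases throughout
```

In this toy example the slope falls at every step, so no inflection is detected. A curve that starts well above the origin and bends several times, like the page-level distributions described above, would instead return a count greater than zero.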

In sum, everything seems to support the observation of Kim (2000) that different motivations lie behind the citations in scientific articles and the links of the World Wide Web. In this sense, we confirmed the immense influence on the distributions of thematic restriction (for instance, nearly all the Web spaces included links related to the World Wide Web itself and its technology) and of the repetitive links found in most of the very large Web spaces (which are usually generated by software tools that automatically include the same links on every page).
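The effect of repetitive links can be illustrated by counting sitations at different levels of aggregation. The following sketch rests on several illustrative assumptions that are not the definitions used in the study. It identifies a Web space by the host name of a URL, ignores links internal to one Web space, and counts each distinct (sitor, target) pair once. The function names `web_space` and `sitation_counts` and the example URLs are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def web_space(url):
    """Assumed mapping of a URL to its Web space: here, simply the host name."""
    return urlsplit(url).hostname


def sitation_counts(links, sitors_as_spaces=False, targets_as_spaces=False):
    """Count the sitations received by each target.

    `links` is an iterable of (source_url, target_url) pairs. When
    `sitors_as_spaces` is True, all pages of a Web space act as one sitor,
    so a link repeated on every page is counted only once. When
    `targets_as_spaces` is True, links to any page of a Web space are
    credited to that space.
    """
    pairs = set()  # distinct (sitor, target) pairs
    for source, target in links:
        if web_space(source) == web_space(target):
            continue  # assumption: internal links are not sitations
        sitor = web_space(source) if sitors_as_spaces else source
        dest = web_space(target) if targets_as_spaces else target
        pairs.add((sitor, dest))
    return Counter(dest for _, dest in pairs)


if __name__ == "__main__":
    # Hypothetical links: site-a repeats the same link on three of its pages.
    links = [
        ("http://site-a.example/1.html", "http://tech.example/"),
        ("http://site-a.example/2.html", "http://tech.example/"),
        ("http://site-a.example/3.html", "http://tech.example/"),
        ("http://site-b.example/index.html", "http://tech.example/"),
        ("http://site-b.example/index.html", "http://site-c.example/x.html"),
        ("http://site-a.example/1.html", "http://site-a.example/2.html"),
    ]
    print(sitation_counts(links))                         # pages as sitors
    print(sitation_counts(links, sitors_as_spaces=True))  # Web spaces as sitors
    print(sitation_counts(links, True, True))             # Web spaces at both ends
```

With pages as sitors, the technology site receives four sitations, three of them from one site's repeated link. Treating Web spaces as sitors reduces this to two, which mirrors the drop in accumulated sitations reported above.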

References

Aguillo, I.F. (2000), “Indicadores hacia una evaluación no objetiva (cuantitativa) de sedes web”, in VII Jornadas Españolas de Documentación (Bilbao, 19-21 de octubre de 2000), Servicio Editorial de la Universidad del País Vasco, Bilbao, pp. 233-48.

Almind, T.C. and Ingwersen, P. (1997), “Informetric analyses on the World Wide Web: methodological approaches to ‘Webometrics’”, Journal of Documentation, Vol. 53 No. 4, pp. 404-26.

Bar-Ilan, J. (1997), “The ‘mad cow disease’, Usenet newsgroups and bibliometric laws”, Scientometrics, Vol. 39 No. 1, pp. 29-55.

Bar-Ilan, J. (2001), “Data collection methods on the Web for informetrics purposes: a review and analysis”, Scientometrics, Vol. 50 No. 1, pp. 7-32.

Barabási, A.L. and Albert, R. (1999), “Emergence of scaling in random networks”, Science, Vol. 286, pp. 509-12.

Björneborn, L. and Ingwersen, P. (2001), “Perspectives of webometrics”, Scientometrics, Vol. 50 No. 1, pp. 65-82.

Bradford, S.C. (1934), “Sources of information on specific subjects”, Engineering, Vol. 137, pp. 85-6.

Brin, S. and Page, L. (1998), “The anatomy of a large-scale hypertextual web search engine”, Computer Networks and ISDN Systems, Vol. 30, pp. 107-17.

Broder, A. et al. (2000), “Graph structure in the Web”, Computer Networks and ISDN Systems, Vol. 33 No. 1-6, pp. 309-20.

Brookes, B.C. (1969), “Bradford’s law and the bibliography of science”, Nature, Vol. 224, pp. 953-6.

Codina, L. (2000), “Parámetros e indicadores de calidad para la evaluación de recursos digitales”, in VII Jornadas Españolas de Documentación (Bilbao, 19-21 de octubre de 2000), Servicio Editorial de la Universidad del País Vasco, Bilbao, pp. 135-44.

JD 59,5

578

Correa-Uribe, G. (1999), Colombia Conectada al Mundo: Sitios Web Colombianos, Universidad de Antioquía, Antioquía.

Cronin, B. (2001), “Bibliometrics and beyond: some thoughts on web-based citation analysis”, Journal of Information Science, Vol. 27 No. 1, pp. 1-7.

Cui, L. (1999), “Rating health web sites using the principles of citation analysis: a bibliometric approach”, Journal of Medical Internet Research, Vol. 1 No. e4, available at: www.jmir.org/1999/1/e4/index.htm (accessed 18 August 2001).

Cybermetrics: International Journal of Scientometrics, Informetrics and Bibliometrics (2002), available at: www.cindoc.csic.es/cybermetrics (accessed 11 August 2002).

Egghe, L. (2000), “New informetric aspects of the Internet: some reflections – many problems”, Journal of Information Science, Vol. 26 No. 5, pp. 329-35.

Ferreiro-Aláez, L. (1981), “Análisis de referencias y características bibliométricas de los conjuntos de revistas nucleares”, Revista Española de Documentación Científica, Vol. 4 No. 3, pp. 181-98.

Gorbea-Portal, S. (1996), El Modelo Matemático de Bradford: Su Aplicación a las Revistas Latinoamericanas de las Ciencias Bibliotecológicas y de la Información, UNAM, Centro Universitario de Investigaciones Bibliotecológicas, México.

Gupta, D.K. (1991), “Application of Bradford’s law to citation data of Ethiopian medical journals”, Annals of Library Science and Documentation, Vol. 38 No. 3, pp. 85-98.

Harter, S.P. and Ford, C.E. (2000), “Web-based analyses of e-journal impact: approaches, problems and issues”, Journal of the American Society for Information Science, Vol. 51 No. 13, pp. 1159-76.

Houston, W. (1983), “The application of bibliometrics to veterinary science primary literature”, Quarterly Bulletin of International Association of Agricultural Information Specialists, Vol. 28 No. 1, pp. 6-13.

Internet Software Consortium (2002), “Internet domain survey”, available at: www.isc.org (accessed 4 November 2002).

Jiménez-Piano, M. (2001), “Evaluación de sedes web”, Revista Española de Documentación Científica, Vol. 24 No. 4, pp. 405-32.

Kim, H.J. (2000), “Motivations for hyperlinking in scholarly electronic articles: a qualitative study”, Journal of the American Society for Information Science, Vol. 51 No. 10, pp. 887-99.

Kumar, R. et al. (2001), “Trawling the Web for emerging cyber-communities”, in Mendelzon, A. et al. (Eds), Proceedings of the 8th International World Wide Web Conference (Toronto, Canada, May 11-14, 1999), Elsevier, Amsterdam, available at: www8.org/w8-papers/4a-search-mining/trawling/trawling.html (accessed 2 October 2002).

Lal, A. and Panda, K.C. (1999), “Bradford’s law and its application to bibliographical data of plant pathology dissertations: an analytical approach”, Library Science With a Slant to Documentation and Information Studies, Vol. 36 No. 3, pp. 193-206.

Larson, R.R. (1996), “Bibliometrics of the World Wide Web: an exploratory analysis of the intellectual structure of cyberspace”, in Hardin, S. (Ed.), Proceedings of the 59th Annual Meeting of the American Society for Information Science (Baltimore, Maryland, 1996), Information Today, Medford, NJ, pp. 71-8, available at: http://sherlock.berkeley.edu/asis96/asis96.html (accessed 14 October 2000).

McKiernan, G. (1996), “CitedSites(sm): citation indexing of Web resources”, available at: www.public.iastate.edu/~CYBERSTACKS/Cited.htm (accessed 24 February 2000).

Mubeen, M.A. (1996), “Citation analysis of doctoral dissertations in chemistry”, Annals of Library Science and Documentation, Vol. 43 No. 2, pp. 48-58.

“Sitation” distributions

579

Price, D.J. (1970), “Citation measures of hard science, soft science, technology and non-science”, in Nelson, C.C. and Pollock, D.E. (Eds), Communication Among Scientists and Engineers, D.C. Heath and Co., Lexington, MA, pp. 3-22.

Reyes-Barragán, M.J. et al. (2000), “Revistas científicas: determinación de necesidades y usos”, Revista Española de Documentación Científica, Vol. 23 No. 4, pp. 417-36.

Rodríguez i Gairín, J.M. (1997), “Valoración del impacto de la información en Internet: Altavista, el ‘Citation Index’ de la red”, Revista Española de Documentación Científica, Vol. 20 No. 2, pp. 175-81.

Rousseau, R. (1997), “Sitations: an exploratory study”, Cybermetrics: International Journal of Scientometrics, Informetrics and Bibliometrics, Vol. 1, p. 1, available at: www.cindoc.csic.es/cybermetrics/articles/v1i1p1.html (accessed 5 September 2000).

Smith, A.G. (1996), “Criteria for evaluation of Internet information resources”, available at: www.vuw.ac.nz/~agsmith/evaln/index.htm (accessed 25 March 2000).

Smith, A.G. (1997), “Testing the surf: criteria for evaluating Internet information resources”, The Public-Access Computer Systems Review, Vol. 8 No. 3, available at: http://info.lib.uh.edu/pr/v8/n3/smit8n3.html (accessed 28 February 2002).

Smith, A.G. (1999), “The impact of web sites: a comparison between Australasia and Latin America”, available at: www.vuw.ac.nz/~agsmith/publns/austlat (accessed 12 May 2001).

Smith, A.G. (2001), “Applying evaluation criteria to New Zealand government websites”, International Journal of Information Management, Vol. 21 No. 2, pp. 137-49.

Tague-Sutcliffe, J. (1992), “An introduction to informetrics”, Information Processing and Management, Vol. 28 No. 1, pp. 1-3.

Tillman, H.N. (2000), “Evaluating quality on the Net”, available at: www.tiac.net/users/hope/findqual.html (accessed 5 June 2000).

van Raan, A.F.J. (2001), “Bibliometrics and Internet: some observations and expectations”, Scientometrics, Vol. 50 No. 1, pp. 59-63.

Vreeland, R.C. (2000), “Law libraries in hyperspace: a citation analysis of World Wide Web sites”, Law Library Journal, Vol. 92 No. 1, pp. 9-25.

Zhang, P. and von Dran, G.M. (2000), “Satisfiers and dissatisfiers: a two-factor model for website design and evaluation”, Journal of the American Society for Information Science, Vol. 51 No. 14, pp. 1253-68.
