

A Scientometric Analysis of Cloud Computing Literature

Leonard Heilig and Stefan Voß

Abstract—The popularity and rapid development of cloud computing in recent years have led to a large number of publications containing the knowledge gained in this area of research. Due to the interdisciplinary nature and high relevance of cloud computing research, it is becoming increasingly difficult, or even impossible, to understand the overall structure and development of this field without analytical approaches. While evaluating science has a long tradition in many fields, we identify a lack of a comprehensive scientometric study in the area of cloud computing. Based on a large bibliographic database, this study applies scientometric means to empirically study the evolution and state of cloud computing research with a view from above the clouds. In doing so, we provide extensive insights into publication patterns, research impact and research productivity. Furthermore, we explore the interplay of related subtopics by analyzing keyword clusters. The results of this study provide a better understanding of patterns, trends and other important factors as a basis for directing research activities, sharing knowledge and collaborating in the area of cloud computing research.

Index Terms—Cloud computing, cloud computing research, scientometric analysis, scientometrics, keyword cluster analysis


1 INTRODUCTION

ALTHOUGH cloud computing is a relatively young field of research, the great interest in academia and practice has led to a considerable number of publications in recent years. The interdisciplinary nature as well as the technical and non-technical potentials and challenges of cloud computing (e.g., discussed in [1], [2], [3], [4]) are some of the main reasons for this rapid development. Given the significantly increasing number of publications, it is becoming more and more important to investigate the current state and evolution of cloud computing research.

Quantitative studies measuring and analyzing science activities form a type of research commonly known as scientometrics [5]. By providing a view of a research field from a meta-perspective [6], [7], scientometric studies facilitate the development and improvement of an academic discipline [5], [8], serving as a vital basis for defining and debating future research agendas [9]. Assuming that scientific activities are reflected in scientific publications, scientometric studies apply empirical measures to analyze the scientific output of a specific field in order to better understand the dynamics and structure of its development. Thereby, it is possible to explore the body of publications extensively, for example, to observe citation patterns, the number and types of citations, the number and structure of authors, and so forth. Going further, a scientometric study gives some indication of research activities in general, such as with respect to knowledge sharing, research quality, socio-organizational structures, influential countries/affiliations/authors, the development of key topics, structural change, and the economic impact of research. For further reading see, e.g., [6], [8], [10], [11], [12], [13]. Moreover, scientometrics, as an evaluation tool of science, increasingly impacts the resource distribution of research institutions [12].

Given these facts, it is surprising that little work has been devoted to the scientometric analysis of cloud computing research, and even less to a comprehensive scientometric study of the field. At present, we are aware of three scientometric studies in the context of cloud computing. The authors of [14] investigate 510 publications related to cloud computing that are obtained from the Web of Science (WoS) database for the years 2001-2010. They look at the productivity of authors and contributing countries by analyzing the number of publications as aggregated by WoS. In [15], 89 journal papers related to cloud computing research in China are investigated for the period between 1993 and 2010. Based on the data of the Chinese Journal Full-text Database (CNKI), the authors examine the distribution of the number of journal papers, authors, subjects and funded papers. In [16], scientometric methods are applied to analyze the progress of cloud security research in China from 2008 to 2011. The authors investigate 103 journal articles from 76 journals provided by CNKI. They analyze the types of contributing affiliations and identify key topics, focusing exclusively on cloud security. In general, these studies lack important insights, such as an overview of current research topics and trends, citation patterns and top publications. Due to the relatively small number of publications analyzed and the specific area under consideration, the implications of those studies are limited. The existing studies mainly apply straight count measures to analyze the respective literature. Without specific scientometric techniques, for instance different methods to evaluate the productivity of authors or algorithms for a keyword cluster analysis, it is difficult to generate novel insights. Consequently, the main objective of this study is to provide a more comprehensive view of the overall cloud computing research area within a relevant time frame in order to present empirical and relevant findings. In this paper, we extend the observation period and apply a variety of methods, including quantitative and computational algorithms, to analyze key aspects of cloud computing research.

The authors are with the Institute of Information Systems, University of Hamburg, Von-Melle-Park 5, 20146 Hamburg, Germany. E-mail: {leonard.heilig, stefan.voss}@uni-hamburg.de.

Manuscript received 5 Mar. 2014; revised 19 Apr. 2014; accepted 20 Apr. 2014. Date of publication 29 Apr. 2014; date of current version 29 Oct. 2014. Recommended for acceptance by I. Bojanova. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference the Digital Object Identifier below. Digital Object Identifier no. 10.1109/TCC.2014.2321168

266 IEEE TRANSACTIONS ON CLOUD COMPUTING, VOL. 2, NO. 3, JULY-SEPTEMBER 2014

2168-7161 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Considering the number of publications in the area of cloud computing, we observe significant interest especially from 2008 onwards. Since then, the number of articles has increased dramatically year after year. In Google Scholar, for instance, the number of search results has nearly doubled each year since 2008. The significant increase in scientific literature can also be recognized in the Scopus and WoS databases. Prior to 2008, Scopus indexed only three publications related to cloud computing, and WoS only one. At the time of this study (as of January 12, 2014), Scopus covers 15,376 relevant publications while WoS covers 8,262 publications. To the best of our knowledge, cloud computing articles published from 2011 to 2013 have not yet been explored by means of scientometrics.

In this paper, we present a comprehensive scientometric study that empirically explores publications related to cloud computing covered by Elsevier's Scopus database from 2008 to 2013. In total, we investigate 15,376 publications. To the best of our knowledge, this is the first scientometric study that assesses such a large number of peer-reviewed publications. Thus, the results of this study stand on a broad empirical basis and may encourage scholars to conduct more comprehensive scientometric studies in other academic disciplines. We provide extensive insights into publishing patterns (e.g., contributing countries, distribution of outlets), analyze frequent keywords and keyword clusters to identify widely discussed topics and their relationships, provide insights into citation patterns of outlets, publications, affiliations and authors, and investigate the research productivity of affiliations and scientists in the area of cloud computing. Thereby, we present novel insights from a meta-perspective that may help to better understand the evolution, state and trends of cloud computing research. Due to limitations of space, this study does not intend to give an overview of cloud computing in general (for further comprehensive reading see, e.g., [17], [18], [19]).

The remainder of this paper is organized as follows. Section 2 briefly describes the methodology and methods applied in this scientometric study. In Section 3, publication patterns are investigated and further analyzed to understand the general composition of the field from different perspectives. The current focus of cloud computing research and dependencies between topics are examined by analyzing top keywords and keyword clusters in Section 4. Subsequently, the impact and productivity of cloud computing research are examined in Sections 5 and 6. Finally, a conclusion is presented in Section 7.

2 METHODOLOGY

The collection of relevant publications and citations establishes the foundation for a scientometric analysis of a specific research area [7]. As indicated, this study intends to cover a large part of the peer-reviewed cloud computing articles published in the last six years. By this, we aim to obtain empirical evidence supporting the metascientific findings of this scientometric study. In this section, we describe our procedure for data collection, data processing and proofreading.

2.1 Data Collection and Cleansing

As manual processing of bibliographic data can be extremely cost- and labor-intensive [9], we use Elsevier's Scopus to collect and process structured data of articles. The Scopus database has decisive advantages over other bibliographic databases such as Thomson Reuters WoS. First, the number of cloud computing articles covered by Scopus is almost twice as high as in WoS (see Table 1); this includes article-in-press publications of a large number of journals. Second, Scopus provides advanced functionality to export structured data, including citation and bibliographical information as well as abstracts, keywords, and references. A general limitation of bibliographic databases, however, is that recently published articles are often not yet indexed, which explains the decrease in the number of publications between 2012 and 2013.

In order to cover a large part of the publications in cloud computing, a generic search query ‘cloud computing’ is used in the fields title, abstract, and keywords. The asterisk acts as a wildcard character. The search query finds 16,042 articles for the observation period 2008-2013 (as of January 12, 2014). Data cleansing actions are carried out to detect and remove inaccurate data records (e.g., authors/title not specified, double entries, etc.). Finally, we retain 15,376 publications containing 273,477 references and 32,620 unique keywords. Of those publications, 94.57 percent have a non-empty bibliography, resulting in an average of 18.81 references per publication with a bibliography. On average, 97.38 percent of the publications are written in English.
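A minimal Python sketch of the cleansing rules and bibliography statistics described above. The record layout (`Record`), the title normalization used to detect double entries, and all function names are illustrative assumptions; neither the Scopus export format nor the authors' actual cleansing procedure is reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Minimal bibliographic record (hypothetical schema, not the Scopus export format)."""
    title: str
    authors: list[str]
    references: list[str] = field(default_factory=list)


def normalize_title(title: str) -> str:
    """Lowercase and replace punctuation by spaces so near-identical titles compare equal."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in title)
    return " ".join(cleaned.split())


def cleanse(records: list[Record]) -> list[Record]:
    """Drop records without title or authors and keep only the first of duplicate titles."""
    seen: set[str] = set()
    kept: list[Record] = []
    for rec in records:
        if not rec.title.strip() or not rec.authors:
            continue  # inaccurate record: title or authors not specified
        key = normalize_title(rec.title)
        if key in seen:
            continue  # double entry
        seen.add(key)
        kept.append(rec)
    return kept


def reference_stats(records: list[Record]) -> tuple[float, float]:
    """Return (share with a non-empty bibliography, mean references among those records)."""
    with_refs = [r for r in records if r.references]
    share = len(with_refs) / len(records)
    mean_refs = sum(len(r.references) for r in with_refs) / len(with_refs)
    return share, mean_refs
```

Applied to the reported totals, 273,477 / 14,541 ≈ 18.81, whereas 273,477 / 15,376 ≈ 17.79; the reported average therefore refers to the 14,541 publications (94.57 percent of 15,376) that have a non-empty bibliography.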

2.2 Data Processing

Based on this extensive bibliographic dataset, several metadata attributes (e.g., authors, keywords, document type) of the collected publications are used to analyze specific aspects. In the following, we briefly describe the methods applied to investigate research productivity, research impact, and keyword clusters.

2.2.1 Research Productivity 

The review of literature reveals four basic approaches to measure research productivity: straight count, author position, equal credit and normalized page size [9], [20]. The straight count method assigns a score of one to each of the co-authors of an article. This approach, however, undervalues the productivity of single-author papers and favors individual co-authors of multi-author papers. The author position approach, in contrast, assigns a score based on each author's position in the author list and favors first authors. To calculate the scores based on the position of authors, the formula proposed by [21] is used [9]. Although this method takes into account that the first author is often the main contributor of an article, it is error-prone for multi-author papers in which the names of co-authors are arranged in alphabetical order. The equal credit method aims to compensate for these errors; it assigns each co-author a score equal to the reciprocal of the number of authors, so that the score of each co-author is reduced by every additional co-author. The last approach, normalized page size, is not considered in this study for two reasons. First, it assigns a score to each of the authors based on the number of pages and authors per publication and thus favors quantity rather than quality. Second, outlets often limit the number of pages per publication, which would lead to an error-prone analysis of research productivity. In this study, we primarily focus on the equal credit method.

TABLE 1. Number of Publications Per Year

HEILIG AND VOß: A SCIENTOMETRIC ANALYSIS OF CLOUD COMPUTING LITERATURE 267
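The three scoring rules discussed in Section 2.2.1 can be sketched in Python as follows. Straight count and equal credit follow directly from the description above. The formula of [21] is not reproduced in this section, so `author_position` uses a geometric weighting (ratio 1.5 between adjacent positions) purely as a hypothetical stand-in; all function names are illustrative.

```python
from collections import defaultdict


def straight_count(n_authors: int) -> list[float]:
    """Every co-author receives a full score of one."""
    return [1.0] * n_authors


def equal_credit(n_authors: int) -> list[float]:
    """Every co-author receives the reciprocal of the number of authors."""
    return [1.0 / n_authors] * n_authors


def author_position(n_authors: int, ratio: float = 1.5) -> list[float]:
    """Hypothetical position weighting (not necessarily the formula of [21]):
    author i of n gets ratio**(n - i), normalized so that the scores sum to one."""
    weights = [ratio ** (n_authors - i) for i in range(1, n_authors + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def productivity(author_lists, method) -> dict[str, float]:
    """Sum the per-publication scores of each author over all publications."""
    scores: dict[str, float] = defaultdict(float)
    for authors in author_lists:
        for name, score in zip(authors, method(len(authors))):
            scores[name] += score
    return dict(scores)
```

Under equal credit, every publication contributes exactly one point in total, so a single-author paper and a five-author paper carry the same overall weight; straight count, by contrast, awards five points to the five-author paper.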

2.2.2 Research Impact 

In order to measure research impact, indices based on individual citations are applied. These include individual citations of journals, conferences, and authors as well as the normalized citation impact index (NCII), which accounts for the longevity of publications [9]. Regarding research productivity and research impact, we also discuss the Matthew effect, which describes the phenomenon that highly recognized scientists get most of the credit for contributions that are also presented by many other, relatively unknown scientists [22]. As the credit given by a scientist's peers in turn influences recognition, the effect leads to accumulated advantages for those authors. Author and/or publication visibility is furthermore influenced by positive network membership effects, such as those arising from influential outlets, research institutes or research collaborations [23].
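The section does not spell out how the NCII is computed. One common reading, assumed here, divides a publication's total citations by the number of years since it appeared; the Python sketch below implements that reading, with the reference year (2014, when the data were collected) and the one-year floor as our own assumptions.

```python
def ncii(citations: int, pub_year: int, ref_year: int = 2014) -> float:
    """Citations per year since publication (assumed definition of the NCII).

    ref_year defaults to 2014 because the data were collected in January 2014;
    at least one year is used so that very recent papers do not divide by zero.
    """
    years = max(1, ref_year - pub_year)
    return citations / years


# Two hypothetical papers: the older one has more citations but a lower NCII.
papers = {"older (2009)": ncii(120, 2009), "newer (2012)": ncii(60, 2012)}
```

The example illustrates why a longevity correction matters: the 2009 paper has twice the raw citations (120 vs. 60) but a lower normalized impact (24 vs. 30 citations per year).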

2.2.3 Keyword Analysis and Other Relevant Aspects 

To further explore key topics and aspects in cloud computing from a meta-perspective, additional methods are implemented on top of the collected data. These include algorithms for analyzing keyword clusters as well as for classifying and aggregating bibliographic data.

2.3 Proofreading

As indicated, one purpose of the scientometric analysis is to reduce the effort of analyzing a large number of peer-reviewed papers. Although the indexing of scientific publications is highly standardized, some inconsistencies can be detected, for example in the names of research affiliations. To ensure the correctness of results, generated outputs are validated by manual proofreading to identify such inconsistencies. The resulting semi-automatic process ensures the quality of both the data and the results of this study.
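The semi-automatic check can be illustrated with a small Python sketch that groups differently spelled affiliation names under a normalized key and hands each group to a human proofreader. The abbreviation table, the token-sorting heuristic and the function names are hypothetical, not the authors' procedure; the key deliberately over-merges so that the final decision stays manual.

```python
import re
from collections import defaultdict

# Hypothetical abbreviation and stop-word tables; in practice they would grow during proofreading.
ABBREVIATIONS = {"univ": "university", "inst": "institute", "dept": "department"}
STOP_WORDS = {"the", "of", "and"}


def affiliation_key(name: str) -> str:
    """Normalize an affiliation string into an order-independent comparison key."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    tokens = [ABBREVIATIONS.get(t, t) for t in tokens if t not in STOP_WORDS]
    return " ".join(sorted(tokens))


def variant_groups(names):
    """Return groups of distinct spellings that share a key, for manual review."""
    groups = defaultdict(set)
    for name in names:
        groups[affiliation_key(name)].add(name)
    return {key: sorted(spellings) for key, spellings in groups.items() if len(spellings) > 1}
```

For example, "University of Hamburg", "Univ. of Hamburg" and "Hamburg University" all map to the key "hamburg university" and would be presented together for confirmation.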

3 ANALYSIS OF PUBLISHING PATTERNS

We begin by analyzing the basic structure of cloud computing research from different perspectives. First of all, the distribution of research disciplines is investigated in order to identify the major disciplines involved in the development of the field. Subsequently, the distribution of contributing countries is analyzed for all publications and for well-recognized publications. At the publication level, we further analyze the number of authors, the distribution of document types and the number of publications per outlet.

3.1 Academic Disciplines

To obtain an understanding of the general structure and development of the cloud computing research area, we start with a scientometric analysis of academic disciplines (see Table 2). Scopus categorizes each publication based on its publication outlet, which may be attributed to more than one subject area [24].

The distribution of publications shows some interesting patterns. While the contribution in the subject area Computer Science is nearly constant, the number of contributions in other disciplines shows signs of the hype surrounding cloud computing in an early phase of development. In particular, for the business-related subject areas, a peak of popularity followed by a phase of disillusionment can be identified. According to Gartner's Hype Cycle for Emerging Technologies [25], cloud computing reached the peak of inflated expectations between 2008 and 2009, which is reflected by the number of contributions in certain disciplines. This suggests that research activities in cloud computing are partially affected by prevailing expectations.

The figures further indicate that the majority of contributions in the area of cloud computing are related to Computer Science. This shows that much of the research is primarily concerned with the technology itself, an observation also made in other publications, such as [4]. According to [4], it is important to give equal consideration to business-related issues associated with cloud computing. The numbers further reveal a slight trend towards the application of cloud technologies to support research and/or business-related activities, such as in engineering and mathematics, but also that general initial thoughts regarding potential applications are being underpinned by more theoretical considerations (e.g., regarding mathematical models in cloud pricing). We expect that the trend towards a more productive use of cloud computing in practice will continue in the future. Social and business-related disciplines, as well as the interaction between them, will become increasingly important in order to understand the implications of cloud computing from different perspectives.

TABLE 2. Subject Areas (Avg ≥ 1%)

3.2 Contributing Countries and Authorship

To gain deeper insight into contribution patterns, we further investigate the distribution of publications per country and authorship patterns. As shown in Table 3 (R.: Rank), the largest shares of research on cloud computing are contributed by researchers from China (22.50 percent) and the United States (19.16 percent).

As these numbers alone do not provide enough insight into the relevance of contributions per country, we also generate a ranking of contributing countries for publications that are cited by at least 50 other publications. The numbers in the right part of Table 3 show that authors from the United States (36.52 percent) and Australia (10.43 percent) have contributed the largest shares of widely recognized publications.
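A Python sketch of the two country rankings in Table 3, one over all publications and one over publications cited at least 50 times. The section does not state how multi-country publications are attributed, so the sketch assumes that each distinct country is counted once per publication and that shares are taken over all country attributions; the input format is hypothetical.

```python
from collections import Counter


def country_shares(publications, min_citations=0):
    """publications: iterable of (countries, citations) pairs.

    Each distinct country is counted once per publication (an assumed counting
    rule); shares are relative to all country attributions that pass the filter.
    """
    counts = Counter()
    for countries, citations in publications:
        if citations >= min_citations:
            counts.update(set(countries))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {country: n / total for country, n in counts.most_common()}
```

Calling `country_shares(pubs)` gives the left-hand ranking and `country_shares(pubs, min_citations=50)` the right-hand one; a country can rank very differently in the two because the citation filter removes most publications.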

Regarding the authorship of publications, the distribution of the number of authors per publication, averaged over the last six years, is depicted in Fig. 1. It shows that for more than half of the publications the number of co-authors is between two and four. Together with the relatively high percentage of publications with five or more authors, this distribution suggests that collaboration may have some advantages over research by individual researchers. The interdisciplinary nature of cloud computing research may be one of the main reasons for the dominance of joint works.
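How a distribution like the one in Fig. 1, and the share of publications with two to four authors, can be derived from per-publication author counts is sketched below in Python. Treating the number of names on the byline as the plotted quantity and pooling five or more authors into one bucket are assumptions; the function names are illustrative.

```python
from collections import Counter


def coauthorship_distribution(author_counts, top_bucket=5):
    """Share of publications per number of authors; counts >= top_bucket are pooled."""
    buckets = Counter(min(n, top_bucket) for n in author_counts)
    total = len(author_counts)
    return {k: buckets[k] / total for k in sorted(buckets)}


def share_in_range(author_counts, low=2, high=4):
    """Share of publications whose number of authors lies in [low, high]."""
    return sum(low <= n <= high for n in author_counts) / len(author_counts)
```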

3.3 Referencing

We next look at the referencing patterns of publications with a non-empty bibliography (n = 14,541, see Table 4). Each table row gives the average number of references f of publications, depending on the number of citations they receive; n denotes the number of publications considered. For instance, publications cited by 100 or more publications contain on average 30.68 references. Comparing these numbers, we find that frequently cited publications (cited by at least 25 publications) contain on average about ten more references (f) than other publications. The coverage of significant literature is recognized as a main criterion for high-quality research [26]. Although these findings stand on a broad empirical basis, we have to consider that outlets often limit the maximum number of pages per publication, which also affects the number of references.
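A Python sketch of how a table such as Table 4 can be computed: publications with a non-empty bibliography are grouped by how often they are cited, and each group reports n and the average number of references f. Only the thresholds 25 and 100 appear in the text; the remaining bin layout and the function name are assumptions.

```python
def references_by_citation_bin(publications, lower_bounds=(0, 25, 100)):
    """publications: iterable of (citations, n_references) pairs.

    Publications with an empty bibliography are skipped. lower_bounds are the
    (partly hypothetical) lower edges of the citation bins; the last bin is open-ended.
    Returns {label: (n, f)} with n publications and mean reference count f.
    """
    pubs = [(c, r) for c, r in publications if r > 0]
    uppers = list(lower_bounds[1:]) + [None]
    table = {}
    for low, high in zip(lower_bounds, uppers):
        refs = [r for c, r in pubs if c >= low and (high is None or c < high)]
        if refs:
            label = f"{low}+" if high is None else f"{low}-{high - 1}"
            table[label] = (len(refs), sum(refs) / len(refs))
    return table
```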

3.4 Form of Publication

The selection of an appropriate outlet often influences the visibility and impact of an article. Consequently, it is interesting to analyze which type of publication venue researchers prefer for conveying their ideas and insights to the research community. As the document type is specified for all observed cloud computing articles, it is possible to analyze the distribution of document types for this research field. Table 5 indicates that most articles on cloud computing, on average 73.88 percent, are published in conference proceedings. This can be explained by the fact that most of the research activities are carried out by the computer science research community, as shown in Table 2, where conference publications have always had a dominant presence and are accepted as the primary means of publication [27], [28]. One reason is that reviewing and publishing journal papers takes a long time, whereas conference papers are usually published much faster. Particularly in a rapidly growing field of research, as is the case with cloud computing, timely presentation is important; otherwise, an idea may become obsolete or be presented, in a similar form, by other researchers.

TABLE 3. Contributing Countries (Left: All Publications; Right: Publications Cited at Least 50 Times)

Fig. 1. Co-Authorship Distribution (n = 15,376).

TABLE 4. Referencing Patterns


To extend this analysis, we further investigate the distribution of publications across conferences and journals. The number of publications per conference provides only limited insights, as it usually depends on the size of a conference and is limited to a particular year. The number of publications per journal, in contrast, reveals the journals commonly used for publishing articles in the area of cloud computing. Table 6 lists these journals with the related number of publications (f). The numbers indicate that journal articles are published in a wide variety of scientific journals, reflecting the diverse theoretical roots of the field, such as distributed computing [1]. Journals with a specific focus on cloud computing, such as Journal of Cloud Computing and IEEE Transactions on Cloud Computing, are not listed in the ranking, mainly due to their novelty.

4 FREQUENT KEYWORDS AND KEYWORD CLUSTERS

Keywords are an effective tool to abstractly represent and classify the content of a scientific article. From a meta-perspective, keywords provide the foundation for analyzing the key topics and aspects of a research area. The emergence of new popular topics can be quickly identified by looking at the occurrence of keywords within a specific time frame. By analyzing co-occurrences of keywords, it is also possible to identify topics or aspects that are strongly related to each other.

The scientometric study extracts 32,620 unique keywords from the author keywords and index keywords fields of the collected source articles. Note that keywords are not always specified by the authors. Where author keywords are missing, index keywords are manually assigned by professional indexers based on several thesauri [24]. Consequently, keywords are specified for 97.18 percent of the analyzed publications, while for 15.57 percent of those publications, index keywords have been assigned by Scopus. If no author keywords are specified, we use the index keywords of a publication, if available, for the analysis of frequent keywords and keyword clusters.
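The fallback rule for keywords, together with the extraction of unique keywords from both fields, can be sketched in Python as follows. Case folding and whitespace trimming when counting unique keywords are assumptions, since the section does not say how keyword variants were merged; the function names are illustrative.

```python
def effective_keywords(author_keywords, index_keywords):
    """Prefer author keywords; fall back to index keywords only when none are given."""
    if author_keywords:
        return list(author_keywords)
    return list(index_keywords or [])


def unique_keywords(publications):
    """Collect distinct keywords from both fields, ignoring case and surrounding blanks."""
    vocabulary = set()
    for author_keywords, index_keywords in publications:
        for keyword in [*author_keywords, *(index_keywords or [])]:
            if keyword.strip():
                vocabulary.add(keyword.strip().lower())
    return vocabulary
```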

Regarding the average distribution of the number of keywords per publication, we observe that commonly between three and six keywords are used to capture the core topic of a publication (see Fig. 2). This observation is not specific to cloud computing; publishers often specify the minimum and/or maximum number of keywords per publication. In order to reduce the variability of keyword terms, publishers usually provide a predefined list of standardized keywords relevant to a specific journal or conference.

TABLE 6 Number of Publications per Journal (f ≥ 30)

TABLE 5 Number of Publications by Document Type

Fig. 2. Number of Keywords per Publication (n = 14,942).

270 IEEE TRANSACTIONS ON CLOUD COMPUTING, VOL. 2, NO. 3, JULY-SEPTEMBER 2014

The ranking of the top keywords with a high frequency (f ≥ 100) is provided in Table 8. The results indicate that recent research activities are mainly focused on the technology itself (e.g., virtualization, scheduling, energy efficiency, load balancing), current challenges (e.g., security, privacy, interoperability, quality of service, monitoring) and the utilization of scalable cloud resources (e.g., scalability, optimization, MapReduce). The literature reveals that these topics are also discussed, e.g., with regard to research challenges, in widely recognized publications such as [2], [3], [4], [17], [19]. This suggests that those publications have a large impact on the direction of cloud computing research. Popular links to other fields of research are also revealed, for instance with the keywords internet of things and grid computing. Again, the dominance of computer science related research in cloud computing is evident.

By analyzing frequent keywords per year, the emergence and growing popularity of specific topics can be documented. For example, the growing importance of cloud computing for efficiently processing large and complex masses of structured and unstructured data is depicted by the keywords shown in Table 7. Building on foundational research concerned with methods (e.g., data mining techniques, the MapReduce algorithm) and technologies (e.g., Hadoop), we recognize the emergence of new research topics in the field of cloud computing, such as Big Data and the Internet of Things. Consequently, keyword analysis can be used as a tool for identifying current research trends.
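Both rankings discussed above amount to counting keyword occurrences, once overall (Table 8) and once per publication year (Table 7). The following is a minimal Python sketch under stated assumptions: each publication contributes a keyword list (and, for the yearly view, its publication year), matching is case-insensitive, and a keyword is counted at most once per publication. The study's actual normalization of keyword variants is not described here, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def top_keywords(keyword_lists, min_freq=100):
    """Return (keyword, f) pairs with f >= min_freq, most frequent first."""
    # A set per publication ensures each keyword is counted at most once there.
    counts = Counter(k for kws in keyword_lists for k in {kw.lower() for kw in kws})
    return [(k, f) for k, f in counts.most_common() if f >= min_freq]


def yearly_occurrence(records, keywords):
    """Map each selected keyword to {year: number of publications using it}."""
    wanted = {k.lower() for k in keywords}
    table = defaultdict(Counter)
    for year, kws in records:
        for k in {kw.lower() for kw in kws} & wanted:
            table[k][year] += 1
    return {k: dict(sorted(table[k].items())) for k in sorted(wanted)}


lists = [["Hadoop", "MapReduce"], ["hadoop"], ["Security"], ["Hadoop", "security"]]
print(top_keywords(lists, min_freq=2))  # [('hadoop', 3), ('security', 2)]

records = [(2010, ["Hadoop"]), (2012, ["Big Data", "Hadoop"]), (2013, ["big data"])]
print(yearly_occurrence(records, ["Big Data", "Hadoop"]))
# {'big data': {2012: 1, 2013: 1}, 'hadoop': {2010: 1, 2012: 1}}
```

With `min_freq=100`, `top_keywords` applies the same cut-off as Table 8. A rising per-year count in `yearly_occurrence` is the kind of signal used above to spot emerging topics.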

As an area of interest is usually characterized by more than one keyword, we further analyze the co-occurrences of keywords within the keyword lists of publications, also referred to as keyword clusters. Frequent keyword clusters unveil interconnections between different aspects and topics. While the analysis of top keywords is straightforward, the analysis of keyword clusters is computationally complex, since every possible combination of relevant keywords needs to be compared. The implemented algorithm incrementally increases the length of keyword clusters in order to identify all relevant co-occurrences. In the following, we perform a keyword cluster analysis for keyword clusters with two elements (Table 9) and three elements (Table 10).

To obtain meaningful keyword clusters, we removed the keyword cloud computing, since it mainly indicates the relation to cloud computing research (it is used in 60.84 percent of all observed publications). The findings reveal a strong interrelation between certain keywords. For the two-element keyword clusters, for instance, we see that the terms Hadoop and MapReduce as well as privacy and security are often used together. For three-element keyword clusters, we find, for instance, that the combination of the terms IaaS, PaaS and SaaS is common. By studying the generated keyword clusters, it is possible to identify dependencies between certain keywords. For example, the clusters indicate that the term cloud service is related to web services that rely on distributed database systems, and that virtualization has an influence on the security of cloud computing. Again, the technical focus of research is obvious. Overall, our findings help to better understand the relevant topics in cloud computing research and how they are related to each other. Furthermore, the keyword cluster analysis provides the foundation for generating a topic network consisting of nodes (topics) and edges (relationships between topics).
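The text states only that the algorithm grows keyword clusters incrementally. A common way to do this is the level-wise (Apriori-style) frequent-itemset search sketched below in Python. This is a swapped-in, standard technique, not necessarily the authors' implementation. The function name, the lowercase normalization, and the single `min_support` threshold are assumptions for illustration. The pruning step skips any candidate cluster that has an infrequent sub-cluster, which avoids comparing every possible combination.

```python
from collections import Counter
from itertools import combinations


def frequent_keyword_clusters(keyword_lists, min_support, max_len=3,
                              excluded=("cloud computing",)):
    """Level-wise (Apriori-style) mining of keyword clusters.

    Returns {length: Counter(cluster -> f)} for lengths 2..max_len, where each
    cluster is a frozenset of lowercase keywords occurring together in at least
    min_support publications.
    """
    skip = {k.lower() for k in excluded}
    # One normalized keyword set per publication, without excluded terms.
    docs = [frozenset(k.lower() for k in kws) - skip for kws in keyword_lists]

    single = Counter(k for doc in docs for k in doc)
    frequent = {frozenset([k]) for k, f in single.items() if f >= min_support}

    results = {}
    for length in range(2, max_len + 1):
        # Only keywords that survived the previous level can extend a cluster.
        items = sorted({k for cluster in frequent for k in cluster})
        counts = Counter()
        for doc in docs:
            present = [k for k in items if k in doc]
            for combo in combinations(present, length):
                candidate = frozenset(combo)
                # Apriori pruning: every sub-cluster must itself be frequent.
                if all(candidate - {k} in frequent for k in candidate):
                    counts[candidate] += 1
        frequent = {c for c, f in counts.items() if f >= min_support}
        results[length] = Counter({c: counts[c] for c in frequent})
        if not frequent:
            break
    return results


pubs = [
    ["Cloud computing", "Hadoop", "MapReduce"],
    ["Hadoop", "MapReduce", "Big Data"],
    ["Privacy", "Security"],
    ["Privacy", "Security", "IaaS"],
    ["Hadoop", "MapReduce", "Big Data"],
]
clusters = frequent_keyword_clusters(pubs, min_support=2)
print(clusters[2][frozenset({"hadoop", "mapreduce"})])  # 3
print(clusters[3][frozenset({"big data", "hadoop", "mapreduce"})])  # 2
```

The pruning step is only valid for one threshold across all lengths. To reproduce per-length cut-offs such as those of Tables 9 and 10 (f ≥ 35 for pairs, f ≥ 15 for triples), one would mine with the smaller threshold and filter the pairs afterwards.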

5 CITATION PATTERNS

While the numbers given in the previous sections provide insights into publishing patterns and key disciplines of cloud computing, a primary concern of a scientometric study is to evaluate the impact of contributions. One measure of the impact of a contribution is the aggregated number of citations a publication receives, which expresses how often this publication is referenced in other publications. Over all publications, we count 33,788 citations. The number of citations per publication varies between 0 and 1,002, with an average of 2.197 citations per publication. In the following, we evaluate the impact of contributions from different perspectives.

TABLE 8 Top Keywords (f ≥ 100)

TABLE 7 Yearly Occurrence of Keywords

HEILIG AND VOß: A SCIENTOMETRIC ANALYSIS OF CLOUD COMPUTING LITERATURE 271

5.1 Overall Citation Patterns

First, we analyze the distribution and impact of citations in general. As time has a significant influence on the number of citations a publication receives, we use the NCII in order to make the citation numbers of different years comparable. The NCII takes into account the longevity of a publication, which refers to the number of years the publication has been in print [9]:

$$
\mathrm{NCII} = \frac{\text{number of citations per publication}}{\text{publication longevity (in years)}} \qquad (1)
$$
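Eq. (1) translates directly into code. The following is a minimal Python sketch under two assumptions that the text does not fix. First, longevity is measured against a chosen reference year, such as the year of data collection. Second, the publication year itself counts as one year in print, so a publication from the reference year has longevity 1 and the division is always defined. The function names are hypothetical.

```python
def publication_longevity(pub_year: int, reference_year: int) -> int:
    """Years in print, counting the publication year itself (assumed convention)."""
    if pub_year > reference_year:
        raise ValueError("publication year lies after the reference year")
    return reference_year - pub_year + 1


def ncii(citations: int, pub_year: int, reference_year: int) -> float:
    """Eq. (1): citations of a publication divided by its longevity in years."""
    return citations / publication_longevity(pub_year, reference_year)


# Hypothetical example: 60 citations for a 2010 paper, counted in 2013.
print(ncii(60, 2010, 2013))  # 15.0
```

A different longevity convention, for example omitting the +1, would change absolute NCII values. It would also make publications from the reference year undefined, which is why the inclusive count is used in this sketch.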

The figures presented in Table 11 reveal some interesting patterns. The most noticeable observation is that the average NCII per publication declines for more recent publication years, which demonstrates the significant impact of early and fundamental publications on other publications. We also recognize a significant decline of citations from 2011 to 2012. The main reason for this is that many references contained in publications of 2013, which would cite this earlier work, are not yet covered by Scopus (see Table 1).

5.2 Outlet Citations

As a next step, we analyze the distribution of citations with respect to different publication outlets. The results shown in Table 12 indicate that most citations are received by conference papers (49.75 percent) and journal papers (41.69 percent). Considering the number of contributions per outlet, as analyzed in Section 3.4, we observe that journal contributions receive more citations per contribution than conference contributions. In addition, the figures indicate that books and book chapters currently play a minor role in the area of cloud computing. This contrasts with other fields [12], where books and book chapters are cited more frequently (see, e.g., scientometric studies in the journal Scientometrics [6], [10]). Although scientific review articles play a minor role, they receive a large part of the remaining citations. This can be explained by the simple fact that a review article can be seen as a valuable means of handling the rapidly growing body of knowledge in cloud computing.

TABLE 9 Top Keyword Clusters of Length 2 (f ≥ 35)

TABLE 10 Top Keyword Clusters of Length 3 (f ≥ 15)

TABLE 11 General Citation Patterns

5.3 Conference and Journal Citation Patterns

The figures of the previous section have shown that most of the citations are received by conference and journal papers. To further investigate how the citations are distributed among different conferences and journals, we generate rankings for both conferences (see Table 13) and journals (see Table 14). Again, we apply a straight count method to analyze the citation patterns in order to clearly differentiate between research impact and productivity. Considering the conference citations, we observe that widely cited publications mainly appear in cloud computing-specific symposia. We also see that influential publications mainly appear in three conferences: IEEE International Conference on Cloud Computing (Google Scholar h5-index: 30), IEEE International Conference on Cloud Computing Technology and Science (Google Scholar h5-index: 29) and IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing. The main reason is that these venues feature high-quality, high-impact research and serve a large academic audience. Our results correspond to the Google Scholar h5-metrics (http://scholar.google.de, date: 01/12/14).

TABLE 12 Number of Citations per Outlet (n = 15,376)

TABLE 13 Conference Citations (f ≥ 120)

TABLE 14 Journal Citations (f ≥ 100)

Adding the journal impact factor (IF) from the 2012 Journal Citation Reports (JCR), which gives the average number of citations per publication based on the two preceding years, we observe that the average number of citations per journal article and the IF are only moderately correlated (r = 0.47). An even smaller correlation is observed when using the five-year impact factor (r = 0.41). This suggests that the impact of journal articles published within the emerging cloud computing research area is not greatly influenced by the IF of journals. It should be noted that the IF, which aims to reflect the importance of journals, is itself not without controversy. A problem in the emerging area of cloud computing research could be, for instance, that relatively new but influential journals are not indexed in the JCR.
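The correlation reported above compares two per-journal quantities: the mean number of citations per article and the IF. The following Python sketch computes the Pearson coefficient under the assumption that articles arrive as hypothetical (journal, citations) pairs and IF values as a dictionary. Journals without an IF are left out, mirroring the JCR coverage issue just mentioned. On Python 3.10+, `statistics.correlation` computes the same coefficient; it is written out here to show the formula.

```python
import math
from collections import defaultdict


def mean_citations_per_journal(articles):
    """articles: iterable of (journal, citations) pairs -> {journal: mean citations}."""
    totals, counts = defaultdict(int), defaultdict(int)
    for journal, cites in articles:
        totals[journal] += cites
        counts[journal] += 1
    return {j: totals[j] / counts[j] for j in totals}


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def citation_if_correlation(articles, impact_factor):
    """Correlate mean citations per article with the IF over journals that have an IF."""
    means = mean_citations_per_journal(articles)
    # Journals missing from the JCR (no IF) are left out of the comparison.
    journals = sorted(j for j in means if j in impact_factor)
    return pearson_r([means[j] for j in journals],
                     [impact_factor[j] for j in journals])


# Hypothetical data: journal "D" has no IF and is therefore ignored.
articles = [("A", 2), ("A", 4), ("B", 1), ("C", 9), ("D", 50)]
impact_factor = {"A": 1.5, "B": 0.5, "C": 4.5}
print(round(citation_if_correlation(articles, impact_factor), 3))  # 1.0
```

The toy data are constructed so that the IF is exactly half the mean citation count, which yields r = 1. Real data such as those behind r = 0.47 would scatter around any such line.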

5.4 Publication Citation Patterns

Next, we focus on the top publications in cloud computing. For this purpose, we calculated the total count of citations as well as the NCII for each publication (note that not all articles in the ranking are referenced in the bibliography), as depicted in Table 15. To compare the results with another citation measure, a column f_G is added containing the citation count of Google Scholar (http://scholar.google.de, date: 01/12/14).

TABLE 15 Top Cited Publications (NCII Score ≥ 30.0)

5.5 Author and Affiliation Citation Patterns

In order to get an overview of the most influential authors, we aggregate the NCII for each author. Each co-author receives the full count of citations of a corresponding article. A list of top authors is given in Table 16, which also reports the number of the author's publications that are cited by at least one other publication (n) and the overall frequency of citations (f). We see that Rajkumar Buyya of the University of Melbourne, co-author of two top publications (see Table 15), is currently the most influential author in the area of cloud computing in terms of citations. This observation was already made for the time period 2001-2010 in [14]. Regarding the number of publications, we observe that, for many authors, only one publication is relevant for the citation index. This underlines the large impact of a handful of publications in the field; therefore, the ranking of top authors is strongly related to the ranking of top publications. Furthermore, the figures indicate that most of the top authors are currently from the United States and Australia, which corresponds to the scientometric analysis of top contributing countries depicted in Table 3. The individual impact of authors significantly influences the impact of research affiliations (see Table 17). Note that Manjrasoft, a research spin-off founded by Rajkumar Buyya, collaborates closely with the University of Melbourne, which explains its high impact.
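The author-level aggregation described above is a full-counting scheme: every co-author is credited with the publication's complete citation count and NCII. The following Python sketch illustrates it under assumptions. The input is a hypothetical list of (authors, citations, NCII) tuples, with the per-publication NCII taken from Eq. (1). Uncited publications are skipped so that n counts only publications cited at least once. The function names and the ranking helper are illustrative.

```python
from collections import defaultdict


def author_impact(publications):
    """Full counting: every co-author receives the publication's full credit.

    publications: iterable of (authors, citations, ncii) tuples.
    Returns {author: (n, f, ncii_sum)}, where n counts the author's publications
    cited at least once and f sums their citations.
    """
    stats = defaultdict(lambda: [0, 0, 0.0])
    for authors, citations, ncii_score in publications:
        if citations == 0:
            continue  # uncited publications do not enter the citation index
        for author in set(authors):  # guard against duplicated names
            entry = stats[author]
            entry[0] += 1
            entry[1] += citations
            entry[2] += ncii_score
    return {a: tuple(v) for a, v in stats.items()}


def rank_authors(stats, min_score):
    """Authors with an aggregated NCII of at least min_score, highest first."""
    ranked = [(a, s) for a, s in stats.items() if s[2] >= min_score]
    return sorted(ranked, key=lambda item: item[1][2], reverse=True)


pubs = [
    (["Author A", "Author B"], 100, 20.0),
    (["Author A"], 50, 12.5),
    (["Author C"], 0, 0.0),
]
stats = author_impact(pubs)
print(stats["Author A"])  # (2, 150, 32.5)
print([a for a, _ in rank_authors(stats, min_score=15.0)])  # ['Author A', 'Author B']
```

Under full counting, a single highly cited co-authored paper lifts every co-author equally. This is consistent with the observation that the author ranking closely tracks the ranking of top publications.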

6 RESEARCH PRODUCTIVITY

The evaluation of research productivity is an important means of identifying the most active research institutes and scholars in the field. The insights may help, for instance, to build fruitful research collaborations, and they reflect the global distribution of research.

First, we analyze institutional research productivity. Table 18 shows a ranking of research institutes ordered by the number of contributed publications (f). The numbers demonstrate the dominance of publications coming from Chinese institutions (as already shown in Table 3). Most of these institutions have an excellent overall reputation (e.g., University of Melbourne, Peking University, Tsinghua University) and have sufficient financial and personnel resources. Together, this provides the basis for promoting new generations of highly qualified scientists and makes it possible to employ several scientists working on particular topics, such as cloud computing. The concentration of cloud computing research attracts new generations of scientists, as they may benefit from a broad range of knowledge and expertise. This is consistent with the Matthew effect, as researchers benefit from the intrinsic recognition of their affiliation and the associated network benefits.

TABLE 16 Top Cited Authors (NCII Score ≥ 60.0)

Finally, we perform a scientometric analysis to evaluate the productivity of authors in the area of cloud computing. As discussed in Section 2, several methods for measuring the productivity of individual authors exist. In this paper, we focus on the equal credit method, as it seems to be the best compromise among the discussed methods. Under the equal credit method, each author of an article receives a score equal to the reciprocal of the number of authors of that article, and these scores are summed over all of the author's articles; the resulting ranking is depicted in Table 19. We observe that most of the top productive individual contributors are from China (7), the United States (6), Austria (5) and Australia (4). Comparing these results with those of the author position method and the straight count method, we observe that Rajkumar Buyya is not only the most influential researcher but also the most productive one in the area of cloud computing. This suggests that a high impact can have a positive effect on an author's individual productivity, as it attracts other researchers who wish to collaborate in order to benefit from the author's recognition. Combined with the top cited publications, the numbers indicate the existence of the Matthew effect in the area of cloud computing: authors who gained high recognition at an early stage of research development, by contributing ideas and discoveries through appropriate outlets, are repeatedly rewarded by other scientists.
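The equal credit method can be stated in a few lines. Each article distributes one unit of credit evenly among its authors, and an author's score is the sum of these shares. The following Python sketch uses hypothetical author lists and function names. Exact `Fraction` arithmetic is an implementation choice to avoid rounding, not something the study prescribes. A threshold such as the 5.5 used for Table 19 could be passed to the ranking helper.

```python
from collections import defaultdict
from fractions import Fraction


def equal_credit_scores(author_lists):
    """Equal credit method: each of the n authors of an article receives 1/n."""
    scores = defaultdict(Fraction)  # Fraction() == 0
    for authors in author_lists:
        unique = list(dict.fromkeys(authors))  # drop duplicated names, keep order
        if not unique:
            continue
        share = Fraction(1, len(unique))
        for author in unique:
            scores[author] += share
    return dict(scores)


def most_productive(scores, min_score):
    """(author, score) pairs with score >= min_score, highest score first."""
    return sorted(((a, s) for a, s in scores.items() if s >= min_score),
                  key=lambda item: item[1], reverse=True)


articles = [["A", "B"], ["A"], ["A", "B", "C"]]
scores = equal_credit_scores(articles)
print(scores["A"], scores["B"], scores["C"])  # 11/6 5/6 1/3
print([a for a, _ in most_productive(scores, Fraction(1, 2))])  # ['A', 'B']
```

Unlike full counting, the scores over all authors sum to the number of articles. Frequent co-authorship therefore does not inflate total productivity.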

7 CONCLUSIONS

Cloud computing attracts considerable interdisciplinary attention and is a rapidly developing field of research. In this paper, we conduct a scientometric analysis to comprehensively investigate the development and current state of cloud computing related publications based on a large bibliographic database provided by Scopus.

The results of this study reveal that past and current research is dominated by computer science research, conveyed especially through conference proceedings. The focus of research activities is predominantly influenced by fundamental and highly recognized scientists and publications. In this regard, we find indications of the Matthew effect in the area of cloud computing. Given the results of the keyword analysis, it is evident that the past and current focus of cloud computing research lies mainly on the technology itself rather than on socioeconomic issues. Current trends, such as those reflected by keywords related to data analysis and Big Data, demonstrate the increasing importance of shifting the focus of research to socioeconomic issues in order to solve understudied problems and better utilize the potential of cloud computing. This may help to increase the overall value of cloud computing and may facilitate further adoption in both academic and practical contexts. Thus, the results of the scientometric analysis may help the relatively new field of cloud computing to (re)define itself in order to provide a clear direction and objectives for research.

The analysis of main contributing affiliations and highly influential authors and publications may help, especially new generations of scholars, to get an overview of important publications, topics and outlets, and to identify main contributors and driving forces in the area of cloud computing. Thus, the results of this study can be used to better understand patterns, trends and other important factors for directing individual research activities, efficiently extending research networks and selecting appropriate publication outlets for sharing individual knowledge.

TABLE 17 Top 15 Cited Affiliations

TABLE 18 Top 15 Contributing Affiliations

TABLE 19 Individual Productivity (Equal Credit Method, Score ≥ 5.5)

The empirical findings of the scientometric analysis are partially reflected in widely recognized publications (e.g., relevant topics, research challenges). In general, this demonstrates the strength of a scientometric analysis for extensively investigating a field of interest. As demonstrated, the results of the scientometric study are valuable not only for discussing and defining future research agendas in the area of cloud computing; the semi-automated process of assessing a large number of publications also makes it possible to easily obtain a general overview of a particular research area. This is hardly feasible with structured literature reviews. Therefore, the study represents a good starting point for academics and practitioners to identify the sources and concentration of the existing knowledge base. In addition, research trends and important topics can be observed over a specific period of time by means of keyword analysis.

For further research, we intend to investigate and visualize collaboration structures among authors as well as the relationship between topics and authors in order to better understand the dynamics and network structure of this field. In this regard, we aim to analyze the patterns of trends for specific topics. In addition, we plan to compare the results with other metrics in order to further evaluate our results. Technically, we intend to further improve the applied data processing algorithms in order to further reduce manual proofreading by means of data mining and machine learning.

REFERENCES

[1] M. A. Vouk, “Cloud computing-issues, research and implementations,” J. Comput. Inform. Technol., vol. 16, no. 4, pp. 235–246, 2008.

[2] Q. Zhang, L. Cheng, and R. Boutaba, “Cloud computing: State-of-the-art and research challenges,” J. Internet Serv. Appl., vol. 1, no. 1, pp. 7–18, 2010.

[3] T. Dillon, C. Wu, and E. Chang, “Cloud computing: Issues and challenges,” in Proc. 24th Int. Conf. Adv. Inform. Netw., 2010, pp. 27–33.

[4] S. Marston, Z. Li, S. Bandyopadhyay, J. Zhang, and A. Ghalsasi, “Cloud computing-the business perspective,” Decision Support Syst., vol. 51, no. 1, pp. 176–189, 2011.

[5] B. R. Lewis, G. F. Templeton, and X. Luo, “A scientometric investigation into the validity of IS journal quality measures,” J. Assoc. Inform. Syst., vol. 8, no. 12, pp. 619–633, 2007.

[6] A. van Raan, “Scientometrics: State-of-the-art,” Scientometrics, vol. 38, no. 1, pp. 208–218, 1996.

[7] S. Schwarze, S. Voß, G. Zhou, and G. Zhou, “Scientometric analysis of container terminals and ports literature and interaction with publications on distribution networks,” in Proc. 3rd Int. Conf. Comput. Logistics, 2012, pp. 33–52.

[8] D. Straub, “The value of scientometric studies: An introduction to a debate on IS as a reference discipline,” J. Assoc. Inform. Syst., vol. 7, no. 5, pp. 241–246, 2006.

[9] A. Serenko and N. Bontis, “Meta-review of knowledge management and intellectual capital literature: Citation impact and research productivity rankings,” Knowl. Process Manage., vol. 11, no. 3, pp. 185–198, 2004.

[10] W. Hood and C. Wilson, “The literature of bibliometrics, scientometrics, and informetrics,” Scientometrics, vol. 52, no. 2, pp. 291–314, 2001.

[11] L. Leydesdorff, “Indicators of structural change in the dynamics of science: Entropy statistics of the SCI journal citation reports,” Scientometrics, vol. 53, no. 1, pp. 131–159, 2002.

[12] S. Voß and X. Zhao, “Some steps towards a scientometric analysis of publications in machine translation,” in Proc. IASTED Int. Conf. Artif. Intell. Appl., 2005, pp. 651–655.

[13] L. Leydesdorff and T. Schank, “Dynamic animations of journal maps: Indicators of structural changes and interdisciplinary developments,” J. Am. Soc. Inform. Sci. Technol., vol. 59, no. 11, pp. 1810–1818, 2008.

[14] K. Sivakumaren, S. Swaminathan, and G. Karthikeyan, “Growth and development of publications on cloud computing: A scientometric study,” Int. J. Inform. Library Soc., vol. 1, no. 1, pp. 37–43, 2012.

[15] Q. Bai and W.-h. Dong, “Scientometric analysis on the papers of cloud computing,” Sci-Tech Inform. Develop. Econ., vol. 5, no. 1, pp. 6–8, 2011.

[16] T. Wang and G. Huang, “Research progress of cloud security from 2008 to 2011 in China,” Inform. Sci., no. 1, pp. 153–160, 2013.

[17] I. Foster, Y. Zhao, I. Raicu, and S. Lu, “Cloud computing and grid computing 360-degree compared,” in Proc. Grid Comput. Environ. Workshop, 2008, pp. 1–10.

[18] R. Buyya, C. S. Yeo, S. Venugopal, J. Broberg, and I. Brandic, “Cloud computing and emerging IT platforms: Vision, hype, and reality for delivering computing as the 5th utility,” Future Gener. Comput. Syst., vol. 25, no. 6, pp. 599–616, 2009.

[19] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, and I. Stoica, “A view of cloud computing,” Commun. ACM, vol. 53, no. 4, pp. 50–58, 2010.

[20] C. W. Holsapple, L. E. Johnson, H. Manakyan, and J. Tanner, “Business computing research journals: A normalized citation analysis,” J. Manage. Inform. Syst., vol. 11, no. 1, pp. 131–140, 1994.

[21] G. S. Howard, D. A. Cole, and S. E. Maxwell, “Research productivity in psychology based on publication in the journals of the American Psychological Association,” Amer. Psychol., vol. 42, no. 11, pp. 975–986, 1987.

[22] R. K. Merton, “The Matthew effect in science,” Science, vol. 159, no. 3810, pp. 56–63, 1968.

[23] J. R. Faria and R. K. Goel, “Returns to networking in academia,” Netnomics, vol. 11, no. 2, pp. 103–117, 2010.

[24] Scopus. (2012). Content coverage guide [Online]. Available: http://www.info.sciverse.com/UserFiles/sciverse_scopus_content_coverage_0.pdf

[25] Gartner. (2014). Hype cycles [Online]. Available: http://www.gartner.com/technology/research/methodologies/hype-cycle.jsp

[26] D. W. Straub, S. Ang, and R. Evaristo, “Normative standards for IS research,” SIGMIS Database, vol. 25, no. 1, pp. 21–34, 1994.

[27] M. Y. Vardi, “Conferences vs. journals in computing research,” Commun. ACM, vol. 52, no. 5, p. 5, 2009.

[28] Computing Research Association (CRA), Washington, DC, USA, “Best practices memo: Evaluating computer scientists and engineers for promotion and tenure,” Comput. Res. News, 1999.


Leonard Heilig received the B.Sc. degree in Information Systems from the University of Münster, Germany, and the M.Sc. degree in Information Systems from the University of Hamburg, Germany. He is currently at the Institute of Information Systems, University of Hamburg. He spent some time at the University of St Andrews, Scotland, United Kingdom, focusing on security management, web technologies and software engineering. His practical experience includes work at companies like Adobe Systems, Airbus Group Innovations and Beiersdorf Shared Services. His current interest focuses on cloud computing. Related applications incorporate mobile workforce management systems.

Stefan Voß received the diploma in mathematics and economics from the University of Hamburg, and the PhD degree and the habilitation from the University of Technology Darmstadt. He is currently professor and director of the Institute of Information Systems, University of Hamburg. Previous positions include full professor and head of the department of Business Administration, Information Systems and Information Management, University of Technology, Braunschweig, Germany, from 1995 to 2002. His current research interests include quantitative/information systems approaches to supply chain management and logistics, including applications in maritime shipping, public mass transit and telecommunications. He is an author and co-author of several books and numerous papers in various journals. He is on the editorial board of several journals, including being editor of Netnomics and Public Transport. He frequently organizes workshops and conferences. Furthermore, he consults with several companies.