SCIENTOMETRIC ANALYSIS OF RESEARCH
COMPETITIVENESS OF COUNTRIES,
INSTITUTIONS AND SUBJECTS
Supervised By
Dr Vivek Kumar Singh
Assistant Professor
Department of Computer Science
South Asian University
Presented By:
Khushboo Singhal, Roll No. SAU/CS(M)/2013/005, Department of Computer Science, South Asian University
Sumit Kumar Banshal, Roll No. SAU/CS(M)/2013/018, Department of Computer Science, South Asian University
5/17/2015
Outline
Introduction
Questions we Aimed to Answer
Country/Region Level Analysis
Institution Level Analysis
Fine Grained Research Theme based Analysis
Scientometric & Indicators
Derived Indicators
Bibliographic Databases
Our Work
Regional Analysis
Institution Level Analysis
Fine Grained Research Theme based Analysis
Challenges
Publication Out of this Work
Selected Bibliography
Introduction
Scientometric assessment of research competitiveness is carried out at three levels:
Country/Region Level Analysis
South Asia
Bangladesh
India
Institution Level Analysis
Top 100 world institutes
Central Universities (CU)
Indian Institute of Technology (IIT)
Fine Grained Research Theme based Analysis
Big Data
Questions we Aimed to Answer
Can IT infrastructure be mapped to the CS research output of South Asian countries?
Can we analyze where Bangladesh stands in CS research output?
Can we visualize where India stands in CS research output?
Can we characterize the leading world institutes?
Can we map the proportionate contribution of the CUs in India and rank them accordingly?
Can we rank the IITs based on research output and characterize their research?
Can this methodology be applied to a narrow research theme?
Country/Region Level Analysis
South Asia (SA)
Mapping IT infrastructure with CS Research Output
Bibliographic data from Web of Science for SA countries:
Afghanistan, Bangladesh, Bhutan, India, Maldives, Nepal, Pakistan, Sri Lanka
For the period 1989-2013
Standings of SA Countries in IT
Total 15,841 records (15,810 unique)
Country/Region Level Analysis contd…
Bangladesh
In-depth look at the country's research output
Trends, authorship patterns, top contributors
Bibliographic data from Scopus
For the period 1989-2013
Total 3200 records (3193 unique)
Country/Region Level Analysis contd…
India
In-depth look at the country's research output
Trends, authorship patterns, top contributors
Bibliographic data from Scopus
For the period 1989-2013
Total 84385 records
100 institutions
61502 records (72% of Total Data)
59682 unique records
Institution Level Analysis
Top 100 CS Research Producing Institutes of the
World (W-100)
Measuring Research Competitiveness of W-100
Characterizing Research Trends
Implementing Composite Rank
Bibliographic data from Web of Science
For the period 1999-2013
Total 261,154 records
251,312 unique records
Institution Level Analysis contd…
Central Universities in India (CU)
39 Central Universities (http://mhrd.gov.in/)
Measuring Contribution to Indian Research
Rank Institute based on Research Strengths
Identifying Trends & Themes in Research
For the period 1990-2014
Total 64302 records
63776 unique records
Each record comprises 60 attributes
Institution Level Analysis contd…
Indian Institutes of Technology (IIT)
16 IIT (https://www.iitsystem.ac.in/IITCouncil.jsp)
Measuring Contribution to Indian Research
Rank IIT based on Research Strengths
Identifying Trends & Themes in IIT Research
For the period 1990-2014
Total 81588 records
80991 unique records
Each record comprises 60 attributes
Fine Grained Research Theme based Analysis
Big Data
Characterizing Research Output from Narrow Discipline
Fine-Grained Research Theme Mapped into Scientometric Methodology
Emerging topic since around 2005
Collected Data from Scopus & WOS
For the Period 2010-2014
Total Records
WOS: 1415 (60 fields)
Scopus: 6810 (41 fields)
Scientometric & Indicators
The term combines "science" and "metrics"
The study of measuring and analyzing science, technology and innovation
Measures scientific research and its impact on scientific communities
Research includes both qualitative and quantitative approaches
Direct Indicators: Total Publications, Co-authorship, No. of Words, No. of References, Citation Counts
Derived Indicators: Highly Cited Papers (HiCP), Average Citation Per Paper (ACPP), Internationally Collaborated Papers (ICP), H-index, G-index, HG-index, P-index
Derived Indicators
Highly Cited Papers (HiCP)
The HiCP indicator refers to papers that are among the 10% most cited papers worldwide in a particular year. First, find the citation threshold for the top 10% most cited papers worldwide in the domain. Then obtain the number of HiCP papers for each institute for each year by
$$\mathrm{HiCP}_y = \sum_{p=1}^{TP_y} \mathbb{1}\left[ C_{y,p} \ge \theta_y \right]$$
Here, $y$ is the year, $p$ is a paper, $TP_y$ is the total number of papers in the year, $C_{y,p}$ is the number of citations for paper $p$ in the year, and $\theta_y$ is the citation threshold for HiCP for the year.
A higher HiCP count indicates research output with high impact.
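The counting step can be sketched in a few lines of Python. This is only an illustration: the citation counts and the threshold below are made-up values, and treating a paper that exactly meets the threshold as highly cited is an assumption.

```python
from typing import Sequence

def count_hicp(citations: Sequence[int], threshold: int) -> int:
    """Count papers whose citation count reaches the HiCP threshold.

    `citations` holds C_{y,p} for one institute in one year and
    `threshold` is theta_y for that year. Using >= (not >) is an assumption.
    """
    return sum(1 for c in citations if c >= threshold)

# Hypothetical data: one institute's papers published in a single year.
citations_one_year = [0, 3, 12, 45, 7, 60]
theta = 40  # assumed worldwide top-10% citation threshold for that year

hicp_one_year = count_hicp(citations_one_year, theta)
print(hicp_one_year)
```

Summing the per-year counts over the study period gives the institute's overall HiCP value.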
Average Citation Per Paper (ACPP)
ACPP is the ratio of Total Citations (TC) to Total Publications (TP), formulated as
$$\mathrm{ACPP} = \frac{TC}{TP} = \frac{1}{TP} \sum_{n=1}^{TP} C_n$$
where $C_n$ is the number of citations for a given paper $n$ and $TP$ is the total number of such publications.
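As a quick check, ACPP computed as total citations divided by total publications reproduces the value reported for MIT in the Rank15 indicator table (TC = 123671, TP = 4385, ACPP = 28.203). The helper name below is illustrative only.

```python
def acpp(total_citations: int, total_publications: int) -> float:
    """Average citations per paper: TC divided by TP."""
    if total_publications == 0:
        return 0.0  # convention chosen here for an empty publication set
    return total_citations / total_publications

# TP and TC for MIT as listed in the Rank15 indicator table of this deck.
mit_acpp = round(acpp(123671, 4385), 3)
print(mit_acpp)
```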
Internationally Collaborated Papers (ICP)
An internationally collaborated paper is one whose authors are affiliated with at least two different countries. The author group may be larger, but at least one author must be from a country different from that of the others.
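A minimal sketch of the ICP test, assuming each record has already been reduced to a list of author countries (extracting countries from raw affiliation strings is a separate step that is not shown here). The sample records are hypothetical.

```python
from typing import Iterable, List

def is_icp(author_countries: Iterable[str]) -> bool:
    """True if the authors span at least two distinct countries."""
    distinct = {c.strip().lower() for c in author_countries if c.strip()}
    return len(distinct) >= 2

def count_icp(papers: List[List[str]]) -> int:
    """Number of internationally collaborated papers in a collection."""
    return sum(1 for countries in papers if is_icp(countries))

# Hypothetical records: one list of author countries per paper.
sample_papers = [
    ["India", "India"],
    ["India", "Bangladesh", "India"],
    ["Nepal"],
    ["Sri Lanka", "USA"],
]
icp_total = count_icp(sample_papers)
print(icp_total)
```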
Derived Indicators
H-index
The H-index (Hirsch, 2005) is an index that aims to measure both the productivity and citation impact of the published work. The index is based on the set of the scientist's most cited papers and the number of citations that they have received in other publications.
A scientist has index h if h of his/her Np papers have at least h citations each, and the other (Np − h) papers have no more than h citations each.
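The definition translates directly into code: sort the citation counts in decreasing order and find the last rank at which the count is still at least the rank. The citation lists below are hypothetical.

```python
from typing import Sequence

def h_index(citations: Sequence[int]) -> int:
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break  # counts only decrease from here, so no larger h exists
    return h

print(h_index([10, 8, 5, 4, 3]))
```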
G-index
The G-index is an index based on publication records for quantifying scientific productivity. G-index (Egghe, 2006) is calculated based on the distribution of citations received by a given researcher's publications:
Given a set of articles ranked in decreasing order of the number of citations that they received, the g-index is the (unique) largest number $g$ such that the top $g$ articles together received at least $g^2$ citations.
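A short sketch of the g-index computation. It assumes the common convention that g cannot exceed the number of papers; some variants pad the list with zero-citation papers, which is not done here. The citation lists are hypothetical.

```python
from typing import Sequence

def g_index(citations: Sequence[int]) -> int:
    """Largest g such that the top g papers have at least g*g citations in total.

    Assumption: g is capped at the number of papers.
    """
    ranked = sorted(citations, reverse=True)
    cumulative = 0
    g = 0
    for rank, cites in enumerate(ranked, start=1):
        cumulative += cites
        if cumulative >= rank * rank:
            g = rank
    return g

print(g_index([10, 8, 5, 4, 3]))
```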
HG-index
The HG-index is a composite of the H-index and the G-index, introduced to overcome the disadvantages of both indices. The HG-index (Alonso et al., 2010) is computed as:
$$HG = \sqrt{H \times G}$$
where $H$ and $G$ are the H-index and G-index.
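The hg-index of Alonso et al. (2010) is the geometric mean of the two indices. A minimal sketch, with hypothetical input values:

```python
import math

def hg_index(h: int, g: int) -> float:
    """Geometric mean of the h-index and g-index."""
    return math.sqrt(h * g)

# Hypothetical author with h = 4 and g = 5.
print(round(hg_index(4, 5), 3))
```

Because the g-index is never smaller than the h-index, the result always lies between the two.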
Derived Indicators
P-index
The P-index is designed to balance quantity and quality. The P-index (Prathap, 2010) is computed as:
$$p = \left( \frac{C^2}{P} \right)^{1/3}$$
Here, $P$ is the total number of papers and $C$ is the total number of citations.
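Prathap's p-index is the cube root of C²/P. The sketch below uses hypothetical totals chosen so that the result is easy to verify by hand (64² / 8 = 512, whose cube root is 8).

```python
def p_index(papers: int, citations: int) -> float:
    """Cube root of C^2 / P, with P total papers and C total citations."""
    if papers == 0:
        return 0.0  # convention chosen here for an empty publication set
    return (citations ** 2 / papers) ** (1.0 / 3.0)

# Hypothetical totals: P = 8 papers, C = 64 citations.
print(round(p_index(8, 64), 3))
```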
Bibliographic Databases
There are many well-known databases:
Scopus, Web of Science, MEDLINE, Google Scholar, Info Track, Biomedical Databases,
Compendex, GENESIS, OAIster, Inspec, BASE, IEEE Xplore, PASCAL, TreeBASE, POPLINE,
Trove, DOGE, Embase, ACM Portal, DBLP, PubMed
Selected Databases
WOS
Depth of coverage (90 million records across 250+ disciplines)
12,000 journals
160,000 conference proceedings
Specific Criteria to Select Journal
Indexing Service
Attributes in tag format (all tags)
Sample Data
Scopus
50 million records
Easy to navigate
Widely Acclaimed Indexing Service as well as publishing house
Sample data
Institution Level Analysis- W100
Measuring Research Competitiveness
Identify Thematic Trends
Rank Institution Based on Composite Indicators
Rank Institution Based on Thematic Strength
Based on Research Strength & Trends
Based on both Qualitative & Quantitative Indicators
One part based on Scientometrics
Other part merged Text with Scientometrics
Ranking: Indicator Values
Rank15 for top 10 institutions (indicator values)
Institution  TP    TC      HiCP  ACPP    ICP   H-index
MIT          4385  123671  694   28.203  1470  141
UCB          3616  121682  591   33.651  1138  136
SU           3633  94013   663   25.878  1121  131
IBM          5854  91086   494   15.56   1756  127
INRIA        5432  65934   471   12.138  2451  100
UL           4803  65792   518   13.698  2254  98
CMU          4065  73084   441   17.979  1222  110
MS           4117  67578   410   16.414  1599  101
UIUC         3347  71827   420   21.46   1061  106
HU           2479  62082   445   25.043  923   103
Institution Level Analysis- W100 contd…
Normalized Score
Measuring Relative Performance
Range : 0 to 100
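$$S_{i,j} = \frac{x_{i,j}}{x_i^{\max}} \times 100$$
Here, $x_{i,j}$ is the raw value of indicator $i$ for institution $j$, and $x_i^{\max}$ is the maximum raw value among all the institutions for the indicator $i$.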
Composite Score of All Indicators
Simple Average
Ranking Computed In Three Blocks
15 years (Rank15): whole period, i.e., 1999-2013
10 years (Rank10): 2004-2013
5 years (Rank5): 2009-2013
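The normalization and the simple-average composite can be sketched as follows. The sketch assumes each raw value is divided by the indicator's maximum and scaled to 100, which is consistent with the tables in this deck: MIT has the top H-index (141) and an H-index score of 100, and the five normalized scores listed for MIT average to its reported 66.22. The function names are illustrative only.

```python
from typing import Dict, Sequence

def normalized_score(value: float, max_value: float) -> float:
    """Scale a raw indicator value to 0-100 relative to the indicator maximum."""
    return 100.0 * value / max_value

def composite_score(scores: Sequence[float]) -> float:
    """Simple average of the normalized indicator scores."""
    return sum(scores) / len(scores)

# Raw H-index values for three institutions, from the indicator-value table.
h_raw: Dict[str, int] = {"MIT": 141, "UCB": 136, "SU": 131}
h_max = max(h_raw.values())  # 141 is the overall maximum (MIT's score is 100)
h_scores = {name: round(normalized_score(v, h_max), 1) for name, v in h_raw.items()}

# MIT's normalized TP, HiCP, ACPP, ICP and H-index scores, from the normalized table.
mit_scores = [40.2, 72.7, 83.8, 34.4, 100.0]
mit_composite = round(composite_score(mit_scores), 2)

print(h_scores, mit_composite)
```

Repeating the same computation on data restricted to 2004-2013 and 2009-2013 would give the Rank10 and Rank5 blocks.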
Institution Level Analysis- W100 contd…
Rank15 for top 10 institutions (normalized values and rank)
Institution  TP Score  HiCP Score  ACPP Score  ICP Score  H-Index Score  Avg. Score  Rank15
MIT          40.2      72.7        83.8        34.4       100            66.22       1
UCB          33.1      61.9        100         26.6       96.5           63.62       2
SU           33.3      69.5        76.9        26.2       92.9           59.76       3
IBM          53.7      51.8        46.2        41.1       90.1           56.58       4
INRIA        49.8      49.4        36.1        57.4       70.9           52.72       5
UL           44        54.3        40.7        52.7       69.5           52.24       6
CMU          37.3      46.2        53.4        28.6       78             48.7        7
MS           37.7      43          48.8        37.4       71.6           47.7        8
UIUC         30.7      44          63.8        24.8       75.2           47.7        9
HU           22.7      46.6        74.4        21.6       73             47.66       10
Institution Level Analysis- W100 contd…
Impact of Indicator on Ranks
Correlation between Rank15 and individual indicators, i.e., TP, ACPP and so on.
Impact of One Indicator on Other Indicators
Correlation between TP and ACPP, HiCP, H-Index, ICP and vice versa.
Correlation between Ranks
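Spearman Rank Correlation
$$\rho = 1 - \frac{6 \sum_{j=1}^{K} \left( s_{1,j} - s_{2,j} \right)^2}{K \left( K^2 - 1 \right)}$$
Here,
$K$: the size of the ranked sets;
$s_{1,j}$ and $s_{2,j}$: rank positions of institution $j$ in the two rankings R1 and R2.
R1 is the computed rank
R2 is the indicator-based rank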
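A small sketch of the Spearman computation using the standard no-ties formula. For illustration it compares Rank15 with a TP-based rank for only the ten institutions listed in the tables of this deck, with TP ranks re-derived within that subset; the result is therefore not the correlation reported for the full set of 100 institutions.

```python
from typing import Dict, Sequence

def spearman_rho(rank1: Sequence[int], rank2: Sequence[int]) -> float:
    """Spearman rank correlation for two rankings without ties."""
    k = len(rank1)
    if k != len(rank2) or k < 2:
        raise ValueError("rankings must have the same length (at least 2)")
    d_squared = sum((a - b) ** 2 for a, b in zip(rank1, rank2))
    return 1.0 - 6.0 * d_squared / (k * (k * k - 1))

# TP values of the ten institutions, listed in Rank15 order (MIT first).
tp: Dict[str, int] = {
    "MIT": 4385, "UCB": 3616, "SU": 3633, "IBM": 5854, "INRIA": 5432,
    "UL": 4803, "CMU": 4065, "MS": 4117, "UIUC": 3347, "HU": 2479,
}
names = list(tp)
rank15 = list(range(1, len(names) + 1))

# Rank by TP within this subset only (1 = highest TP).
by_tp = sorted(names, key=lambda n: tp[n], reverse=True)
tp_rank = [by_tp.index(n) + 1 for n in names]

rho = spearman_rho(rank15, tp_rank)
print(round(rho, 3))
```

With ties in either ranking, a library routine such as `scipy.stats.spearmanr` is the safer choice, since the shortcut formula above is exact only for untied ranks.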
Institution Level Analysis- W100 contd…
Spearman Rank Correlation between Rank15 and individual indicators
Institution Level Analysis- W100 contd…
Spearman Rank Correlation between five indicator-ranks for 100 institutions
Institution Level Analysis- W100 contd…
Identifying Themes of Research
Rank based on Themes
An institute may be strong in one specific area but not in all.
11 Broader Themes in CS Research
Gives a Fine Grained Ranking
Institution Level Analysis- W100 contd…
Flow Diagram of Text Classification
Thematic Areas with Full Name
Acronym  Full Name
AI       Artificial Intelligence
CT       Computation Theory
CHA      Computer Hardware & Architecture
CN       Computer Networks
CSA      Computer Software & Applications
CG       Cryptography
DBMS     Database Management System
IM       Internet & Multimedia
OS       Operating System
SIP      Signal & Image Processing
SE       Software Engineering
Institution Level Analysis- W100 contd…
Thematic research area map
Research strengths of top 10 institutions
Institution Level Analysis- W100 contd…
Thematic area wise composite Rank15
Institution  Rank15  AI  CT  CHA  CN  CSA  CG  DBMS  IM  OS  SIP  SE
MIT          1       15  5   23   26  14   17  6     13  19  25   9
UCB          2       4   16  9    4   4    18  18    25  14  5    3
SU           3       33  14  21   12  16   35  21    10  42  31   19
IBM          4       29  83  4    24  25   14  13    19  9   34   14
INRIA        5       9   6   1    1   5    1   5     4   2   9    2
UL           6       12  7   36   11  9    8   7     12  16  4    6
CMU          7       25  12  13   19  10   20  28    22  15  35   16
MS           8       6   78  11   18  28   21  9     5   21  8    15
UIUC         9       21  52  22   28  19   28  27    26  22  7    26
HU           10      11  61  35   15  17   42  4     7   58  29   5
Institution Level Analysis- CU
Identifying Trends in Research
Measuring Contribution to Indian Research
Identifying Authorship Patterns
39 CU on a Geographical Map
Proportionate share of 39 CU to total Research Output
Institutional Level Analysis- CU contd…
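Distribution of Research Output among 39 CU (1990-2014 and 2010-2014)
Institutional Level Analysis- CU contd…
Composite Rank of CU in India 2010-2014
All Rank Results
Institutional Level Analysis- CU contd…
H-Index of Top CU in India
Exergy Curve for Selected CU of India
$$\text{Exergy} = P\,i^2 = P \left( \frac{C}{P} \right)^2 = \frac{C^2}{P}$$
where $P$ is the total number of papers, $C$ is the total number of citations, and $i = C/P$ is the citations per paper.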
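The identity can be checked numerically with a few lines of Python. The totals below are hypothetical and the function name is illustrative only.

```python
def exergy(papers: int, citations: int) -> float:
    """Exergy = P * i^2 with impact i = C / P, which simplifies to C^2 / P."""
    if papers == 0:
        return 0.0  # convention chosen here for an empty publication set
    impact = citations / papers
    return papers * impact ** 2

# Hypothetical university: P = 100 papers, C = 500 citations, so i = 5.
x = exergy(100, 500)
print(x, 500 ** 2 / 100)
```

Plotting this value per year for each university would give an exergy curve of the kind named in the caption.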
Institutional Level Analysis- CU contd…
Discipline-wise Research Output Positions
1990-2014
Discipline-wise Research Output
Institutional Level Analysis- IIT
Rank Institute based on Research Strengths
Identifying Trends in Research
Measuring Contribution to Indian Research
Identifying Authorship Patterns
Identifying Thematic Research Strength
16 IIT on a Geographical Map
Proportionate share of 16 IIT to total Research Output
Institutional Level Analysis- IIT contd…
Total Research Output of 16 IIT
Cited Percentage of Research Output of 16 IIT and India
IITKGP is the most prominent over the years, followed by IITM, IITB and IITD.
Citedness (cited %) of IIT papers is considerably higher than that of India's total research output.
Institutional Level Analysis- IIT contd…
Distribution of Research Output among 16 IIT (1990-2014 and 2010-2014)
Institutional Level Analysis- IIT contd…
Composite Rank of IIT 1990-2014
All Rank Results
Institutional Level Analysis- IIT contd…
Discipline-wise Research Output Positions
Discipline-wise Research Output
Fine Grained Research Theme Level Analysis- Big Data
Research Output, Relative Growth Rate (RGR) and Doubling Time (DT)
Characterizing Research Output from Narrow Discipline
Fine-Grained Research Theme Mapped into Scientometric Methodology
Mapping Research Theme in Scientometric Indicators & Metrics
Research growth, trends, themes, etc. plotted
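The slides do not spell out how RGR and DT were computed. The sketch below assumes the definitions commonly used in scientometric studies: RGR is the increase in the natural log of cumulative publications per unit time, and DT = ln 2 / RGR. The yearly counts are made up and are not the Big Data figures from this study.

```python
import math
from typing import List, Optional, Tuple

def relative_growth_rate(w1: float, w2: float, years: float = 1.0) -> float:
    """RGR = (ln W2 - ln W1) / (T2 - T1) on cumulative publication counts."""
    return (math.log(w2) - math.log(w1)) / years

def doubling_time(rgr: float) -> Optional[float]:
    """DT = ln 2 / RGR; undefined (None) when there is no growth."""
    return math.log(2) / rgr if rgr > 0 else None

def growth_table(yearly_counts: List[int]) -> List[Tuple[float, Optional[float]]]:
    """(RGR, DT) for each consecutive pair of years, from yearly paper counts."""
    cumulative: List[int] = []
    total = 0
    for n in yearly_counts:
        total += n
        cumulative.append(total)
    rows = []
    for prev, curr in zip(cumulative, cumulative[1:]):
        rgr = relative_growth_rate(prev, curr)
        rows.append((rgr, doubling_time(rgr)))
    return rows

# Hypothetical yearly output for a five-year window.
rows = growth_table([100, 100, 200, 400, 800])
for rgr, dt in rows:
    print(round(rgr, 3), None if dt is None else round(dt, 3))
```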
Fine Grained Research Theme Level Analysis- Big Data contd…
Institution-wise Research Output with Scientometric indicators
Fine Grained Research Theme Level Analysis- Big Data contd…
Most Productive Authors (WOS data)
Author Cliques for Author Chen JJ
6 Authors from top 25 authors group size of 32
Fine Grained Research Theme Level Analysis- Big Data contd…
Discipline-wise Distribution of Research Output (WOS data)
Fine Grained Research Theme Level Analysis- Big Data contd…
Controlled Term Based Theme Density Plot (WOS Data)
Challenges
No Standard Datasets
Semi Structured Data
Regular Updates in Databases
High Subscription Cost of Indexing Services
Switching Affiliations
Affiliations not in Identical Format
Data Format Varies in Databases
Publications Out of this Work
Published:
Singhal, K., Banshal, S. K., Uddin, A., & Singh, V. K. (2014). The information technology knowledge infrastructure and research in South Asia. Journal of Scientometric Research, 3(3), 134. http://www.jscires.org/text.asp?2014/3/3/134/153578
Banshal, S. K., Singhal, K., Uddin, A., & Singh, V. K. (2014). Mapping Computer Science research in Bangladesh. In Proceedings of 8th International Conference on Software, Knowledge, Information Management and Applications (SKIMA), Dhaka, Bangladesh, IEEE Xplore (pp. 1-7). http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=7083526
Banshal, S. K., Uddin, A., & Singh, V. K. (2015). Identifying Themes and Trends in CS Research Output from India. In Proceedings of International Conference on Cognitive Computing and Information Processing (CCIP), Noida, India, IEEE Xplore (pp. 1-6). http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7100742
Accepted/Submitted:
Singh, V. K., Banshal, S. K., Singhal, K. & Uddin, A., A Sciento-Text Framework for Fine-grained Characterization of the Leading World Institutions in Computer Science Research, Accepted to appear in 15th International Conference on Scientometrics and Informetrics (ISSI), Istanbul, Turkey, 29th June-3rd July, 2015.
Singh, V. K., Banshal, S. K., Singhal, K. & Uddin, A., Identifying Area Specific Strong Research Centers in the Leading World Institutions in Computer Science Research, Submitted to Atlanta Conference On Science and Innovation Policy, Atlanta, USA, 17th Sept. - 19th Sept., 2015.
Banshal, S. K., Singhal, K., Uddin, A., & Singh, V. K., Scientometric Mapping of Research on ‘Big Data’, Submitted to Scientometrics, ISSN: 0138-9130 (Print) 1588-2861 (Online); Impact Factor (2013): 2.274.
Selected Bibliography
Geraci, M., & Degli Esposti, M. (2011). Where do Italian universities stand? An in-depth statistical analysis of national and international rankings. Scientometrics, 87(3), 667-681.
Hirsch, J. (2005). An index to quantify an individual's scientific research output. Proceedings of the National Academy of Sciences of the United States of America, 102, 16569-16572.
Uddin, A., & Singh, V. K. (2014). Mapping the Computer Science Research in SAARC Countries. IETE Technical Review, 31, 287-296.
Uddin, A., & Singh, V. K. (2015). A Quantity-Quality Composite Ranking of Indian Institutions in Computer Science Research. IETE Technical Review (forthcoming). DOI: http://dx.doi.org/10.1080/02564602.2015.1010614
Singhal, K., Banshal, S. K., Uddin, A., & Singh, V. K. (2014). The information technology knowledge infrastructure and research in South Asia. Journal of Scientometric Research, 3, 134-142.
Banshal, S. K., Singhal, K., Uddin, A., Singh, V. K., & Sharmin, M. F. (2014). Mapping the Computer Science Research in Bangladesh. Proceedings of the 8th International Conference on Software, Knowledge, Information Management and Applications, Dhaka, Bangladesh, IEEE Xplore, Dec. 2014.
Liu, N., & Liu, L. (2005). University rankings in China. Higher Education in Europe, 30, 217-227.
Ma, R., Ni, C., & Qiu, J. (2008). Scientific research competitiveness of world universities in computer science. Scientometrics, 76, 245-260.
Uddin, A., & Singh, V. K. (2014). Measuring research output and collaboration in South Asian countries. Current Science, 107, 1.
Prathap, G. (2010). The 100 most prolific economists using the p-index. Scientometrics, 84(1), 167-172.
Egghe, L. (2006). An improvement of the h-index: The g-index. ISSI Newsletter, 2(1), 8-9.
Alonso, S., Cabrerizo, F. J., Herrera-Viedma, E., & Herrera, F. (2010). hg-index: A new index to characterize the scientific output of researchers based on the h- and g-indices. Scientometrics, 82(2), 391-400.
Karpagam, R., Gopalakrishnan, S., Babu, B. R., & Natarajan, M. (2012). Scientometric Analysis of Stem cell Research: A comparative study of India and other countries. Collnet Journal of Scientometrics and Information Management, 6(2), 229-252.
Karpagam, R., Gopalakrishnan, S., Natarajan, M., & Babu, B. R. (2011). Mapping of nanoscience and nanotechnology research in India: a scientometric analysis, 1990–2009. Scientometrics, 89(2), 501-522.