EERQI Final Conference, Brussels, 15-16 March 2011
This project is funded by the Socioeconomic Sciences and Humanities Section.
Interrelations of Indicators
Work in Progress
Prof. Dr. Stefan Gradmann / Dr. Frank Havemann
Humboldt-Universität zu Berlin / Berlin School of Library and Information Science (IBI)
Interrelations of Indicators / Stefan Gradmann, Frank Havemann / EERQI Final Conference, Bruxelles 15-16 March 2011
Overview
• Base Data
• Intrinsic Indicators: Interrelation
• Extrinsic paper data from search engines and social-network services
• Citations in Google Scholar
• Correlation of intrinsic total score with extrinsic scores
Base Data
• Assessments of 179 papers based on intrinsic criteria
• Two files of extrinsic data:
– citation numbers of rated papers obtained with Google Scholar (on March 8, 2011)
– data from search engines and social-network services
• Extrinsic author data suffer from homonymous author names → we only use paper attributes.
• Papers in English and in German distributed over three thematic groups:
– group 1 includes papers about "assessment, evaluation, testing & measurement" (35 / 35)
– group 2 about "comparative and inter-/multicultural education" (33 / 17)
– group 3 about "history and philosophy of education" (34 / 17)
Intrinsic Indicators: Interrelation
• Rigour ratings
– average of nine ratings of different aspects
• Originality ratings
– average of three ratings of different aspects
• Significance ratings
– average of four ratings of different aspects
• Combined rating score for each paper: the average of the ratings of all 16 aspects (total score on a scale from 0 to 7).
• To do: weight the mean ratings of each paper with its number of ratings (→ we need all individual ratings by different persons, which have not been available so far).
• The scatterplots in the three figures of mean scores of rigour, originality, and significance show that the latter two correlate best, especially for English-language papers.
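The scoring described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names and the ordering of the 16 aspect ratings (nine rigour, then three originality, then four significance) are assumptions made here. The `weighted_mean` helper shows one possible reading of the "to do" item, namely weighting each paper's mean rating by its number of ratings when aggregating.

```python
from statistics import mean

# Assumed layout of the 16 aspect ratings: 9 rigour, 3 originality, 4 significance.
N_RIGOUR, N_ORIGINALITY, N_SIGNIFICANCE = 9, 3, 4


def paper_scores(aspect_ratings):
    """Return the three sub-scores and the total score for one paper."""
    expected = N_RIGOUR + N_ORIGINALITY + N_SIGNIFICANCE
    if len(aspect_ratings) != expected:
        raise ValueError(f"expected {expected} aspect ratings")
    rigour = aspect_ratings[:N_RIGOUR]
    originality = aspect_ratings[N_RIGOUR:N_RIGOUR + N_ORIGINALITY]
    significance = aspect_ratings[N_RIGOUR + N_ORIGINALITY:]
    return {
        "rigour": mean(rigour),
        "originality": mean(originality),
        "significance": mean(significance),
        # Unweighted mean over all 16 aspects, not the mean of the three sub-scores.
        "total": mean(aspect_ratings),
    }


def weighted_mean(paper_means, rating_counts):
    """Mean of per-paper mean ratings, weighted by each paper's number of ratings."""
    total_count = sum(rating_counts)
    if total_count == 0:
        raise ValueError("no ratings available")
    return sum(m * c for m, c in zip(paper_means, rating_counts)) / total_count
```

Because rigour contributes nine of the 16 aspects, it dominates a total score computed this way; averaging the three sub-scores instead would give a different number.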
Originality – Rigour Interrelation
→ Lowest correlation strength
Rigour - Significance Interrelation
→ Medium-low correlation strength
Originality – Significance Interrelation
→ Maximum correlation strength
Intrinsic Indicators: Distributions of Total Scores
• Box-and-whisker plots of distributions of total scores per language and group can be compared in the figures
• We display only distributions of rated papers which also have data from search engines or social-network services.
– The box in each plot contains 50 % of papers around the median (black horizontal line).
– The range of ratings is visualised by the "whiskers".
– Isolated points show outliers (which are more distant from the box than 1.5 times the box's height).
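The outlier rule above (points further from the box than 1.5 times its height) can be sketched like this. The function name is hypothetical, and the quartile convention (`method="inclusive"`) is an assumption, since the slides do not say how the quartiles were computed and plotting tools differ on this.

```python
from statistics import quantiles


def boxplot_outliers(scores, whisker=1.5):
    """Return the scores lying more than `whisker` box heights outside the box."""
    if len(scores) < 2:
        raise ValueError("need at least two scores")
    # Quartile convention is an assumption; other conventions shift the box slightly.
    q1, _median, q3 = quantiles(scores, n=4, method="inclusive")
    box_height = q3 - q1  # interquartile range: the box holds the middle 50 %
    lower = q1 - whisker * box_height
    upper = q3 + whisker * box_height
    return [s for s in scores if s < lower or s > upper]
```

For made-up data such as `[1, 2, 3, 4, 5, 6, 7, 8, 9, 30]`, only the value 30 falls outside the fences and would be drawn as an isolated point.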
Intrinsic Indicators: Distributions of Total Scores
Extrinsic paper data from search engines and social-network services
• Sources:
– CiteULike, LibraryThing, MendReader
– Google, Metager
• Many papers have only hits in one service.
• To get useful data we apply the in-dubio-pro-reo rule and select maximum values.
• We assume that zero hits cannot be used as a valid value of an indicator and thus exclude papers without hits from the analysis.
• The hit distribution of papers with at least one hit is heavily right-skewed (concentrated at low values, with a long tail): many papers have only a few hits and only a few papers have many hits. We therefore use the logarithm of hit numbers as a more adequate representation.
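The three processing steps above (take the maximum over services, drop papers without hits, take the logarithm) might look as follows. This is a sketch under assumptions: the function name and the dictionary layout are invented for illustration, and the base-2 logarithm is a choice made here because the slide does not name a base.

```python
import math


def log_hit_score(hits_by_service):
    """Return log2 of the best hit count across services, or None if there are no hits."""
    # "In dubio pro reo": keep the most favourable (maximum) value across services.
    best = max(hits_by_service.values(), default=0)
    if best <= 0:
        # Zero hits are not treated as a valid indicator value; exclude the paper.
        return None
    return math.log2(best)  # base 2 is an assumption, not stated on the slide
```

A paper with `{"CiteULike": 0, "LibraryThing": 8}` would get a score of 3.0, while a paper with zero hits everywhere returns `None` and is left out of the analysis.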
Extrinsic paper data from social-network services
Extrinsic paper data from search engines: similar to social networks
Extrinsic paper data from search engines and social-network services
• All papers with social-network hits also have search engine hits.
• Both hit numbers correlate
– quite well in each of the three groups for papers in English
– and less well for papers in German
Extrinsic paper data from search engines and social networks: English
Extrinsic paper data from search engines and social networks: German
Citations in Google Scholar: citation distributions for samples of the three groups
• Not all papers are listed in Google Scholar.
• Only a few papers in German are in the sample. We omit them.
• Here the y-axis shows the binary (base-2) logarithm of the number of citations + 1. Adding 1 is a common bibliometric practice for including papers without citations in the analysis of log-values. It can be justified with the argument that publishing a new result is its first citation.
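The citation transformation above is small enough to state directly. The function name is hypothetical; the formula, the base-2 logarithm of citations plus one, is the one named on the slide.

```python
import math


def citation_score(citations):
    """Return log2(citations + 1), so that uncited papers map to 0 instead of being undefined."""
    if citations < 0:
        raise ValueError("citation counts cannot be negative")
    return math.log2(citations + 1)
```

Unlike the hit numbers from search engines and social-network services, uncited papers stay in the sample here: `citation_score(0)` is 0.0 and `citation_score(7)` is 3.0.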
Citations in Google Scholar: The total scores (mean ratings)
• Note that the first (red) group is rated best but cited worst (in contrast to the results for search engines and social-network services, where, for papers in English, ratings and hit numbers seem to correlate at the aggregated level of thematic groups).
• Extrinsic author data remain a to-do: an effective method for disambiguating authors is needed first.
Correlation?
Correlation of intrinsic total score with extrinsic scores
• For hits in social networks and in search engines, the scatterplots show no correlation with the intrinsic total score.
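The slides judge correlation from scatterplots and do not state a coefficient. As an illustration only, a rank correlation between intrinsic total scores and log-transformed extrinsic scores could be computed as below. Spearman's coefficient is a choice made here, not a method confirmed by the source, and all function names are hypothetical.

```python
import math
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))
```

A value near 0 for a given language and thematic group would match what the scatterplots suggest; with sub-samples as small as those in the base data, any single coefficient should be read with caution.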
Correlation of intrinsic total score with extrinsic scores: social networks
Correlation of intrinsic total score with extrinsic scores: social networks
Correlation of intrinsic total score with extrinsic scores: search engine data
Correlation of intrinsic total score with extrinsic scores: search engine data
Correlation of intrinsic total score with extrinsic scores
• The same is true for citations of papers in English drawn from Google Scholar.
Conclusion
• As a consequence, any correlation analysis (including rank correlation) of these intrinsic and extrinsic paper data makes no sense ...
• … as long as such an analysis is based on paper attributes exclusively!
• → Effective author name disambiguation and disciplinary allocation are key
• Preliminary results do not yet invalidate the correlation methodology …
• … but they are revealing in terms of source data quality!
• How to understand the variance among the sub-samples?