1
- used classiers: Random Forest, SVM with radial and polynomial kernel - select specic publication subsets by tags - number of citations split into two classes using median in each publication set - baselines: + Majority (vote for the most frequent class in training set) and + Random (acc = 50%; random guessing) 50 60 70 inform comput manag languag analysi theori tag knowledg structur model commun process system evalu design network collabor softwar data learn web folksonomi social cluster web20 internet ontolog search semant algorithm tag stem classifier accuracy (%) classifier Baseline Majority Baseline Random RandomForest SVM radial kernel SVM polynomial kernel Classication accuracy for the 30 tag-induced publication sets for the three classiers and two baselines. Note: the diagrams for the two SVMs are almost indiscernible as they often provided the same result. user resource tags Scholarly Impact from a Social Bookmarking Perspective Daniel Zoller, Stephan Doerfel, Robert Jäschke, Gerd Stumme and Andreas Hotho Prediction of Future Citations Random Forest: - outperforms random baseline on 29 out of 30 datasets - outperforms majority baseline on 28 out of 30 datasets - sign test and Wilcoxon signed-rank test conrm signicant dierences SVM: - less successful than Random Forest on average - in 25 out of 30 cases better than random baseline (sign and Wilcoxon signed test corroborate signicant dierences) - on average: positive improvements over the majority baseline Correlations - social bookmarking tools allow users to annotate resources (e.g. photos, music, publications) with tags - in BibSonomy, users can store and retrieve publications and website bookmarks Social Bookmarking - fat-tailed distribution - similarly found in previous work - most of the publications in BibSonomy not cited by any other scientic work - publications with one citation are the second largest set - frequency decreases continuously with higher numbers of citations Citations Meet Social Data 10 1000 1 10 100 1000 10000 number of citations in MAS frequency The frequency distribution of the number of citations in Microsoft Academic Search to publications in BibSonomy - Citations from Microsoft Academic Search (crawled in 2014) - Matched to publications in BibSonomy Altmetrics - alternative metrics, based on Social Web - impact measures other than citation counts - e.g. tweets, mentions, likes, blog posts, etc. - in our case: based on Social Bookmarking usage post view exp exp req tag cit 1 0.644 0.638 0.633 0.446 0.330 0.181 view 0.322 1 0.725 0.705 0.656 0.322 0.091 exp 0.317 0.429 1 0.988 0.742 0.279 0.157 exp Bib 0.328 0.417 0.955 1 0.722 0.277 0.160 req 0.325 0.912 0.663 0.634 1 0.213 0.072 tag 0.277 0.272 0.237 0.242 0.267 1 0.036 cit 0.199 0.098 0.122 0.120 0.098 0.014 1 post Bib Correlation between dierent usage features in BibSonomy and the number of citations of a publication. Upper right triangle Person's r and lower left Spearman's ρ. - signicant (0.01 level) positive correlations between usage features and number of citations - noticeable bias between citations and both posting (post) and exporting (exp) - no real correlation between tag metric and citations (tag can occur in many posts) - apart from exp and exp Bib and req and view, none of the usage metrics is strongly correlated to another one => truly alternative metrics post a publication (post) Usage Metrics %0 Conference Paper %1 zoller2015scholarly %A Zoller, Daniel %A Doerfel, Stephan %A Jäschke, Robert %A Stumme, Gerd %A Hotho, Andreas %D 2015 %K bibsonomy bookmarking impact scholar social %T Scholarly Impact from a Social Bookmarking Perspective @inproceedings{zoller2015scholarly, added-at = {2015-06-23T15:50:29.000+0200}, author = {Zoller, Daniel and Doerfel, Stephan and Jäschke, Robert and Stumme, Gerd and Hotho, Andreas}, keywords = {bibsonomy bookmarking impact scholar social}, timestamp = {2015-06-23T18:00:50.000+0200}, title = {Scholarly Impact from a Social Bookmarking Perspective}, year = 2015 } BibTeX export into various formats, e.g. for citation (exp); only BibTeX (exp Bib ) search publications by tag (tag) view a publication post (view) all publication related requests (req) @inproceedings{zoller2015scholarly, added-at = {2015-06-23T15:50:29.000+0200}, author = {Zoller, Daniel and Doerfel, Stephan and Jäschke, Robert and Stumme, Gerd and Hotho, Andreas}, keywords = {bibsonomy bookmarking impact scholar social}, timestamp = {2015-06-23T18:00:50.000+0200}, title = {Scholarly Impact from a Social Bookmarking Perspective}, year = 2015 } BibTeX R e s e a r c h Q u e s t i o n : D o A l t m e t r i c s ( U s a g e M e t r i c s ) C o r r e l a t e w i t h C i t a t i o n s ?

Scholarly Impact from a Social Bookmarking Perspective · bert and Stu me, G d and Hotho , Andre s}, keywo s = {bibs n my bookmarki ng impact scholar social}, t mestamp = {2015-06-23

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Scholarly Impact from a Social Bookmarking Perspective · bert and Stu me, G d and Hotho , Andre s}, keywo s = {bibs n my bookmarki ng impact scholar social}, t mestamp = {2015-06-23

- used classifiers: Random Forest, SVM with radial and polynomial kernel - select specific publication subsets by tags - number of citations split into two classes using median in each publication set - baselines: + Majority (vote for the most frequent class in training set) and + Random (acc = 50%; random guessing)

50

60

70

info

rm

com

put

man

ag

lang

uag

anal

ysi

theo

ri

tag

know

ledg

stru

ctur

mod

el

com

mun

proc

ess

syst

em

eval

u

desig

n

netw

ork

colla

bor

softw

ar

data

lear

n

web

folks

onom

i

socia

l

clust

er

web2

0

inte

rnet

onto

log

sear

ch

sem

ant

algo

rithm

tag stem

class

ifier a

ccur

acy

(%)

classifier Baseline Majority Baseline Random RandomForest SVM radial kernel SVM polynomial kernel

Classification accuracy for the 30 tag-induced publication sets for the three classifiers and two baselines.Note: the diagrams for the two SVMs are almost indiscernible as they often provided the same result.

user

resource

tags

Scholarly Impact from a Social Bookmarking Perspective

Daniel Zoller, Stephan Doerfel, Robert Jäschke, Gerd Stumme and Andreas Hotho

Prediction of Future Citations

Random Forest: - outperforms random baseline on 29 out of 30 datasets - outperforms majority baseline on 28 out of 30 datasets - sign test and Wilcoxon signed-rank test confirm significant differences

SVM: - less successful than Random Forest on average - in 25 out of 30 cases better than random baseline (sign and Wilcoxon signed test corroborate significant differences) - on average: positive improvements over the majority baseline

Correlations

- social bookmarking tools allow users to annotate resources (e.g. photos, music, publications) with tags - in BibSonomy, users can store and retrieve publications and website bookmarks

Social Bookmarking

- fat-tailed distribution - similarly found in previous work - most of the publications in BibSonomy not cited by any other scientific work - publications with one citation are the second largest set - frequency decreases continuously with higher numbers of citations

Citations Meet Social Data

10

1000

1 10 100 1000 10000number of citations in MAS

frequ

ency

The frequency distribution of the number of citationsin Microsoft Academic Search to publications in BibSonomy

- Citations from Microsoft Academic Search (crawled in 2014) - Matched to publications in BibSonomy

Altmetrics - alternative metrics, based on Social Web - impact measures other than citation counts - e.g. tweets, mentions, likes, blog posts, etc. - in our case: based on Social Bookmarking usage

post view exp exp req tag cit

1 0.644 0.638 0.633 0.446 0.330 0.181view 0.322 1 0.725 0.705 0.656 0.322 0.091exp 0.317 0.429 1 0.988 0.742 0.279 0.157expBib 0.328 0.417 0.955 1 0.722 0.277 0.160req 0.325 0.912 0.663 0.634 1 0.213 0.072tag 0.277 0.272 0.237 0.242 0.267 1 0.036cit 0.199 0.098 0.122 0.120 0.098 0.014 1

postBib

Correlation between different usage features in BibSonomy and the number of citations of a publication.Upper right triangle Person's r and lower left Spearman's ρ.

- significant (0.01 level) positive correlations between usage features and number of citations - noticeable bias between citations and both posting (post) and exporting (exp) - no real correlation between tag metric and citations (tag can occur in many posts) - apart from exp and expBib and req and view, none of the usage metrics is strongly correlated to another one => truly alternative metrics

post a publication (post)

Usage Metrics

%0 Conference Paper

%1 zoller2015scholarly

%A Zoller, Daniel

%A Doerfel, Stephan

%A Jäschke, Robert

%A Stumme, Gerd

%A Hotho, Andreas

%D 2015

%K bibsonomy bookmarking impact scholar social

%T Scholarly Impact from a Social Bookmarking Perspective

@inproceedings{zoller2015scholarly, added-at = {2015-06-23T15:50:29.000+0200}, author = {Zoller, Daniel and Doerfel, Stephan and Jäschke, Robert and Stumme, Gerd and Hotho, Andreas}, keywords = {bibsonomy bookmarking impact scholar social}, timestamp = {2015-06-23T18:00:50.000+0200}, title = {Scholarly Impact from a Social Bookmarking Perspective}, year = 2015}

BibTeX

export into various formats, e.g.for citation (exp); only BibTeX (expBib)

search publications by tag (tag)

view a publication post (view)

all publication related requests (req)

@inproceedings{zoller2015scholarly, added-at = {2015-06-23T15:50:29.000+0200}, author = {Zoller, Daniel and Doerfel, Stephan and Jäschke, Robert and Stumme, Gerd and Hotho, Andreas}, keywords = {bibsonomy bookmarking impact scholar social}, timestamp = {2015-06-23T18:00:50.000+0200}, title = {Scholarly Impact from a Social Bookmarking Perspective}, year = 2015}

BibTeX

Research Question: Do Altmetrics (Usage Metrics) Correlate with Citations?