Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
Introduction MESUR Mapping Science Metrics Survey Discussion
Tracking science in real-time from large-scaleusage data
Johan Bollen - [email protected]
Indiana UniversitySchool of Informatics and Computing
Center for Complex Networks and Systems ResearchCognitive Science Program
July 29, 2010
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey DiscussionProblem statement Usage data
Outline
1 IntroductionProblem statementUsage data
2 MESURMESUR overviewCreating MESUR’s reference data set
3 Mapping Science
4 Metrics Survey
5 DiscussionOverviewFuture ResearchRelevant papers
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey DiscussionProblem statement Usage data
Why study science?
Tremendous importance
Enormous amount of resources and people involved:
Allocation of resources: funding agencies, policy makers, thepublic, corporations, the scientists themselves
Social systems research: Ability to learn about general propertiesof similar social systems
Unrelated picture of my daughter
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey DiscussionProblem statement Usage data
Science as a social system
Actors: scientists, stakeholders
Relations (often focused onexchange of knowledge)
Informal interactions:meetings, workshops,conversations, messages, etcFormal: affiliations,collaborations, projects,publications
Artifacts: articles, journals,reports, data, software
Xiaoming Liu, Johan Bollen, Michael L. Nelson, andHerbert Van de Sompel. Co-authorship networks inthe digital library research community. InformationProcessing and Management, 41(6):14621480,December 2005
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey DiscussionProblem statement Usage data
The measure of science in the print-eraA word on the traditional way
Gold standard: citation data
Extracted from “publication” data.
From print comes citation:
Science is a gift-economySteal text, but not ideas.Currency is acknowledgementof influence: citationsMore citations, more influenceTrack scientific activity fromcitation data
Citations are derived frompeer-reviewed publications
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey DiscussionProblem statement Usage data
The measure of science in the print-eraA word on the traditional way
Gold standard: citation data
Extracted from “publication” data.
From print comes citation:
Science is a gift-economySteal text, but not ideas.Currency is acknowledgementof influence: citationsMore citations, more influenceTrack scientific activity fromcitation data
Citations are derived frompeer-reviewed publications
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey DiscussionProblem statement Usage data
The measure of science in the print-eraA word on the traditional way
Gold standard: citation data
Extracted from “publication” data.
From print comes citation:
Science is a gift-economySteal text, but not ideas.Currency is acknowledgementof influence: citationsMore citations, more influenceTrack scientific activity fromcitation data
Citations are derived frompeer-reviewed publications
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey DiscussionProblem statement Usage data
From this springs an entire industry
Applications of citation data
End-user services and bibliometrics
Web of Science, Scopus, publishers, institutions, etc.
Citation-based services
Reference linkingVariety of end-user services
Citation-based assessment and bibliometrics
Impact metricsAnalyticsMapping
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey DiscussionProblem statement Usage data
From this springs an entire industry
Applications of citation data
End-user services and bibliometrics
Web of Science, Scopus, publishers, institutions, etc.
Citation-based services
Reference linkingVariety of end-user services
Citation-based assessment and bibliometrics
Impact metricsAnalyticsMapping
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey DiscussionProblem statement Usage data
From this springs an entire industry
Applications of citation data
End-user services and bibliometrics
Web of Science, Scopus, publishers, institutions, etc.
Citation-based services
Reference linkingVariety of end-user services
Citation-based assessment and bibliometrics
Impact metricsAnalyticsMapping
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey DiscussionProblem statement Usage data
Two problems with present citation-based approachData (1) and metrics (2)
Issues:
1 Inherent features of citation data due to its origins inpublished material
2 Existing metrics fail to acknowledge (1), and ignore networkproperties of science as a social system.
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey DiscussionProblem statement Usage data
Citation dataDelays and partiality
Citation problem 1: Data
Citation data is late and partial indicator of scientific activity
discovery research publication discovery research publication
citation
time 0 year +0.5 year +1.5 years +3.5 years+2 years +2.5 years
citation data
?
Publication delays
Publicationt → publicationt−1
→ citation DB
Domain-dependent practices
Community: Publishing authorsonly
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey DiscussionProblem statement Usage data
Citation dataMetrics vs. networks
Citation problem 2: networks
Ignoring network properties of science
Livestock
Food
Chemistry
Material Sciences
Physics
Computer Science
Geology
Sports Medicine
ObGyn
Medicine
Resource Management
Dermatology
!"#$%&'(%)
*$+&,-%*$+&,!"#$.,-%-,+%&%*$+&,-%$,/
0%!"#$%/"'.%*
0%,&1%/"'.
0%!"#$%/"'.%)
!"#$%&'(%2
*$+&,!"#$%0
0%)3,4%/"'.
)3,/"'.3$+&#
!"#$%&'(%'
0%"31"%'-'&1#%!"#$
3-,&1%/"'.
!"#$%&'(%4'++
,&1*-,.'+*443/$
!"#$%&'(%*
+'+&*"'2&,-
0%/"'.%!"#$
0%*!!4%!"#$
*$+&,-%0
*-1'5%/"'.%3-+%'23+
+'+&*"'2&,-%4'++
!,4#.'&
,&1%4'++
0%!"#$!/,-2'-$%.*+
/"'.%&'(
0%(3&,4
/"'.%!"#$%4'++
!"#$%&'(%/
)3,/"'.%)3,!"%&'$%/,
0%*.%/"'.%$,//"'.!'6&%0
0%3..6-,4
!%$,/%!",+,!,!+%3-$
*--%1',!"#$!*+.%"#2&
1',!"#$%&'$%4'++
*$+&%$,/%!
!"#$%74632$
!%-*+4%*/*2%$/3%6$*
*!!4%!"#$%4'++
-6/4%!"#$%)
!"#$%/"'.%/"'.%!"#$
0%1',!"#$%&'$
.,4%!"#$
!"#$%4'++%)
'6&%0%,&1%/"'.
.,4%.3/&,)3,4
/"'.%/,..6-
*3!%/,-7%!
$6&7%$/3
0%,&1*-,.'+%/"'.
4*-1.63&
4'/+%-,+'$%/,.!6+%$/
/"'.%!"#$
*!!4%/*+*4%*!1'-
3-+%0%86*-+6.%/"'.
+"',/"'.!0%.,4%$+&6/
)3,/"'.%0
*/+*%/&#$+*44,1&%'
-'6&,$/3'-/'
$#-4'++
,&1%)3,.,4%/"'.
0%!"#$%)!*+%.,4%,!+
(3&,4,1#
3-,&1%/"3.%*/+*
2*4+,-%+
0%-'6&,!"#$3,4
!"#$%!4*$.*$
!"#$3/*%)
)3,!"#$%0
0%.,4%)3,4
*!!4%'-(3&,-%.3/&,)
$#-+"'$3$!$+6++1*&+
'*&+"%*-2%!4*-'+*&#%$/3'-/'%4'++'&$
0%/,.!%-'6&,4
0%!,4#.%$/3%!,4%/"'.
.*+%$/3%'-1%*!$+&6/+
1',!"#$%0%3-+
.,4%/'44%)3,4
0%-'6&,$/3
-6/4'3/%*/32$%&'$
3-+%0%.,2%!"#$%*
0%/"&,.*+,1&%*
.*/&,.,4'/64'$
2'(%)3,4
'6&%0%3-,&1%/"'.
7')$%0
/,,&23-%/"'.%&'(
0%)*/+'&3,4
/43-%,&+",!%&'4*+%&
+"'&3,1'-,4,1#
*-*4%/"3.%*/+*
*--%+",&*/%$6&1
/43-%/*-/'&%&'$
!"#$%4'++%*
*.%0%!"#$3,4!"'*&+%/
*/+*%.*+'&0*!*-'$'%0,6&-*4%,7%*!!43'2%!"#$3/$
'4'/+&,/"3.%*/+*
'6&%!"#$%0%)
*&/"%)3,/"'.%)3,!"#$
7')$%4'++
3-+%0%/*&23,4
.3/&,)3,4!69
/"'$+
*.%0%/*&23,4
/'44%.,4%437'%$/3
3-2%'-1%/"'.%&'$
/43-%-'6&,!"#$3,4
$,432%$+*+'%/,..6-
/4*$$3/*4%*-2%86*-+6.%1&*(3+#
/"'.%1',4
'6&%0%-'6&,$/3
*$+&,!"#$%0%$6!!4%$
+'+&*"'2&,-!*$#..'+&.,4%)3,4%/'44
/*-%0%/"'.
0%*.%/,44%/*&23,4
0%),-'%0,3-+%$6&1%*.
0%/,44,32%3-+'&7%$/3
-6/4%!"#$%*
0%*!!4%!,64+&#%&'$
!&,+'3-$
(*2,$'%:,-'%0
0%!"#$%*!.*+"%1'-
0%1'-%(3&,4
.,-%5'*+"'&%&'(
0%/43-%,-/,4
0%*&+"&,!4*$+#
"6.%&'!&,2
-6/4%2*+*%$"''+$
2'('4,!.'-+
/*+*4%+,2*#
0%'6&%/'&*.%$,/
,-/,1'-'
-6/4%!"#$%)!!&,/%$6!
3-+%0%.,2%!"#$%)!"#$%$+*+6$%$,4323%)
.*&%'/,4!!&,1%$'&
/*-/'&%&'$
+"3-%$,432%734.$
3-7'/+%3..6-
)&*3-%&'$
$!'/+&,/"3.%*/+*%*
!,4#"'2&,-
0%),-'%0,3-+%$6&1%)&
)4,,2
)%/"'.%$,/%0!-
0%/'44%$/3
*$+&,!"#$%$!*/'%$/3
!"#$%&'!
-'6&,$6&1'&#
+'/+,-,!"#$3/$
'6&%0%*!!4%!"#$3,4
0%*!!4%!,4#.%$/3
3/*&6$
'6&%!"#$%0%/
+*4*-+*
0%.,4%/*+*4%*!/"'.
/"'.,$!"'&'
0%!",+,/"%!",+,)3,%*
$!*/'%$/3%&'(
.,4%'/,4
/"'.%.*+'&
2'(%2#-*.
0%"#2&,4
'6&%!"#$%0%*
*.%0%&,'-+1'-,4
';!%)&*3-%&'$
)%$'3$.,4%$,/%*.
,'/,4,13*
0%!,4#.%$/3%!,4%!"#$
0%/&#$+%1&,5+"
$!3-'
1',/"3.%/,$.,/"3.%*/
0%-'6&,/"'.
0%*+.,$%$,4!+'&&%!"#
,39,$
/,44,32%$6&7*/'%*
!"#$3/*%/
&'(%)&*$%:,,+'/-
/"'.%'-1%$/3
0%/43.*+'
)3,,&1*-%.'2%/"'.
'6&,!"#$%4'++
*-*4%)3,*-*4%/"'.
/3&/64*+3,-
0%/,.!6+%/"'.
1',/"'.%1',!"#%1',$#
0%.,4%$+&6/+
0%-*+%!&,26/+$
!6)4%*$+&,-%$,/%!*/
'-2,/&3-,4,1#
.'+*44%.*+'&%+&*-$%*
9'#%'-1%.*+'&
0%.'2%/"'.
0%';!%),+
0%*!!4%!"#$3,40%*1&%7,,2%/"'.
*.%0%&'$!%/&3+%/*&'
3-('$+%,!"+"%(3$%$/3
,!"+"*4.,4,1#
1'-'+3/$
$6&1%-'6&,4
7,&'$+%'/,4%.*-*1
.'*+%$/3
0%-'6&,$6&1.*+'&%$/3%7,&6.
0%*44,#%/,.!2
0%.*1-%.*1-%.*+'&
*.%0%,)$+'+%1#-'/,4
"'4(%/"3.%*/+*
*+.,$%'-(3&,-
'4'/+&,*-*4
*.%"'*&+%0
$+62%$6&7%$/3%/*+*4
,!+%';!&'$$
$/3'-/'
7'.$%.3/&,)3,4%4'++
$#-+"'+3/%/,..6-
74632%!"*$'%'86343)&
-6/4%3-$+&6.%.'+"%)
'6&%&'$!3&%0
8%0%&,#%.'+',&%$,/
!"#$3/*%*
'6&%"'*&+%0
0%$!''/"%4*-1%"'*&%&
:%*-,&1%*441%/"'.
-*+6&'
/43-%3-7'/+%23$
*-+3/*-/'&%&'$
0%!"#$%2%*!!4%!"#$
0%.*+'&%$/3
)'"*(%)&*3-%&'$
0%*.%*/*2%2'&.*+,4
,)$+'+%1#-'/,4
!$#/",!"*&.*/,4,1#
*.%0%,!"+"*4.,4
.,2%!"#$%4'++%*
*/+*%",&+3/
0%$+&6/+%1',4
0%!'+&,4
*--6%&'(%*$+&,-%*$+&
0%.,4%$!'/+&,$/
"#2&,4%!&,/'$$
.*/&,.,4%/"'.%!"#$
0%!"#$%1%-6/4%!*&+3/
*!!4%$6&7%$/3
7,,2%/"'.
&'(%.,2%!"#$
-6/4%3-$+&6.%.'+"%*
!"*&.*/,4%)3,/"'.%)'
5*+'&%'-(3&,-%&'$
0%*+.,$%$/3
*&+"&,$/,!#
'6&%0%3..6-,4
.'+",2%'-:#.,4
$,4%!"#$
7'&+34%$+'&34
)3,,&1%.'2%/"'.%4'++
.'2%$/3%$!,&+%';'&
0%(,4/*-,4%1',+"%&'$
!4*-'+%$!*/'%$/3
!&,+'3-%$/3
'6&%0%!"*&.*/,4
'-(3&,-%$/3%+'/"-,4
*.%0%$!,&+%.'2
",&+$/3'-/'
'/,4,1#
0%/,$.,4%*$+&,!*&+%!
0%!"#$3,4!4,-2,-
!"#$%*+,.%-6/4<
0%'4'/+&,/"'.%$,/
*-*4%/"'.
)&3+%0%,!"+"*4.,4
"#2&,)3,4,13*
*-*4%)3,/"'.
)3,3-7,&.*+3/$
,!+%'-1
3-+%0%&'.,+'%$'-$
1#-'/,4%,-/,4
,!+%/,..6-
*$3*-%*6$+&*4%0%*-3.
0%'4'/+&,*-*4%/"'.
'(,46+3,-
0%!"#$%/"'.%$,432$
0%+",&*/%/*&23,(%$6&
/&,!%$/3
0%';!%.*&%)3,4%'/,4
3-+%0%&*23*+%,-/,4
43+",$
!4*-+*
0%!"*&.*/,4%';!%+"'&
-'6&,$/3%4'++
932-'#%3-+
!'23*+&3/$
)3,4%&'!&,2
0%/43-%'-2,/&%.'+*)
3..6-,4%&'(
/,/"&*-'%2)%$#$+%&'(
0%$,6-2%(3)
';!%/'44%&'$
3-+%0%"'*+%.*$$%+&*-
!4*-+%!"#$3,4
.*&%)3,4
3-06&#
0%!"#$%,/'*-,1&
0%74632%.'/"
0%/43-%.3/&,)3,4
0%-,-!/&#$+%$,432$
0%!'23*+&%,&+",!'2
+"&,.)%"*'.,$+*$3$
(*//3-'
0%!'23*+&%,&+",!%)
0%/'44%)3,41'-'%2'(
0%,!+%$,/%*.%*
23*)'+'$%/*&'
23*)'+'$
0*.*!0%*.%.'2%*$$,/
+%*.%73$"%$,/
1'-'
)&3+%0%2'&.*+,4
!4*-+%0
&*23,4,1#
.3/&,!,&%.'$,!,&%.*+
*-3.%&'!&,2%$/3
0%/*+*4
0%*/,6$+%$,/%*.
*&/"%(3&,4
5*+'&%&'$
-'6&,-
/*-/'&
0%$/3%7,,2%*1&
0%*.%/'&*.%$,/
*3/"'%0
.,4%)3,4%'(,4
*!!4%,!+3/$
(3$3,-%&'$
*&+'&3,$/4%+"&,.%(*$
/,-+&3)%.3-'&*4%!'+&
'.),%0*-+3.3/&,)%*1'-+$%/"
+"',&%/,.!6+%$/3
*2(%.*+'&
!4*$.*%!"#$%/,-+&%7
0%7,,2%$/3
*-3.%)'"*(
&'.,+'%$'-$%'-(3&,-
0%2*3&#%$/3
!4*-+%*-2%$,34
4*-/'+
0%';!%+"',&%!"#$<
!'&/'!+%!$#/",!"#$
*&/"%,!"+"*4.,4!/"3/
*.%-*+
0%.*+"%!"#$
0%';!%.'2
1',4,1#
*/+*%/&#$+*44,1&%2
0%431"+5*('%+'/"-,4
!4*-+%/'44
5*+'&%&'$,6&%&'$
/*-%0%73$"%*86*+%$/3
0%/43-%3-('$+
,!+%4'++
/*-%0%7,&'$+%&'$
*.%.3-'&*4
!,64+&#%$/3
/'44
$,34%$/3%$,/%*.%0
*&/"%2'&.*+,4
!"#$3/*4%&'(3'5
0%!"#$%/"'.!6$
/,..6-%*/.
$"'4;$=>%$"'4;4=>
6-!6)
0%*$$,/%/,.!6+%.*/"
*/+*%.'+*44%.*+'&
0%!"#$%/%$,432%$+*+'
)3,/"3.%)3,!"#$%*/+*
0%&'!&,2%7'&+34
/3&/%&'$
*/+*%/&#$+*44,1&%*
*.%&'(%&'$!3&%23$
'4'/+&,'-%/43-%-'6&,
0%*!!4%/&#$+*44,1&
3'''%0%86*-+6.%'4'/+
'6&%0%)3,/"'.
$3*.%0%/,.!6+
$*2*)$
3'''%+%3-7,&.%+"',&#
.6+*+%&'$
0%*-3.%$/3
-'5%'-14%0%.'2
!"#+,/"'.3$+&#
'4'/+&,-%4'++
0%/"'.%$,/%!'&9%+%?
:%!"#$%/%!*&+%73'42$
0%3-7'/+%23$
/,..6-3/*+3,-
1*$+&,'-+'&,4,1#
0%*.%$,/%",&+3/%$/3
-6/4%76$3,-
43.-,4%,/'*-,1&
*--%!"#$!-'5%#,&9
/"'.%)'&
*.%0%/43-%-6+&
/,..6-%.*+"%!"#$
0%-*+4%/*-/'&%3
XXz
XXXz
bad
good
Measuring impact, influence,prestige from citation data:
More citations is betterthan less citations
In citation network,central position is betterthan outskirts.
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey DiscussionProblem statement Usage data
Citation problem no. 2Metrics vs. networks
This approach is epitomized in Thomson-Reuters Impact Factor:
Standard citation graph representation
G = (V ,E ,W )E ⊆ V 2
W : E → N+
However:∀(ni , nj) ∈ E : pubtime(ni ) > pubtime(ni )
Impact Factor: normalized in-degree
IFj =P
i wij
Nj
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey DiscussionProblem statement Usage data
Citation problem no. 2Metrics vs. networks
This approach is epitomized in Thomson-Reuters Impact Factor:
Standard citation graph representation
G = (V ,E ,W )E ⊆ V 2
W : E → N+
However:∀(ni , nj) ∈ E : pubtime(ni ) > pubtime(ni )
Impact Factor: normalized in-degree
IFj =P
i wij
Nj
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey DiscussionProblem statement Usage data
Citation problem no. 2Metrics vs. networks
This approach is epitomized in Thomson-Reuters Impact Factor:
Standard citation graph representation
G = (V ,E ,W )E ⊆ V 2
W : E → N+
However:∀(ni , nj) ∈ E : pubtime(ni ) > pubtime(ni )
Impact Factor: normalized in-degree
IFj =P
i wij
Nj
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey DiscussionProblem statement Usage data
Citation problem no. 2Between Big Star and Britney Spears
Is more always better?
Why context matters
Sold 50M records
Influenced who?
High popularity, lowprestige.
Sold 50k records
Influenced REM, B52s,Teenage Fanclub, etc.
Low popularity, highinfluence
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey DiscussionProblem statement Usage data
Citation problem no. 2Between Big Star and Britney Spears
Is more always better?
Why context matters
Sold 50M records
Influenced who?
High popularity, lowprestige.
Sold 50k records
Influenced REM, B52s,Teenage Fanclub, etc.
Low popularity, highinfluence
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey DiscussionProblem statement Usage data
Citation problem no. 2Between Big Star and Britney Spears
Is more always better?
Why context matters
Sold 50M records
Influenced who?
High popularity, lowprestige.
Sold 50k records
Influenced REM, B52s,Teenage Fanclub, etc.
Low popularity, highinfluence
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey DiscussionProblem statement Usage data
Citation problem no. 2Between Big Star and Britney Spears
Is more always better?
Why context matters
Sold 50M records
Influenced who?
High popularity, lowprestige.
Sold 50k records
Influenced REM, B52s,Teenage Fanclub, etc.
Low popularity, highinfluence
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey DiscussionProblem statement Usage data
Citation problem no. 2Between Big Star and Britney Spears
Silly? That’s how we evaluate scientific impact right now!
The Impact Factor and lots of other citation-based metrics arepresently being used to assess the quality of publications, journals,authors, institutions, and even entire countries by proxy.
Note: Lots of developments on better metrics (cf. Eigenfactor:Bergstrom and Rosvall), but still reliance on citation data.
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey DiscussionProblem statement Usage data
New developmentsPageRank for citation graphs
v2
v3
15
v1 505
4
1
2
3
60
20
4010
20
?
?
20
Impact Factor: IFj =P
i wij
Nj:
normalized citation countorigin of citation disregarded
A different route:
IF (vi , ) ' λ∑
j IFj
IF (vi , ) ' λ∑
j IFj × 1O(vj )
PR(vi ) ' λ∑
j PR(vj)× 1O(vj )
PR(vi ) ' (1−λ)N + λ
∑j PR(vj)× 1
O(vj )
PRw (vi ) = (1−λ)N + λ
∑j PRw (vj)×w(vj , vi )
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey DiscussionProblem statement Usage data
New developments, contdPageRank for citation graphs
ISI IF PRw
rank value Journal value (x 103) Journal
1 52.28 ANNU REV IMMUNOL 16.78 NATURE
2 37.65 ANNU REV BIOCHEM 16.39 J BIOL CHEM
3 36.83 PHYSIOL REV 16.38 SCIENCE
4 35.04 NAT REV MOL CELL BIO 14.49 P NATL ACAD SCI USA
5 34.83 NEW ENGL J MED 8.41 PHYS REV LETT
6 30.98 NATURE 5.76 CELL
7 30.55 NAT MED 5.70 NEW ENGL J MED
8 29.78 SCIENCE 4.67 J AM CHEM SOC
9 28.18 NAT IMMUNOL 4.46 J IMMUNOL
10 28.17 REV MOD PHYS 4.28 APPL PHYS LETT
Table: The highest ranking journals according to ISI IF and WeightedPageRank (JCR2003)
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey DiscussionProblem statement Usage data
Two problems with present citation-based approachData (1) and metrics (2)
Issues:
1 Inherent features of citation data due to its origins inpublished material
2 Existing metrics fail to acknowledge (1), and ignore networkproperties of science as a social system.
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey DiscussionProblem statement Usage data
Post-print, online era, aka known as “today”: usage data?
Most use of scholarly resources is now mediated by online services
discovery research publication discovery research publication
citation
time 0 year +0.5 year +1.5 years +3.5 years+2 years +2.5 years
usage datasearch queries
text data citation data
early indicators late indicators
Recorded by nearly everyscholarly service
Captures early phases ofscientific activity
Includes a wide variety ofresource types
Activities of larger scientificcommunities, e.g. not limitedonly to authors
Scale: cf. Elsevier announced1B downloads in 2006 vs. 650Mcitation in WoS.
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey DiscussionProblem statement Usage data
Issues with usage data
Lots of interest in leveraging usage data for models of science,tracking scientific activity, metrics, etc. Cf. COUNTER, Ex LibrisbX, and many more.
However many different issues:
Silo: recorded for particular servicefor a particular user community
Lack of standards: recorded indifferent manner, for differentsystems
Lack of research: how leverageusage data for scientific tracking,modeling, metrics, etc.
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey DiscussionMESUR overview Creating MESUR’s reference data set
Outline
1 IntroductionProblem statementUsage data
2 MESURMESUR overviewCreating MESUR’s reference data set
3 Mapping Science
4 Metrics Survey
5 DiscussionOverviewFuture ResearchRelevant papers
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey DiscussionMESUR overview Creating MESUR’s reference data set
MESUR project: survey the potential of usage data at verylarge-scale
Studying scientific activity
Can we model and study patterns of scientific activity fromlarge-scale, representative usage data?
What can we learn about impact and structure from patternsof scientific activity from usage data?
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey DiscussionMESUR overview Creating MESUR’s reference data set
MESUR project: survey the potential of usage data at verylarge-scale
Studying scientific activity
Can we model and study patterns of scientific activity fromlarge-scale, representative usage data?
What can we learn about impact and structure from patternsof scientific activity from usage data?
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey DiscussionMESUR overview Creating MESUR’s reference data set
MESUR project: survey the potential of usage data at verylarge-scale
Studying scientific activity
Can we model and study patterns of scientific activity fromlarge-scale, representative usage data?
What can we learn about impact and structure from patternsof scientific activity from usage data?
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey DiscussionMESUR overview Creating MESUR’s reference data set
MESUR history
2006-2008: Andrew W. MellonFoundation
2008-2009: Los AlamosNational Laboratory
2009-: Indiana University,School of Informatics andComputing
2009-2013: National ScienceFoundation
Team: PI, co-PI, 2 scientists,2 full-time developers, and 1PhD student.
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey DiscussionMESUR overview Creating MESUR’s reference data set
MESUR history
2006-2008: Andrew W. MellonFoundation
2008-2009: Los AlamosNational Laboratory
2009-: Indiana University,School of Informatics andComputing
2009-2013: National ScienceFoundation
Team: PI, co-PI, 2 scientists,2 full-time developers, and 1PhD student.
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey DiscussionMESUR overview Creating MESUR’s reference data set
MESUR history
2006-2008: Andrew W. MellonFoundation
2008-2009: Los AlamosNational Laboratory
2009-: Indiana University,School of Informatics andComputing
2009-2013: National ScienceFoundation
Team: PI, co-PI, 2 scientists,2 full-time developers, and 1PhD student.
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey DiscussionMESUR overview Creating MESUR’s reference data set
MESUR history
2006-2008: Andrew W. MellonFoundation
2008-2009: Los AlamosNational Laboratory
2009-: Indiana University,School of Informatics andComputing
2009-2013: National ScienceFoundation
Team: PI, co-PI, 2 scientists,2 full-time developers, and 1PhD student.
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey DiscussionMESUR overview Creating MESUR’s reference data set
MESUR history
2006-2008: Andrew W. MellonFoundation
2008-2009: Los AlamosNational Laboratory
2009-: Indiana University,School of Informatics andComputing
2009-2013: National ScienceFoundation
Team: PI, co-PI, 2 scientists,2 full-time developers, and 1PhD student.
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey DiscussionMESUR overview Creating MESUR’s reference data set
MESUR history
2006-2008: Andrew W. MellonFoundation
2008-2009: Los AlamosNational Laboratory
2009-: Indiana University,School of Informatics andComputing
2009-2013: National ScienceFoundation
Team: PI, co-PI, 2 scientists,2 full-time developers, and 1PhD student.
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey DiscussionMESUR overview Creating MESUR’s reference data set
MESUR objective
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey DiscussionMESUR overview Creating MESUR’s reference data set
MESUR dataflow
MESURreference data set
Data providers
...
Publishers
Aggregators
Filtering & deduplication
Usagedata
Usagedate
Usagedate
...
ingestion
Parser 1
Parser 2
Parser 3
MESURontology
institutions
impactmetrics
Models
Models
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey DiscussionMESUR overview Creating MESUR’s reference data set
Creating MESUR’s data set
MESURreference data set
Data providers
...
Publishers
Aggregators
Filtering & deduplication
Usagedata
Usagedate
Usagedate
...
ingestion
Parser 1
Parser 2
Parser 3
MESURontology
institutions
impactmetrics
Models
Models
0.0e
+00
5.0e
+06
1.0e
+07
1.5e
+07
date
num
ber o
f eve
nts
2004 2005 2006 2007 2008 2009
Providers 2006-20010: BMC, Blackwell, UC, CSU (23),EBSCO, ELSEVIER, EMERALD, INGENTA, JSTOR,LANL, MIMAS/ZETOC, THOMSON, UPENN (9),UTEXAS
events 1,000,000,000 usage events
citations +500,000,000 citations,
articles and serials +50M articles, +-100,000 serials
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey DiscussionMESUR overview Creating MESUR’s reference data set
Data requirements
Minimum requirements
Separate user requests
Date-time stamp down to second
Document identifier or sufficient metadata to de-duplicate
Session ID (or anonymized user ID)
Request type identifier
http://tweety.lanl.gov/public/schemas/2007-01/mesur.owl
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey DiscussionMESUR overview Creating MESUR’s reference data set
MESUR’s OWL/RDF ontology
1
1Rodriguez, Bollen & Van de Sompel. A Practical Ontology for the Large-Scale Modeling of Scholarly
Artifacts and their Usage. JCDL07 - Based on OntologyX work.
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Outline
1 IntroductionProblem statementUsage data
2 MESURMESUR overviewCreating MESUR’s reference data set
3 Mapping Science
4 Metrics Survey
5 DiscussionOverviewFuture ResearchRelevant papers
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Reference data set
Subsetting
Common time period March 1st 2006 - February 1st 2007
Providers: Thomson Scientic (Web of Science), Elsevier(Scopus), JSTOR, Ingenta, University of Texas (9campuses, 6 health institutions), and California StateUniversity (23 campuses)
Scale: 346,312,045 usage events, 97,532 serials (many ofwhich not journals)
Domain Usage UC Degrees JCR
Natural Science 37% 39% 92.8%Social Sciences 45% 46% 7.2%
Humanities 14% 15%
Source: http://www.ucop.edu/ucophome/uwnews/stat/statsum/fall2007/statsumm2007.pdf (table 9)
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Reference data set
Subsetting
Common time period March 1st 2006 - February 1st 2007
Providers: Thomson Scientic (Web of Science), Elsevier(Scopus), JSTOR, Ingenta, University of Texas (9campuses, 6 health institutions), and California StateUniversity (23 campuses)
Scale: 346,312,045 usage events, 97,532 serials (many ofwhich not journals)
Domain Usage UC Degrees JCR
Natural Science 37% 39% 92.8%Social Sciences 45% 46% 7.2%
Humanities 14% 15%
Source: http://www.ucop.edu/ucophome/uwnews/stat/statsum/fall2007/statsumm2007.pdf (table 9)
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Reference data set
Subsetting
Common time period March 1st 2006 - February 1st 2007
Providers: Thomson Scientic (Web of Science), Elsevier(Scopus), JSTOR, Ingenta, University of Texas (9campuses, 6 health institutions), and California StateUniversity (23 campuses)
Scale: 346,312,045 usage events, 97,532 serials (many ofwhich not journals)
Domain Usage UC Degrees JCR
Natural Science 37% 39% 92.8%Social Sciences 45% 46% 7.2%
Humanities 14% 15%
Source: http://www.ucop.edu/ucophome/uwnews/stat/statsum/fall2007/statsumm2007.pdf (table 9)
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Reference data set
Subsetting
Common time period March 1st 2006 - February 1st 2007
Providers: Thomson Scientic (Web of Science), Elsevier(Scopus), JSTOR, Ingenta, University of Texas (9campuses, 6 health institutions), and California StateUniversity (23 campuses)
Scale: 346,312,045 usage events, 97,532 serials (many ofwhich not journals)
Domain Usage UC Degrees JCR
Natural Science 37% 39% 92.8%Social Sciences 45% 46% 7.2%
Humanities 14% 15%
Source: http://www.ucop.edu/ucophome/uwnews/stat/statsum/fall2007/statsumm2007.pdf (table 9)
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Article clickstreams
usage data log: U = {u1, u2, · · · , un}u = {s, t, a}s = session identifier , t = a date-time and a = articlef ∈ F = clickstreams extracted from Uf ⊂ U, wheref = (∀u ∈ U, ∃s : s(u) ∧ t(ui ) < t(ui+1))s(u) and t(u): session identifier and date-time of interaction u.
v1
a1
v2
a2 a3
v3
a4
v1
a1
v3
a4 articles
onlineuser interactions
journals
clickstream c1
i2i1i4i1 i2 i3
time
session1 session2
clickstream c2
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Journal Clickstreams
article clickstream: fa = (a1, a2, · · · , ak)journal clickstream: fv = (v1, v2, · · · , vk)We observe: N(vi , vj)for all pairs (vi , vj) in which j = i + 1From which follows:
P(vi , vj) =N(vi , vj)∑j N(vi , vj)
andM whose entries mi ,j = P(vi , vj).
v1
v4
v3
v2
15
5
30
v1
v4
v3
v2
0.3
0.1
0.6
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Examples of prominent connections
vi vj p(vi , vj ) N(vi , vj ) N(vi )
American Journal of International Law International Organization 0.0207 9,292 448,034International Affairs 0.0184 8,254International and Comparative Law Quarterly 0.0171 7,654Foreign Policy 0.0167 7,500American Political Science Association 0.0140 6,291
Journal of Educational Sociology American Journal of Sociology 0.0334 2,790 83,419Journal of Higher Education 0.0303 2,529Journal of Negro Education 0.0286 2,389American Sociological Review 0.0276 2,303Social Forces 0.0249 2,076
Surface Science Physical Review B 0.0704 2,555 36,282Applied Surface Science 0.0341 1,239Physical Review Letters 0.0339 1,230Journal of Chemical Physics 0.0333 1,207Applied Physics Letters 0.0327 1,188
Journal of Organic Chemistry Journal of the American Chemical Society 0.0873 4,141 47,439Tetrahedron Letters 0.0865 4,105Tetrahedron 0.0602 2,857Organic Letters 0.0532 2,526Angewandte Chemie 0.0305 1,448
Ecological Applications Ecology 0.0965 13,659 141,481Conservation Biology 0.0524 7,408Bioscience 0.0215 3,043Annual Review of Ecology and Systematics 0.0215 3,043Clinical and Experimental Allergy 0.0191 2,699
Annals of Mathematics American Journal of Mathematics 0.0705 5,392 76,526American Mathematical Monthly 0.0579 4,432PNAS 0.0156 1,195Econometrica 0.0082 624Mathematics Magazine 0.0077 587
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Network parameters
Network matrix
Parameter M M ′
Journals 97,532 2,307Edges 6,783,552 50,000
Matrix density 0.071% 0.939%Strongly Connected Components (SCC) 16,474 236
Journals in SCC 80,934 1,944Average journal clustering coefficient (SCC) 0.285 0.514
Diameter of largest SCC 37 14
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
logdata
matrix M
matrix M'
most reliableedges
97k journals6.7M edges
346Minteractions
MESUR
1Binteractions
map
rankings2307 journals50k edges
all edgescommon
date range
rank
N(vi,vj)
05000
12000
20000
28000
36000
0e+00 3e+04 1e+05 2e+05
Fruchterman (1991) Graph Drawing by Force-directed
Placement. Software, 21(11):1129-1164
Classification code:
root
DeweyJCR
Journals1 2 3
AATtaxonomy
4 5
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Appliedphysics
Organicchemistry
AnalyticalChemistry
BiochemistrySocial and personality
psychology
Polymers
Miocrobiology
Biotechnology
Music
PlantbiologyBiodiversity
Hydrology
Genetics
Brainresearch
Geology
Agriculture
Alternativeenergy
Toxicology
Plantagriculture
AnimalBehavior
EnvironmentalScience
ChemicalEngineering
Tourism
Nutrition
Neurology
Sportsmedicine Classical
studies
NursingPsychology
Geography
Social work
Dermatology
ClinicalPharmacology
Internationalstudies
Statistics
Archeology
Demographics
Economics
ChildPsychology
Humangeography
Productionresearch
Manufacturing Material scienceEngineering
Clinicaltrials
Education
Anthropology
Publichealth
Law
Sociology
Philosophy
Asian studies
Religion
MinerologyAcoustics
Statisticalphysics
Thermodynamics
Physical chemistry
Pharmaceuticalresearch
Electrochemistry
Ecology
Language
CognitiveScience
Brain studies
Soil/Marinebiology
Physiology
Plantgenetics
ArchitectureDesign
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
The Arts and Architecture Thesaurus (Getty ResearchCenter)
Table: Distance from AAT root (α) and number of classificationsNc at that level. Each α produces a finer-grained separation ofscientific disciplines.
Distance (α) Nc Example classifications
1 4 Natural sciences, social sciences, humanities, · · ·2 8 Biology, chemistry, physics, · · ·3 31 Classics, communication, engineering, · · ·4 195 Allergy, anesthesiology, applied linguistics, · · ·
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
root
DeweyJCR
Journals1 2 3
AATtaxonomy
4 5
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Cross-validating the clickstream map
f (vi , vj , α) =
{1 Cα(vi ) = Cα(vj)
0 Cα(vi ) 6= Cα(vj)
17
logdata P (vi, vj) ! M ! =
!"#
p0,0 · · · p0,n
.... . .
...pn,0 · · · pn,n
$%& ,
Ni,j > µ0.5(Nk)Ni,j ! µ0.5(Nk)""#
$$%F1
!2
AAT f(vi, vj ,") ! A! =
!"#
a0,0 · · · a0,n
.... . .
...an,0 · · · an,n
$%& ,
ai,j = 0ai,j = 1
F2& &
&
'
contingency table at "
Figure 6. Cross-validating the map structure given by M ! to journal relationships derivedfrom AAT journal classifications, i.e. matrix A!.
Table 2. Network parameters of original (M) and reduced (M !) clickstream matrices.Network matrix
Parameter M M !
Journals 97,532 2,307Edges 6,783,552 50,000
Matrix density 0.071% 0.939%Strongly Connected Components (SCC) 16,474 236
Journals in SCC 80,934 1,944Average journal clustering coe!cient (SCC) 0.285 0.514
Diameter of largest SCC 37 14
α = 1 : p < 0.0001α = 2 : p < 0.0001α = 3 : p < 0.0001α = 4 : p < 0.0001
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Betweenness centrality
Cb(vk) =∑
i 6=j 6=k
σi ,j(vk)
σi ,j(1)
Table: Ranking of journals from M ′ according to betweennesscentrality.
Rank Journal Top-level AAT classification
1 Science Natural Sciences2 Proceedings of the National Academy of Sciences Natural Sciences3 Environmental Health Perspectives Natural Science4 Chemosphere Natural Sciences5 Journal of Advanced Nursing Natural Sciences6 Nature Natural Sciences7 Ecology Natural Sciences8 Milbank Quarterly Natural Sciences9 Applied and Environmental Microbiology Natural Sciences
10 Child Development Social Sciences11 Behavioral Ecology and Sociobiology Social Sciences12 Journal of Colloid and Information Science Natural Sciences13 American Anthropologist Social Sciences14 Journal of Biogeography Natural Sciences15 Materials Science and Technology Natural Sciences
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
PageRank
PR(vi ) =1− λ
N+ λ
∑
j
PR(vj)
O(vj)(2)
Table: Ranking of journals from M ′ according to PageRank(λ = 0.85).
Rank Journal Top-level AAT classification
1 Applied Physics Letters Natural Sciences2 Journal of Advanced Nursing Natural Sciences3 Journal of the American Chemical Society Natural Sciences4 Ecology Natural Sciences5 Nature Natural Sciences6 Physical Review B Natural Sciences7 Journal of Applied Physics Natural Sciences8 American Economic Review Social Sciences9 American Historical Review Social Sciences
10 Physical Review Letters Natural Sciences11 Science Natural Sciences12 Langmuir Natural Sciences13 Journal of Chemical Physics Natural Sciences14 American Anthropologist Social Sciences15 Annals of the American Academy of Political and Social Science Social Science
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Outline
1 IntroductionProblem statementUsage data
2 MESURMESUR overviewCreating MESUR’s reference data set
3 Mapping Science
4 Metrics Survey
5 DiscussionOverviewFuture ResearchRelevant papers
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Metrics
ID Type Measure Source Network parameters PC1 PC2 ρ̄
1 Citation Scimago Journal Rank Scimago/Scopus -0.974 -8.296 0.556?
2 Citation Immediacy Index JCR 2007 1.659 -7.046 0.508?
3 Citation Closeness Centrality JCR 2007 Undirected, weighted 0.339 -6.284 0.565?
4 Citaton Cites per doc Scimago/Scopus -1.311 -6.192 0.588?
5 Citation Journal Impact Factor JCR 2007 -1.854 -5.937 0.592?
6 Citation Closeness centrality JCR 2007 Undirected, unweighted -1.388 -4.827 0.6197 Citation Out-degree centrality JCR 2007 Directed, weighted -3.191 -4.215 0.6428 Citation Out-degree centrality JCR 2007 Directed, unweighted -2.703 -4.015 0.6409 Citation Degree Centrality JCR 2007 Undirected, weighted -4.850 -2.834 0.69010 Citation Degree Centrality JCR 2007 Undirected, unweighted -4.398 -2.643 0.69111 Citation H-Index Scimago/Scopus -3.326 -2.003 0.68112 Citation Scimago Total cites Scimago/Scopus -4.926 -1.722 0.71213 Citation Journal Cite Probability JCR 2007 -5.389 -1.647 0.71014 Citation In-degree centrality JCR 2007 Directed, unweighted -5.302 -1.429 0.71715 Citation In-degree centrality JCR 2007 Directed, weighted -5.380 -1.554 0.71216 Citation PageRank JCR 2007 Directed, unweighted -4.476 0.108 0.69317 Citation PageRank JCR 2007 Undirected, unweighted -4.929 0.731 0.72618 Citation PageRank JCR 2007 Undirected, weighted -4.160 0.864 0.69619 Citation PageRank JCR 2007 Directed, weighted -3.103 0.333 0.65920 Citation Y-factor JCR 2007 Directed, weighted -2.971 0.317 0.65721 Citation Betweenness centrality JCR 2007 Undirected, weighted -0.462 0.872 0.64322 Citation Betweenness centrality JCR 2007 Undirected, unweighted -0.474 1.609 0.64223 Citation Citation Half-Life JCR 2007 / / 0.03724 Usage Closeness centrality MESUR 2007 Undirected, weighted 3.130 2.683 0.703
25 Usage Closeness centrality MESUR 2007 Undirected, unweighted 3.100 3.899 0.73126 Usage Degree centrality MESUR 2007 Undirected, unweighted 3.271 3.873 0.72927 Usage PageRank MESUR 2007 Undirected, unweighted 3.327 4.192 0.72828 Usage PageRank MESUR 2007 Directed, unweighted 3.463 4.336 0.72729 Usage In-degree centrality MESUR 2007 Directed, unweighted 3.463 4.015 0.72830 Usage Out-degree centrality MESUR 2007 Directed, unweighted 3.484 3.994 0.72731 Usage PageRank MESUR 2007 Directed, weighted 3.780 4.217 0.71032 Usage PageRank MESUR 2007 Undirected, weighted 3.813 4.223 0.71033 Usage Betweenness centrality MESUR 2007 Undirected, unweighted 3.988 4.271 0.69934 Usage Betweenness centrality MESUR 2007 Undirected, weighted 3.957 3.698 0.69335 Usage Degree centrality MESUR 2007 Undirected, weighted 5.293 3.528 0.68336 Usage Out-degree centrality MESUR 2007 Directed, weighted 5.302 3.518 0.68337 Usage In-degree centrality MESUR 2007 Directed, weighted 5.286 3.531 0.68338 Usage Journal Use Probability MESUR 2007 8.914 1.833 0.59339 Usage Usage Impact Factor MESUR 2007 / / 0.279
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
R10×10 =
0BBBBBBBBBBBBB@
1.00 0.71 0.77 0.52 0.79 0.55 0.69 0.63 0.60 0.180.71 0.99 0.52 0.69 0.79 0.85 0.49 0.44 0.49 0.220.77 0.52 1.00 0.62 0.63 0.39 0.70 0.73 0.68 0.200.52 0.69 0.62 1.00 0.68 0.78 0.49 0.56 0.65 0.060.79 0.79 0.63 0.68 1.00 0.82 0.66 0.62 0.66 0.150.55 0.85 0.39 0.78 0.82 1.00 0.40 0.40 0.50 0.130.69 0.49 0.70 0.49 0.66 0.40 1.00 0.89 0.85 0.530.63 0.44 0.73 0.56 0.62 0.40 0.89 1.00 0.97 0.450.60 0.49 0.68 0.65 0.66 0.50 0.85 0.97 1.00 0.420.18 0.22 0.20 0.06 0.15 0.13 0.53 0.45 0.42 1.00
1CCCCCCCCCCCCCA
19: Citation PageRank5: Journal Impact Factor22: Citation Betweenness6: Citation Closeness11: Citation H-index1: Citation Scimago Journal Rank31: Usage PageRank34: Usage Betweenness24: Usage Closeness39: Usage Impact Factor
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
2007 JCR MESURcitation data usage log data
Scimago
citationnetwork
7,388 x 7,388
usagenetwork
7,575 x 7,575
2,5,233,6,7,8,9,10,13,1415,16,17,18,19,20
21,22
24,25,26,27,28,29,3931,32,33,34,35,36,37,
38,391,4,11,12
data sources
impact measures
citation data
39x39
correlationmatrix
intersection7,584
journals12,751journals
Schematic representation of data sources and processing. Impact measure identifiers.
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
Principal Component Analysis
PC1 PC2 PC3 PC4 PC5
Proportion of Variance 66.1% 17.3% 9.2% 4.8% 0.9%Cumulative Proportion 66.1% 83.4% 92.6% 97.4% 98.3%
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
!! " ! #"
!!
"!
$%#
$%&
22
2119
5
181716
151413 12
109
876
4 3
2
24
25 2627 2829
30313233
343536
37
38
1
11
20
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
PC1
PC2
UsageBetweenness
PageRank
citation Betweenness
PageRank
Total Cites
Cites per Doc
JIF06
UsageProbability
CitationImmediacy
+-
+
-
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey Discussion
2.5 2.0 1.5 1.0 0.5 0.0
38: Journal Use Probability31: Usage PageRank32: Usage PageRank36: Usage Out.degree37: Usage In.degree centr.35: Usage Degree centr.24: Usage Closen. centr.33: Usage Betw. centr.34: Usage Betw. centr.25: Usage Closen. centr.26: Usage Degree centr.28: Usage PageRank27: Usage PageRank29: Usage In.degree centr.30: Usage Out.degree centr.11: Scimago H.index12: Scimago Total Cites14: Citation In.degree centr.15: Citation In.degree centr.13: Cite Probability17: Citation PageRank10: Citation Degree centr.9: Citation Degree centr.22: Citation Between. centr.21: Citation Between. centr.19: Citation PageRank20: Citation Y.factor16: Citation PageRank18: Citation PageRank3: Citation Closen. centr.6: Citation Closen. centr.8: Citation Out.degree centr.7: Citation Out.degree centr.2: Citation Immediacy Index1: Scimago Journal Rank5: Journal Impact Factor4: Cites per doc
Cluster Measures Interpretation
1 38 Journal Use Probability2 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37 Usage measures3 1, 2, 3, 4, 5 JIF, SJR, Cites per Document measures4 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 Total Citation rates and distributions5 16, 17, 18, 19, 20, 21, 22 Citation Betweenness and PageRank
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey DiscussionOverview Future Research Relevant papers
Outline
1 IntroductionProblem statementUsage data
2 MESURMESUR overviewCreating MESUR’s reference data set
3 Mapping Science
4 Metrics Survey
5 DiscussionOverviewFuture ResearchRelevant papers
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey DiscussionOverview Future Research Relevant papers
MESUR: so far
Usage data: single largest reference data set of usage, citationand bibliographic data
+1,000,000,000 usage events loadedmultiple publishers, aggregators and institutionsInfrastructure for research program
Usage graphs: track scientific flow of activity
real-time studies of scienceinclusive of larger “scholarly community”
Metrics: valid, vetted indicators of scientific impact
Different facets of scholarly impactSimple metrics, good results. Law of diminishingreturns?Hybrid, consensus metrics?
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey DiscussionOverview Future Research Relevant papers
MESUR: the bad
Sustainability: burden of data collection
ad hoc, customized agreements with dataprovidersrestrictive agreements with regards to datasharinghigh costs of maintaining infrastructure: funding
Research program: too much to do
3rd party access: better sciencemany eyes on data
Metrics and services: Community support and acceptance
Can’t be limited to academic exerciseInvestigations need to be usefulAccepted by community, become part ofscholarly assessment system
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey DiscussionOverview Future Research Relevant papers
Future Research: Two directions
Longitudinal research and dynamic models of science
Bursty behavior of science
Timeseries of reads over time are burstyCoordinated: social networks at work?
Modeling
Stochastic and agent-based models of scientific activityCitation following? Group-think? Find parameters of humaninformation searching behavior.
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey DiscussionOverview Future Research Relevant papers
Bursty behavior
0 50 100 150 200 250 300 350
100
300
500
700
APPLIED STATISTICS: time series
Index
j[, 2
]
0 50 100 150 200 250 300 350
−200
020
040
0
APPLIED STATISTICS: residuals percentiles
Index
j[, 2
] − m
$y
0 50 100 150 200 250 300 350
−200
020
040
0
APPLIED STATISTICS: residuals smoothed
Index
mr$
y
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey DiscussionOverview Future Research Relevant papers
Contagion?
0 7 14 21 28 35 42 49 56 63 70 77 84 91 98 105 112 119 126 133 140 147 154 161 168 175 182 189 196 203 210 217 224 231 238 245 252 259 266 273 280 287 294 301 308 315 322 329 336
PERCEPTION & PSYCHOPHYSICSMOLECULAR PHYSICSADVANCES IN PHYSICSJOURNAL OF PHYSICS B ATOMIC MOLECULAR AND OPTICAL PHYSICJOURNAL OF COMPUTATIONAL PHYSICSJOURNAL OF CHEMICAL PHYSICSCENTRAL EUROPEAN JOURNAL OF PHYSICSCELL BIOCHEMISTRY AND BIOPHYSICSPHYSICS LETTERS BCONTEMPORARY PHYSICSNUCLEAR INSTRUMENTS & METHODS IN PHYSICS RESEARCH SECTCHEMICAL PHYSICSPHYSICA STATUS SOLIDI (B)NUCLEAR INSTRUMENTS & METHODS IN PHYSICS RESEARCH SECTINTERNATIONAL JOURNAL OF THERMOPHYSICSPHYSICA SCRIPTACANADIAN JOURNAL OF PHYSICSCHEMICAL PHYSICS LETTERSPHYSICA DAPPLIED PHYSICSCHINESE PHYSICS LETTERSRADIATION PHYSICS AND CHEMISTRYJOURNAL OF PHYSICS D APPLIED PHYSICSCURRENT APPLIED PHYSICSPHYSICA ANUCLEAR PHYSICS AMEDICAL PHYSICSNUCLEAR PHYSICS BPHYSICA CINTERNATIONAL JOURNAL OF THEORETICAL PHYSICSJOURNAL OF STATISTICAL PHYSICSFOUNDATIONS OF PHYSICSNEW JOURNAL OF PHYSICSPHYSICS OF PLASMASMATERIALS CHEMISTRY AND PHYSICSJOURNAL OF THE MECHANICS AND PHYSICS OF SOLIDSAPPLIED PHYSICS BCZECHOSLOVAK JOURNAL OF PHYSICSASTRONOMY & ASTROPHYSICSJOURNAL OF PHYSICS A GENERAL PHYSICSPHYSICS OF FLUIDSTECTONOPHYSICSPHYSICS LETTERS AJOURNAL OF PHYSICS G NUCLEAR AND PARTICLE PHYSICSBIOCHIMICA ET BIOPHYSICA ACTA (BBA) − BIOENERGETICSJAPANESE JOURNAL OF APPLIED PHYSICSPHYSICS OF ATOMIC NUCLEIASTRONOMY & GEOPHYSICSQUARTERLY REVIEWS OF BIOPHYSICSINTERNATIONAL JOURNAL OF MODERN PHYSICS APHYSICS OF THE EARTH AND PLANETARY INTERIORSPHYSICA BPHYSICA EMODERN PHYSICS LETTERS AATMOSPHERIC CHEMISTRY AND PHYSICSPHYSICS AND CHEMISTRY OF LIQUIDSSOLAR PHYSICSNATURE PHYSICSEUROPEAN JOURNAL OF PHYSICSFOUNDATIONS OF PHYSICS LETTERSPHYSICS TODAYBIOCHIMICA ET BIOPHYSICA ACTA (BBA) − BIOMEMBRANESASTROPHYSICS AND SPACE SCIENCEPHYSICS AND CHEMISTRY OF THE EARTH PARTS A/B/CACTA CRYSTALLOGRAPHICA SECTION A CRYSTAL PHYSICS DIFFRAANNALS OF PHYSICSBIOCHIMICA ET BIOPHYSICA ACTA (BBA) − MOLECULAR BASIS OF DEUROPHYSICS LETTERSAMERICAN JOURNAL OF PHYSICSJOURNAL OF HIGH ENERGY PHYSICS
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey DiscussionOverview Future Research Relevant papers
Contagion?
! " #$ %# %& '( $% $) (* *' "! "" &$ )# )& #!( ##% ##) #%* #'' #$! #$" #($ #*# #*& #"( #&% #&) #)* %!' %#! %#" %%$ %'# %'& %$( %(% %() %** %"' %&! %&" %)$ '!# '!& '#( '%% '%) ''*
+,-./0-1234,2.5063/37849
7:8/05393.54+0./;0+,-./063/37849
./8-.2063/37849
428/84.2063/37849
.//.2901<0+,-./063/37849
=1,5/.201<0.998973;053>51;,4781/0./;063/37849
=1,5/.201<0./8-.20?533;8/60./;063/37849
-1234,2.50>+@2163/378490./;03A12,781/
<,/6.2063/378490./;0?81216@
-1234,2.5063/378490./;0-37.?1289-
?3+.A815063/37849
41/935A.781/063/37849
/.7,53063/37849
63/3784.
982A.3063/3784.
/3:063/378490B0914837@
3,51>3./0=1,5/.201<08--,/163/37849
.-3584./0=1,5/.201<0+,-./063/37849
?814+3-84.2063/37849
=1,5/.201<0-3;84.2063/37849
4./435063/378490./;04@7163/37849
>219063/37849
.//,.2053A83:01<063/37849
753/;908/063/37849
8--,/163/37849
.-3584./0=1,5/.201<0-3;84.2063/378490>.570.
5,998./0=1,5/.201<063/37849
+,-./063/37849
.//,.2053A83:01<063/1-8490./;0+,-./063/37849
=1,5/.201<0+,-./063/37849
63/390B063/378409@973-9
=1,5/.201<063/37840>9@4+1216@
63/3784.205393.54+
=1,5/.201<0/3,5163/37849
4@7163/37840./;063/1-305393.54+
63/378409148.20./;063/35.20>9@4+1216@0-1/165.>+9
/3,5163/37849
>9@4+8.7584063/37849
63/3784073978/6
>+.5-.4163/37849
63/378490./;0-1234,2.50?81216@
?-4063/37849
4,553/701>8/81/08/063/378490B0;3A321>-3/7
3,51>3./0=1,5/.201<0-3;84.2063/37849
-1234,2.50./;063/35.2063/37849
3,51>3./0=1,5/.201<0+,-./063/37849
.-3584./0=1,5/.201<0-3;84.2063/378490>.570?0/3,51>9@4+
63/378403/68/3358/60/3:9
63/378403>8;3-81216@
.//.2390;3063/378C,3
-,7.781/05393.54+0!063/3784071D841216@0./;03/A851/-3/7
.-3584./0=1,5/.201<0-3;84.2063/378490>.5704093-8/.5908/0-
63/37849093234781/03A12,781/
4,553/7063/37849
=1,5/.201<063/37849
.;A./43908/063/37849
63/3784908/0-3;848/3
E153./0=1,5/.201<063/37849
8/735/.781/.20=1,5/.201<08--,/163/37849
41--,/87@063/37849
>+.5-.4163/378490./;063/1-849
63/3784041,/9328/6
/.7,53053A83:9063/37849
63/37849
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data
Introduction MESUR Mapping Science Metrics Survey DiscussionOverview Future Research Relevant papers
Relevant papers
Johan Bollen, Herbert Van de Sompel, Aric Hagberg and RyanChuteA Principal Component Analysis of 39 Scientific ImpactMeasures.PLoS ONE, June 2009. URL:http://dx.plos.org/10.1371/journal.pone.0006022.
Michael Kurtz and Johan Bollen.Usage bibliometrics.Annual Review of Information Science and Technology, 2010
Johan Bollen, Herbert Van de Sompel, Aric Hagberg,LuisBettencourt, Ryan Chute, Marko A. Rodriguez, LyudmilaBalakireva.Clickstream data yields high-resolution maps of science.PLoS One, February 2009.
Johan Bollen - [email protected] Tracking science in real-time from large-scale usage data