Tracking science in real-time from large-scale usage data

Johan Bollen - [email protected]

Indiana University, School of Informatics and Computing
Center for Complex Networks and Systems Research, Cognitive Science Program

July 29, 2010


Outline

1 Introduction
    Problem statement
    Usage data

2 MESUR
    MESUR overview
    Creating MESUR’s reference data set

3 Mapping Science

4 Metrics Survey

5 Discussion
    Overview
    Future Research
    Relevant papers


Why study science?

Tremendous importance

Enormous amount of resources and people involved:

Allocation of resources: funding agencies, policy makers, the public, corporations, the scientists themselves

Social systems research: ability to learn about general properties of similar social systems

Unrelated picture of my daughter


Science as a social system

Actors: scientists, stakeholders

Relations (often focused on exchange of knowledge)

Informal interactions: meetings, workshops, conversations, messages, etc.
Formal: affiliations, collaborations, projects, publications

Artifacts: articles, journals, reports, data, software

Xiaoming Liu, Johan Bollen, Michael L. Nelson, and Herbert Van de Sompel. Co-authorship networks in the digital library research community. Information Processing and Management, 41(6):1462-1480, December 2005


The measure of science in the print-era: A word on the traditional way

Gold standard: citation data

Extracted from “publication” data.

From print comes citation:

Science is a gift-economy
Steal text, but not ideas.
Currency is acknowledgement of influence: citations
More citations, more influence
Track scientific activity from citation data

Citations are derived from peer-reviewed publications


From this springs an entire industry

Applications of citation data

End-user services and bibliometrics

Web of Science, Scopus, publishers, institutions, etc.

Citation-based services

Reference linking
Variety of end-user services

Citation-based assessment and bibliometrics

Impact metrics
Analytics
Mapping


Two problems with present citation-based approach: Data (1) and metrics (2)

Issues:

1 Inherent features of citation data due to its origins in published material

2 Existing metrics fail to acknowledge (1), and ignore network properties of science as a social system.


Citation data: Delays and partiality

Citation problem 1: Data

Citation data is a late and partial indicator of scientific activity

[Figure: timeline showing two discovery, research, publication cycles linked by a citation; time axis from 0 to +3.5 years (ticks at +0.5, +1.5, +2, +2.5, +3.5 years); labels "citation data" and "?".]

Publication delays

$\mathrm{publication}_t \rightarrow \mathrm{publication}_{t-1} \rightarrow$ citation DB

Domain-dependent practices

Community: publishing authors only


Citation data: Metrics vs. networks

Citation problem 2: networks

Ignoring network properties of science

[Figure: map of the journal citation network; individual journal-name node labels are not reproduced. Labeled clusters: Livestock, Food, Chemistry, Material Sciences, Physics, Computer Science, Geology, Sports Medicine, ObGyn, Medicine, Resource Management, Dermatology. Annotations on the map: "bad", "good".]

Measuring impact, influence, prestige from citation data:

More citations are better than fewer citations

In a citation network, a central position is better than the outskirts.


Citation problem no. 2: Metrics vs. networks

This approach is epitomized in the Thomson-Reuters Impact Factor:

Standard citation graph representation

$G = (V, E, W)$, $E \subseteq V^2$

$W : E \rightarrow \mathbb{N}^+$

However: $\forall (n_i, n_j) \in E : \mathrm{pubtime}(n_i) > \mathrm{pubtime}(n_j)$

Impact Factor: normalized in-degree

$IF_j = \frac{\sum_i w_{ij}}{N_j}$
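The "normalized in-degree" reading of the Impact Factor can be illustrated with a minimal sketch. It assumes, since the slide does not define them, that $w_{ij}$ is the number of citations from journal $i$ to journal $j$ and that $N_j$ is the number of articles journal $j$ published in the counting window. The function name and toy data are hypothetical.

```python
from collections import defaultdict


def impact_factors(citations, article_counts):
    """Return IF_j = (sum_i w_ij) / N_j for every journal j.

    citations: iterable of (citing, cited, weight) triples, i.e. w_ij.
    article_counts: dict journal -> N_j (assumed: articles published by j).
    """
    in_strength = defaultdict(int)
    for _citing, cited, weight in citations:
        # Only the total matters; the citing journal is ignored.
        in_strength[cited] += weight
    return {j: in_strength[j] / n for j, n in article_counts.items() if n > 0}


# Hypothetical toy data: three journals A, B, C.
citations = [("A", "B", 30), ("C", "B", 10), ("B", "A", 5), ("C", "A", 15)]
article_counts = {"A": 10, "B": 20, "C": 5}
factors = impact_factors(citations, article_counts)
```

In this toy data A and B end up with the same value even though their citations come from different sources, which is the sense in which the origin of a citation plays no role in the metric.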


Citation problem no. 2: Between Big Star and Britney Spears

Is more always better?

Why context matters

Britney Spears:

Sold 50M records
Influenced who?
High popularity, low prestige.

Big Star:

Sold 50k records
Influenced REM, B52s, Teenage Fanclub, etc.
Low popularity, high influence


Citation problem no. 2: Between Big Star and Britney Spears

Silly? That’s how we evaluate scientific impact right now!

The Impact Factor and lots of other citation-based metrics are presently being used to assess the quality of publications, journals, authors, institutions, and even entire countries by proxy.

Note: Lots of developments on better metrics (cf. Eigenfactor: Bergstrom and Rosvall), but still reliance on citation data.


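New developments: PageRank for citation graphs

[Figure: example weighted citation graph over nodes v1, v2, v3, with numeric edge weights and two values marked "?"; the assignment of numbers to edges is not recoverable from the extraction.]

Impact Factor: $IF_j = \frac{\sum_i w_{ij}}{N_j}$:

normalized citation count
origin of citation disregarded

A different route:

$IF(v_i) \simeq \lambda \sum_j IF_j$

$IF(v_i) \simeq \lambda \sum_j IF_j \times \frac{1}{O(v_j)}$

$PR(v_i) \simeq \lambda \sum_j PR(v_j) \times \frac{1}{O(v_j)}$

$PR(v_i) \simeq \frac{1-\lambda}{N} + \lambda \sum_j PR(v_j) \times \frac{1}{O(v_j)}$

$PR_w(v_i) = \frac{1-\lambda}{N} + \lambda \sum_j PR_w(v_j) \times w(v_j, v_i)$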
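The weighted PageRank recurrence can be sketched as a simple power iteration. This is an illustration under assumptions the slide does not state: $w(v_j, v_i)$ is taken to be the citation count from $v_j$ to $v_i$ divided by all citations made by $v_j$, nodes without outgoing citations spread their rank uniformly, and $\lambda = 0.85$. The function name and toy graph are hypothetical.

```python
def weighted_pagerank(weights, damping=0.85, iterations=100):
    """Power iteration for PR_w(v_i) = (1 - d)/N + d * sum_j PR_w(v_j) * w(v_j, v_i).

    weights: dict source -> {target: citation count from source to target}.
    damping: the lambda of the slides (0.85 is an assumed, conventional value).
    """
    nodes = set(weights)
    for targets in weights.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    # Total outgoing citation weight per node, used to normalise w(v_j, v_i).
    out_total = {j: sum(weights.get(j, {}).values()) for j in nodes}
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Assumption: nodes that cite nothing share their rank with everyone.
        dangling = sum(rank[j] for j in nodes if out_total[j] == 0)
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for j, targets in weights.items():
            if out_total[j] == 0:
                continue
            for i, count in targets.items():
                new_rank[i] += damping * rank[j] * count / out_total[j]
        rank = new_rank
    return rank


# Hypothetical toy graph: v1 cites v2 (3x) and v3 (1x), v2 cites v3, v3 cites v1.
toy = {"v1": {"v2": 3, "v3": 1}, "v2": {"v3": 2}, "v3": {"v1": 1}}
ranks = weighted_pagerank(toy)
```

Unlike the Impact Factor, the score of a node here depends on the scores of the nodes citing it, so a citation from a highly ranked source carries more weight than one from the periphery.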


New developments, contd: PageRank for citation graphs

        ISI IF                            PRw
rank    value    Journal                  value (x 10^3)    Journal
1       52.28    ANNU REV IMMUNOL         16.78             NATURE
2       37.65    ANNU REV BIOCHEM         16.39             J BIOL CHEM
3       36.83    PHYSIOL REV              16.38             SCIENCE
4       35.04    NAT REV MOL CELL BIO     14.49             P NATL ACAD SCI USA
5       34.83    NEW ENGL J MED           8.41              PHYS REV LETT
6       30.98    NATURE                   5.76              CELL
7       30.55    NAT MED                  5.70              NEW ENGL J MED
8       29.78    SCIENCE                  4.67              J AM CHEM SOC
9       28.18    NAT IMMUNOL              4.46              J IMMUNOL
10      28.17    REV MOD PHYS             4.28              APPL PHYS LETT

Table: The highest ranking journals according to ISI IF and Weighted PageRank (JCR 2003)


Two problems with present citation-based approach: Data (1) and metrics (2)

Issues:

1 Inherent features of citation data due to its origins in published material

2 Existing metrics fail to acknowledge (1), and ignore network properties of science as a social system.


Post-print, online era, aka “today”: usage data?

Most use of scholarly resources is now mediated by online services

[Figure: discovery, research, publication timeline (0 to +3.5 years) annotated with data sources: usage data, search queries, text data, and citation data, ranging from early indicators to late indicators.]

Recorded by nearly every scholarly service

Captures early phases of scientific activity

Includes a wide variety of resource types

Activities of larger scientific communities, e.g. not limited to authors

Scale: cf. Elsevier announced 1B downloads in 2006 vs. 650M citations in WoS.


Issues with usage data

Lots of interest in leveraging usage data for models of science, tracking scientific activity, metrics, etc. Cf. COUNTER, Ex Libris bX, and many more.

However, many different issues:

Silo: recorded for a particular service, for a particular user community

Lack of standards: recorded in different manners, for different systems

Lack of research: how to leverage usage data for scientific tracking, modeling, metrics, etc.


MESUR project: survey the potential of usage data at very large scale

Studying scientific activity

Can we model and study patterns of scientific activity from large-scale, representative usage data?

What can we learn about impact and structure from patterns of scientific activity from usage data?


MESUR history

2006-2008: Andrew W. Mellon Foundation

2008-2009: Los Alamos National Laboratory

2009-: Indiana University, School of Informatics and Computing

2009-2013: National Science Foundation

Team: PI, co-PI, 2 scientists, 2 full-time developers, and 1 PhD student.


MESUR objective

MESUR dataflow

[Figure: MESUR dataflow diagram. Labels: data providers (publishers, aggregators, institutions, ...); usage data; ingestion through Parser 1, Parser 2, Parser 3; filtering & deduplication; MESUR ontology; MESUR reference data set; models; impact metrics.]

Creating MESUR’s data set

[Figure: the MESUR dataflow diagram (data providers, ingestion parsers, filtering & deduplication, MESUR ontology, MESUR reference data set, models, impact metrics), shown with a plot of the number of events by date, 2004-2009; y-axis from 0.0e+00 to 1.5e+07.]

Providers (2006-2010): BMC, Blackwell, UC, CSU (23), EBSCO, ELSEVIER, EMERALD, INGENTA, JSTOR, LANL, MIMAS/ZETOC, THOMSON, UPENN (9), UTEXAS

Events: 1,000,000,000 usage events

Citations: +500,000,000 citations

Articles and serials: +50M articles, ±100,000 serials

Data requirements

Minimum requirements

Separate user requests

Date-time stamp down to second

Document identifier or sufficient metadata to de-duplicate

Session ID (or anonymized user ID)

Request type identifier

http://tweety.lanl.gov/public/schemas/2007-01/mesur.owl
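The minimum requirements listed on this slide can be read as a record type with four mandatory fields. The following is a minimal sketch under that reading; the class name, field names, timestamp format and request-type values are illustrative assumptions and are not taken from the MESUR ontology.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class UsageEvent:
    """One separate user request (hypothetical field names)."""
    timestamp: datetime   # date-time stamp, resolved to the second
    document_id: str      # document identifier, or a key built from metadata
    session_id: str       # session ID or anonymized user ID
    request_type: str     # e.g. "abstract" or "fulltext" (assumed values)


REQUIRED = ("timestamp", "document_id", "session_id", "request_type")


def parse_event(record: dict) -> UsageEvent:
    """Build a UsageEvent, rejecting records that miss a required field."""
    missing = [key for key in REQUIRED if not record.get(key)]
    if missing:
        raise ValueError(f"record lacks required fields: {missing}")
    return UsageEvent(
        # Assumed log format: "YYYY-MM-DD HH:MM:SS".
        timestamp=datetime.strptime(record["timestamp"], "%Y-%m-%d %H:%M:%S"),
        document_id=record["document_id"],
        session_id=record["session_id"],
        request_type=record["request_type"],
    )
```

A provider feed that cannot fill all four fields would fail at `parse_event`, which mirrors the idea of a minimum requirement rather than any actual MESUR ingestion code.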

MESUR’s OWL/RDF ontology

1. Rodriguez, Bollen & Van de Sompel. A Practical Ontology for the Large-Scale Modeling of Scholarly Artifacts and their Usage. JCDL07 - Based on OntologyX work.

Reference data set

Subsetting

Common time period: March 1st 2006 - February 1st 2007

Providers: Thomson Scientific (Web of Science), Elsevier (Scopus), JSTOR, Ingenta, University of Texas (9 campuses, 6 health institutions), and California State University (23 campuses)

Scale: 346,312,045 usage events, 97,532 serials (many of which are not journals)

Domain            Usage   UC Degrees   JCR
Natural Science   37%     39%          92.8%
Social Sciences   45%     46%          7.2%
Humanities        14%     15%

Source: http://www.ucop.edu/ucophome/uwnews/stat/statsum/fall2007/statsumm2007.pdf (table 9)

Article clickstreams

usage data log: $U = \{u_1, u_2, \cdots, u_n\}$

$u = \{s, t, a\}$

$s$ = session identifier, $t$ = a date-time and $a$ = article

$f \in F$ = clickstreams extracted from $U$

$f \subset U$, where $f = (\forall u \in U, \exists s : s(u) \wedge t(u_i) < t(u_{i+1}))$

$s(u)$ and $t(u)$: session identifier and date-time of interaction $u$.

[Figure: online user interactions (i1, i2, ...) ordered along a time axis and grouped into session 1 and session 2; each session yields a clickstream (c1, c2) over articles a1-a4, which belong to journals v1-v3.]
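The clickstream definition on this slide amounts to grouping interactions by session and ordering each group by time. A minimal sketch, assuming each log entry is a `(session, time, article)` tuple and that a hypothetical `journal_of` mapping gives the journal of each article:

```python
from collections import defaultdict


def extract_clickstreams(log):
    """Group interactions u = (s, t, a) by session s and order them by time t."""
    by_session = defaultdict(list)
    for session, when, article in log:
        by_session[session].append((when, article))
    # One clickstream per session: the articles in order of their timestamps.
    return {
        session: [article for _, article in sorted(events, key=lambda e: e[0])]
        for session, events in by_session.items()
    }


def to_journal_clickstream(article_stream, journal_of):
    """Replace every article by its journal (journal_of is an assumed lookup)."""
    return [journal_of[article] for article in article_stream]


# Toy log with integer timestamps; the entries are deliberately out of order.
example_log = [
    ("session1", 2, "a2"),
    ("session2", 1, "a1"),
    ("session1", 1, "a1"),
    ("session1", 3, "a3"),
    ("session2", 2, "a4"),
]
example_journals = {"a1": "v1", "a2": "v2", "a3": "v2", "a4": "v3"}
```

With this toy data, `extract_clickstreams(example_log)["session1"]` is the article sequence a1, a2, a3, and mapping it through `example_journals` gives the journal sequence v1, v2, v2.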

Journal Clickstreams

article clickstream: $f_a = (a_1, a_2, \cdots, a_k)$

journal clickstream: $f_v = (v_1, v_2, \cdots, v_k)$

We observe $N(v_i, v_j)$ for all pairs $(v_i, v_j)$ in which $j = i + 1$.

From which follows:

$$P(v_i, v_j) = \frac{N(v_i, v_j)}{\sum_j N(v_i, v_j)}$$

and $M$, whose entries are $m_{i,j} = P(v_i, v_j)$.

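[Figure: example with journals v1, v2, v3, v4. Left: transition counts of 15, 5 and 30 on the edges leaving v1 (total 50). Right: the same edges after normalization, with probabilities 0.3, 0.1 and 0.6.]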
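The normalization of counts N(v_i, v_j) into probabilities P(v_i, v_j) can be sketched as follows. The function names are illustrative, and the assignment of the counts 15, 5 and 30 to particular target journals in the example is an assumption, since the extracted figure labels do not preserve which edge carries which number.

```python
from collections import Counter, defaultdict


def transition_counts(journal_clickstreams):
    """Count N(v_i, v_j) over consecutive pairs (j = i + 1) in each clickstream."""
    counts = Counter()
    for stream in journal_clickstreams:
        for vi, vj in zip(stream, stream[1:]):
            counts[(vi, vj)] += 1
    return counts


def transition_probabilities(counts):
    """P(v_i, v_j) = N(v_i, v_j) / sum_j N(v_i, v_j), i.e. row-normalized counts."""
    row_totals = defaultdict(int)
    for (vi, _), n in counts.items():
        row_totals[vi] += n
    return {(vi, vj): n / row_totals[vi] for (vi, vj), n in counts.items()}


# Counts from the slide's example; which target gets which count is assumed.
example_counts = {("v1", "v4"): 15, ("v1", "v3"): 5, ("v1", "v2"): 30}
example_probs = transition_probabilities(example_counts)
```

The three probabilities leaving v1 sum to 1, so each row of M is a probability distribution over next journals.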

Examples of prominent connections

v_i / v_j                                        p(v_i, v_j)   N(v_i, v_j)   N(v_i)

American Journal of International Law                                        448,034
  International Organization                     0.0207        9,292
  International Affairs                          0.0184        8,254
  International and Comparative Law Quarterly    0.0171        7,654
  Foreign Policy                                 0.0167        7,500
  American Political Science Association         0.0140        6,291

Journal of Educational Sociology                                             83,419
  American Journal of Sociology                  0.0334        2,790
  Journal of Higher Education                    0.0303        2,529
  Journal of Negro Education                     0.0286        2,389
  American Sociological Review                   0.0276        2,303
  Social Forces                                  0.0249        2,076

Surface Science                                                              36,282
  Physical Review B                              0.0704        2,555
  Applied Surface Science                        0.0341        1,239
  Physical Review Letters                        0.0339        1,230
  Journal of Chemical Physics                    0.0333        1,207
  Applied Physics Letters                        0.0327        1,188

Journal of Organic Chemistry                                                 47,439
  Journal of the American Chemical Society       0.0873        4,141
  Tetrahedron Letters                            0.0865        4,105
  Tetrahedron                                    0.0602        2,857
  Organic Letters                                0.0532        2,526
  Angewandte Chemie                              0.0305        1,448

Ecological Applications                                                      141,481
  Ecology                                        0.0965        13,659
  Conservation Biology                           0.0524        7,408
  Bioscience                                     0.0215        3,043
  Annual Review of Ecology and Systematics       0.0215        3,043
  Clinical and Experimental Allergy              0.0191        2,699

Annals of Mathematics                                                        76,526
  American Journal of Mathematics                0.0705        5,392
  American Mathematical Monthly                  0.0579        4,432
  PNAS                                           0.0156        1,195
  Econometrica                                   0.0082        624
  Mathematics Magazine                           0.0077        587

Network parameters

                                               Network matrix
Parameter                                      M            M′
Journals                                       97,532       2,307
Edges                                          6,783,552    50,000
Matrix density                                 0.071%       0.939%
Strongly Connected Components (SCC)            16,474       236
Journals in SCC                                80,934       1,944
Average journal clustering coefficient (SCC)   0.285        0.514
Diameter of largest SCC                        37           14
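As a quick consistency check on the table, the matrix density figures can be reproduced from the journal and edge counts. The sketch below assumes density is defined as edges divided by the number of cells in the square journal-by-journal matrix; the slide does not state the definition, but this one matches both reported values.

```python
def matrix_density(n_journals, n_edges):
    """Fraction of non-zero cells in an n x n matrix (assumed definition)."""
    return n_edges / (n_journals ** 2)


# Journal and edge counts for M and M' as given in the table.
density_m = matrix_density(97_532, 6_783_552)
density_m_reduced = matrix_density(2_307, 50_000)

# Expressed as percentages rounded to three decimals, like the table.
percent_m = round(100 * density_m, 3)
percent_m_reduced = round(100 * density_m_reduced, 3)
```

Reducing the network to its strongest edges therefore makes it more than ten times denser, even though it keeps well under 1% of the edges.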

[Figure: processing pipeline from log data to map. MESUR log data (1B interactions) are restricted to the common date range (346M interactions) and turned into matrix M (all edges: 97k journals, 6.7M edges); the most reliable edges, selected from a rank plot of N(vi,vj), form matrix M′ (2,307 journals, 50k edges), which feeds the map and the rankings. Map layout: Fruchterman (1991) Graph Drawing by Force-directed Placement. Software, 21(11):1129-1164. Classification code legend: root; AAT taxonomy, Dewey, JCR; journals; levels 1-5.]
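The step from M (6.7M edges) to M′ (50k edges) keeps only the "most reliable" edges. A minimal sketch of one way to do this, assuming that reliability means the highest transition count N(v_i, v_j); the slide shows a rank plot of N(v_i, v_j) but does not spell out the selection rule, so treat the criterion and the function names as assumptions.

```python
import heapq


def most_reliable_edges(counts, k):
    """Keep the k edges with the largest counts N(v_i, v_j)."""
    top = heapq.nlargest(k, counts.items(), key=lambda item: item[1])
    return dict(top)


def journals_in(edges):
    """Journals that still appear as an endpoint of at least one kept edge."""
    return {journal for pair in edges for journal in pair}


# Toy counts; in the talk k would be 50,000.
toy_counts = {("v1", "v2"): 30, ("v1", "v3"): 5, ("v2", "v3"): 12, ("v4", "v1"): 1}
toy_reduced = most_reliable_edges(toy_counts, k=2)
```

Journals whose edges all fall below the cut disappear from the reduced network, which is consistent with M′ retaining far fewer journals than M.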

[Figure: map of science derived from clickstream data. Cluster labels on the map: Applied physics, Organic chemistry, Analytical chemistry, Biochemistry, Social and personality psychology, Polymers, Microbiology, Biotechnology, Music, Plant biology, Biodiversity, Hydrology, Genetics, Brain research, Geology, Agriculture, Alternative energy, Toxicology, Plant agriculture, Animal behavior, Environmental science, Chemical engineering, Tourism, Nutrition, Neurology, Sports medicine, Classical studies, Nursing, Psychology, Geography, Social work, Dermatology, Clinical pharmacology, International studies, Statistics, Archeology, Demographics, Economics, Child psychology, Human geography, Production research, Manufacturing, Material science, Engineering, Clinical trials, Education, Anthropology, Public health, Law, Sociology, Philosophy, Asian studies, Religion, Mineralogy, Acoustics, Statistical physics, Thermodynamics, Physical chemistry, Pharmaceutical research, Electrochemistry, Ecology, Language, Cognitive science, Brain studies, Soil/Marine biology, Physiology, Plant genetics, Architecture, Design.]

The Arts and Architecture Thesaurus (Getty Research Center)

Table: Distance from AAT root (α) and number of classifications Nc at that level. Each α produces a finer-grained separation of scientific disciplines.

Distance (α)   Nc    Example classifications
1              4     Natural sciences, social sciences, humanities, ...
2              8     Biology, chemistry, physics, ...
3              31    Classics, communication, engineering, ...
4              195   Allergy, anesthesiology, applied linguistics, ...

[Figure: journal classification hierarchy. Labels: root; AAT taxonomy, Dewey, JCR; journals; levels 1-5.]

Cross-validating the clickstream map

$$f(v_i, v_j, \alpha) = \begin{cases} 1 & C_\alpha(v_i) = C_\alpha(v_j) \\ 0 & C_\alpha(v_i) \neq C_\alpha(v_j) \end{cases}$$

[Figure 6: Cross-validating the map structure given by M′ to journal relationships derived from AAT journal classifications, i.e. matrix A′. From the log data, $P(v_i, v_j)$ gives $M' = (p_{i,j})$, whose edges are split by $N_{i,j} > \mu_{0.5}(N_k)$ versus $N_{i,j} \leq \mu_{0.5}(N_k)$ (factor F1). From the AAT, $f(v_i, v_j, \alpha)$ gives $A' = (a_{i,j})$, split by $a_{i,j} = 0$ versus $a_{i,j} = 1$ (factor F2). F1 and F2 form a contingency table at each α, tested with $\chi^2$.]

α = 1: p < 0.0001
α = 2: p < 0.0001
α = 3: p < 0.0001
α = 4: p < 0.0001
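The cross-validation can be sketched as a 2x2 contingency table followed by a chi-square statistic. This is only an illustration under stated assumptions: the threshold written as μ with subscript 0.5 is read here as the median edge count, the test is taken to be a plain Pearson chi-square, and all function and variable names are hypothetical.

```python
from statistics import median


def contingency_table(counts, classification):
    """Cross edge strength (F1) with shared classification (F2).

    counts: {(v_i, v_j): N(v_i, v_j)}
    classification: {journal: class C_alpha(journal) at one AAT level}
    Rows: N <= median, N > median. Columns: f = 0, f = 1.
    """
    threshold = median(counts.values())
    table = [[0, 0], [0, 0]]
    for (vi, vj), n in counts.items():
        row = 1 if n > threshold else 0
        col = 1 if classification[vi] == classification[vj] else 0
        table[row][col] += 1
    return table


def chi_square(table):
    """Pearson chi-square statistic; assumes no row or column sums to zero."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic


toy_counts = {("a", "b"): 10, ("a", "c"): 1, ("c", "d"): 8, ("b", "d"): 2}
toy_classes = {"a": "X", "b": "X", "c": "Y", "d": "Y"}
toy_table = contingency_table(toy_counts, toy_classes)
```

In the toy data the strong edges connect journals of the same class and the weak edges cross classes, so all mass sits on the diagonal of the table and the statistic is large relative to the sample size.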

Betweenness centrality

$$C_b(v_k) = \sum_{i \neq j \neq k} \frac{\sigma_{i,j}(v_k)}{\sigma_{i,j}} \qquad (1)$$

Table: Ranking of journals from M′ according to betweenness centrality.

Rank   Journal                                            Top-level AAT classification
1      Science                                            Natural Sciences
2      Proceedings of the National Academy of Sciences    Natural Sciences
3      Environmental Health Perspectives                  Natural Sciences
4      Chemosphere                                        Natural Sciences
5      Journal of Advanced Nursing                        Natural Sciences
6      Nature                                             Natural Sciences
7      Ecology                                            Natural Sciences
8      Milbank Quarterly                                  Natural Sciences
9      Applied and Environmental Microbiology             Natural Sciences
10     Child Development                                  Social Sciences
11     Behavioral Ecology and Sociobiology                Social Sciences
12     Journal of Colloid and Interface Science           Natural Sciences
13     American Anthropologist                            Social Sciences
14     Journal of Biogeography                            Natural Sciences
15     Materials Science and Technology                   Natural Sciences
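Equation (1) sums, over all pairs of other journals, the fraction of shortest paths that pass through v_k. A minimal sketch for an unweighted, directed graph follows; it uses Brandes' algorithm, which is a standard way to evaluate this sum and is chosen here for illustration, not because the talk says MESUR used it. The graph format (a dict of successor lists) is likewise an assumption.

```python
from collections import deque


def betweenness(graph):
    """Betweenness centrality for an unweighted directed graph.

    graph: {node: list of successor nodes}
    """
    nodes = set(graph) | {w for successors in graph.values() for w in successors}
    centrality = dict.fromkeys(nodes, 0.0)
    for source in nodes:
        # Breadth-first search from source, counting shortest paths (sigma).
        sigma = dict.fromkeys(nodes, 0)
        dist = dict.fromkeys(nodes, -1)
        preds = {v: [] for v in nodes}
        sigma[source], dist[source] = 1, 0
        order, queue = [], deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph.get(v, ()):
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies, farthest nodes first.
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                centrality[w] += delta[w]
    return centrality


# Two equally short routes from "a" to "d": one through "b", one through "c".
diamond = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
diamond_scores = betweenness(diamond)
```

In the diamond example each of the two middle nodes carries half of the shortest paths between the outer pair, so each scores 0.5.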

PageRank

$$PR(v_i) = \frac{1 - \lambda}{N} + \lambda \sum_j \frac{PR(v_j)}{O(v_j)} \qquad (2)$$

Table: Ranking of journals from M′ according to PageRank (λ = 0.85).

Rank   Journal                                                           Top-level AAT classification
1      Applied Physics Letters                                           Natural Sciences
2      Journal of Advanced Nursing                                       Natural Sciences
3      Journal of the American Chemical Society                          Natural Sciences
4      Ecology                                                           Natural Sciences
5      Nature                                                            Natural Sciences
6      Physical Review B                                                 Natural Sciences
7      Journal of Applied Physics                                        Natural Sciences
8      American Economic Review                                          Social Sciences
9      American Historical Review                                        Social Sciences
10     Physical Review Letters                                           Natural Sciences
11     Science                                                           Natural Sciences
12     Langmuir                                                          Natural Sciences
13     Journal of Chemical Physics                                       Natural Sciences
14     American Anthropologist                                           Social Sciences
15     Annals of the American Academy of Political and Social Science    Social Sciences
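Equation (2) can be evaluated by repeated substitution (power iteration). The sketch below is unweighted and takes the sum over journals v_j that link to v_i, with O(v_j) the out-degree of v_j; the handling of journals without outgoing links is a common convention that the slide does not mention, so it is an assumption, as are the function name and graph format.

```python
def pagerank(graph, lam=0.85, iterations=100):
    """Iterate PR(v_i) = (1 - lam) / N + lam * sum_j PR(v_j) / O(v_j).

    graph: {node: list of successor nodes}
    """
    nodes = sorted(set(graph) | {w for out in graph.values() for w in out})
    n = len(nodes)
    rank = dict.fromkeys(nodes, 1.0 / n)
    for _ in range(iterations):
        updated = dict.fromkeys(nodes, (1.0 - lam) / n)
        for v in nodes:
            successors = graph.get(v, ())
            if successors:
                share = lam * rank[v] / len(successors)
                for w in successors:
                    updated[w] += share
            else:
                # Dangling node: spread its rank evenly (assumed convention).
                for w in nodes:
                    updated[w] += lam * rank[v] / n
        rank = updated
    return rank


# "c" links to "a" but receives no links itself.
toy_graph = {"a": ["b"], "b": ["a"], "c": ["a"]}
toy_rank = pagerank(toy_graph)
```

In the toy graph the journal without incoming links ends at the baseline value (1 - λ)/N = 0.05, and the journal that receives links from both others ranks highest.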

Metrics

ID  Type      Measure                   Source          Network parameters      PC1     PC2     ρ̄
1   Citation  Scimago Journal Rank      Scimago/Scopus                          -0.974  -8.296  0.556*
2   Citation  Immediacy Index           JCR 2007                                1.659   -7.046  0.508*
3   Citation  Closeness centrality      JCR 2007        Undirected, weighted    0.339   -6.284  0.565*
4   Citation  Cites per doc             Scimago/Scopus                          -1.311  -6.192  0.588*
5   Citation  Journal Impact Factor     JCR 2007                                -1.854  -5.937  0.592*
6   Citation  Closeness centrality      JCR 2007        Undirected, unweighted  -1.388  -4.827  0.619
7   Citation  Out-degree centrality     JCR 2007        Directed, weighted      -3.191  -4.215  0.642
8   Citation  Out-degree centrality     JCR 2007        Directed, unweighted    -2.703  -4.015  0.640
9   Citation  Degree centrality         JCR 2007        Undirected, weighted    -4.850  -2.834  0.690
10  Citation  Degree centrality         JCR 2007        Undirected, unweighted  -4.398  -2.643  0.691
11  Citation  H-Index                   Scimago/Scopus                          -3.326  -2.003  0.681
12  Citation  Scimago Total cites       Scimago/Scopus                          -4.926  -1.722  0.712
13  Citation  Journal Cite Probability  JCR 2007                                -5.389  -1.647  0.710
14  Citation  In-degree centrality      JCR 2007        Directed, unweighted    -5.302  -1.429  0.717
15  Citation  In-degree centrality      JCR 2007        Directed, weighted      -5.380  -1.554  0.712
16  Citation  PageRank                  JCR 2007        Directed, unweighted    -4.476  0.108   0.693
17  Citation  PageRank                  JCR 2007        Undirected, unweighted  -4.929  0.731   0.726
18  Citation  PageRank                  JCR 2007        Undirected, weighted    -4.160  0.864   0.696
19  Citation  PageRank                  JCR 2007        Directed, weighted      -3.103  0.333   0.659
20  Citation  Y-factor                  JCR 2007        Directed, weighted      -2.971  0.317   0.657
21  Citation  Betweenness centrality    JCR 2007        Undirected, weighted    -0.462  0.872   0.643
22  Citation  Betweenness centrality    JCR 2007        Undirected, unweighted  -0.474  1.609   0.642
23  Citation  Citation Half-Life        JCR 2007                                /       /       0.037
24  Usage     Closeness centrality      MESUR 2007      Undirected, weighted    3.130   2.683   0.703
25  Usage     Closeness centrality      MESUR 2007      Undirected, unweighted  3.100   3.899   0.731
26  Usage     Degree centrality         MESUR 2007      Undirected, unweighted  3.271   3.873   0.729
27  Usage     PageRank                  MESUR 2007      Undirected, unweighted  3.327   4.192   0.728
28  Usage     PageRank                  MESUR 2007      Directed, unweighted    3.463   4.336   0.727
29  Usage     In-degree centrality      MESUR 2007      Directed, unweighted    3.463   4.015   0.728
30  Usage     Out-degree centrality     MESUR 2007      Directed, unweighted    3.484   3.994   0.727
31  Usage     PageRank                  MESUR 2007      Directed, weighted      3.780   4.217   0.710
32  Usage     PageRank                  MESUR 2007      Undirected, weighted    3.813   4.223   0.710
33  Usage     Betweenness centrality    MESUR 2007      Undirected, unweighted  3.988   4.271   0.699
34  Usage     Betweenness centrality    MESUR 2007      Undirected, weighted    3.957   3.698   0.693
35  Usage     Degree centrality         MESUR 2007      Undirected, weighted    5.293   3.528   0.683
36  Usage     Out-degree centrality     MESUR 2007      Directed, weighted      5.302   3.518   0.683
37  Usage     In-degree centrality      MESUR 2007      Directed, weighted      5.286   3.531   0.683
38  Usage     Journal Use Probability   MESUR 2007                              8.914   1.833   0.593
39  Usage     Usage Impact Factor       MESUR 2007                              /       /       0.279

$$R_{10 \times 10} = \begin{pmatrix}
1.00 & 0.71 & 0.77 & 0.52 & 0.79 & 0.55 & 0.69 & 0.63 & 0.60 & 0.18 \\
0.71 & 0.99 & 0.52 & 0.69 & 0.79 & 0.85 & 0.49 & 0.44 & 0.49 & 0.22 \\
0.77 & 0.52 & 1.00 & 0.62 & 0.63 & 0.39 & 0.70 & 0.73 & 0.68 & 0.20 \\
0.52 & 0.69 & 0.62 & 1.00 & 0.68 & 0.78 & 0.49 & 0.56 & 0.65 & 0.06 \\
0.79 & 0.79 & 0.63 & 0.68 & 1.00 & 0.82 & 0.66 & 0.62 & 0.66 & 0.15 \\
0.55 & 0.85 & 0.39 & 0.78 & 0.82 & 1.00 & 0.40 & 0.40 & 0.50 & 0.13 \\
0.69 & 0.49 & 0.70 & 0.49 & 0.66 & 0.40 & 1.00 & 0.89 & 0.85 & 0.53 \\
0.63 & 0.44 & 0.73 & 0.56 & 0.62 & 0.40 & 0.89 & 1.00 & 0.97 & 0.45 \\
0.60 & 0.49 & 0.68 & 0.65 & 0.66 & 0.50 & 0.85 & 0.97 & 1.00 & 0.42 \\
0.18 & 0.22 & 0.20 & 0.06 & 0.15 & 0.13 & 0.53 & 0.45 & 0.42 & 1.00
\end{pmatrix}$$

Rows and columns, in order:

19: Citation PageRank
5: Journal Impact Factor
22: Citation Betweenness
6: Citation Closeness
11: Citation H-index
1: Citation Scimago Journal Rank
31: Usage PageRank
34: Usage Betweenness
24: Usage Closeness
39: Usage Impact Factor
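A correlation matrix such as R can be built by correlating every pair of measures over the same set of journals. The slide does not say which coefficient was used; the sketch below assumes Spearman rank correlation, computed as the Pearson correlation of average ranks, and all names in it are hypothetical.

```python
def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    norm_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (norm_x * norm_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def correlation_matrix(measures):
    """measures: {name: scores per journal, same journal order for every name}."""
    names = list(measures)
    return [[spearman(measures[a], measures[b]) for b in names] for a in names]
```

Rank correlation only asks whether two measures order the journals similarly, which suits impact scores with very different scales and skewed distributions.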

[Figure: Schematic representation of data sources and processing. Data sources: 2007 JCR citation data, Scimago citation data, and MESUR usage log data. The JCR data yield a citation network (7,388 x 7,388) and the MESUR data a usage network (7,575 x 7,575). Impact measures are shown by their identifiers from the Metrics table, grouped by source. Journal sets of 7,584 and 12,751 journals are intersected, and the measures are combined in a 39x39 correlation matrix.]

Principal Component Analysis

                         PC1     PC2     PC3     PC4     PC5
Proportion of Variance   66.1%   17.3%   9.2%    4.8%    0.9%
Cumulative Proportion    66.1%   83.4%   92.6%   97.4%   98.3%
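The cumulative row of the table is the running sum of the per-component proportions. A small sketch that reproduces it and reports how many components are needed to pass a chosen share of the variance; the helper name and the 90% threshold are illustrative choices.

```python
from itertools import accumulate

# Proportion of variance per component, in percent, as given in the table.
proportions = [66.1, 17.3, 9.2, 4.8, 0.9]

# Running sum, rounded to one decimal like the table.
cumulative = [round(total, 1) for total in accumulate(proportions)]


def components_needed(cumulative_percent, threshold):
    """Smallest number of leading components whose cumulative share >= threshold."""
    for count, share in enumerate(cumulative_percent, start=1):
        if share >= threshold:
            return count
    return None  # threshold not reached by the listed components


needed_for_90 = components_needed(cumulative, 90.0)
```

The first two components already cover more than four fifths of the variance, which is why the talk plots the measures on PC1 and PC2 only.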

[Figure: scatter plot of the impact measures, labeled by their IDs (1-38), on the first two principal components.]

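[Figure: schematic of the PC1/PC2 plane with + and - directions on each axis. Annotated regions: Usage Betweenness, Usage PageRank, Citation Betweenness, Citation PageRank, Total Cites, Cites per Doc, JIF06, Usage Probability, Citation Immediacy.]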

[Figure: hierarchical clustering dendrogram of the impact measures; height axis from 2.5 to 0.0. Leaf order: 38, 31, 32, 36, 37, 35, 24, 33, 34, 25, 26, 28, 27, 29, 30, 11, 12, 14, 15, 13, 17, 10, 9, 22, 21, 19, 20, 16, 18, 3, 6, 8, 7, 2, 1, 5, 4.]

Cluster   Measures                                                   Interpretation
1         38                                                         Journal Use Probability
2         24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37     Usage measures
3         1, 2, 3, 4, 5                                              JIF, SJR, Cites per Document measures
4         6, 7, 8, 9, 10, 11, 12, 13, 14, 15                         Total Citation rates and distributions
5         16, 17, 18, 19, 20, 21, 22                                 Citation Betweenness and PageRank

MESUR: so far

Usage data: single largest reference data set of usage, citation and bibliographic data

  +1,000,000,000 usage events loaded
  multiple publishers, aggregators and institutions
  Infrastructure for research program

Usage graphs: track scientific flow of activity

  real-time studies of science
  inclusive of larger “scholarly community”

Metrics: valid, vetted indicators of scientific impact

  Different facets of scholarly impact
  Simple metrics, good results. Law of diminishing returns?
  Hybrid, consensus metrics?

MESUR: the bad

Sustainability: burden of data collection

  ad hoc, customized agreements with data providers
  restrictive agreements with regards to data sharing
  high costs of maintaining infrastructure: funding

Research program: too much to do

  3rd party access: better science
  many eyes on data

Metrics and services: Community support and acceptance

  Can’t be limited to academic exercise
  Investigations need to be useful
  Accepted by community, become part of scholarly assessment system

Future Research: Two directions

Longitudinal research and dynamic models of science

Bursty behavior of science

  Time series of reads are bursty
  Coordinated: social networks at work?

Modeling

  Stochastic and agent-based models of scientific activity
  Citation following? Group-think? Find parameters of human information searching behavior.

Bursty behavior

[Figure: three panels for the journal APPLIED STATISTICS, each plotted against an index from 0 to 350: time series (values roughly 100 to 700), residuals percentiles, and residuals smoothed (both roughly -200 to 400).]

Contagion?

[Figure: usage over time for physics journals, one row per journal; x-axis from 0 to 336 in steps of 7. Rows, in order: PERCEPTION & PSYCHOPHYSICS; MOLECULAR PHYSICS; ADVANCES IN PHYSICS; JOURNAL OF PHYSICS B ATOMIC MOLECULAR AND OPTICAL PHYSIC; JOURNAL OF COMPUTATIONAL PHYSICS; JOURNAL OF CHEMICAL PHYSICS; CENTRAL EUROPEAN JOURNAL OF PHYSICS; CELL BIOCHEMISTRY AND BIOPHYSICS; PHYSICS LETTERS B; CONTEMPORARY PHYSICS; NUCLEAR INSTRUMENTS & METHODS IN PHYSICS RESEARCH SECT; CHEMICAL PHYSICS; PHYSICA STATUS SOLIDI (B); NUCLEAR INSTRUMENTS & METHODS IN PHYSICS RESEARCH SECT; INTERNATIONAL JOURNAL OF THERMOPHYSICS; PHYSICA SCRIPTA; CANADIAN JOURNAL OF PHYSICS; CHEMICAL PHYSICS LETTERS; PHYSICA D; APPLIED PHYSICS; CHINESE PHYSICS LETTERS; RADIATION PHYSICS AND CHEMISTRY; JOURNAL OF PHYSICS D APPLIED PHYSICS; CURRENT APPLIED PHYSICS; PHYSICA A; NUCLEAR PHYSICS A; MEDICAL PHYSICS; NUCLEAR PHYSICS B; PHYSICA C; INTERNATIONAL JOURNAL OF THEORETICAL PHYSICS; JOURNAL OF STATISTICAL PHYSICS; FOUNDATIONS OF PHYSICS; NEW JOURNAL OF PHYSICS; PHYSICS OF PLASMAS; MATERIALS CHEMISTRY AND PHYSICS; JOURNAL OF THE MECHANICS AND PHYSICS OF SOLIDS; APPLIED PHYSICS B; CZECHOSLOVAK JOURNAL OF PHYSICS; ASTRONOMY & ASTROPHYSICS; JOURNAL OF PHYSICS A GENERAL PHYSICS; PHYSICS OF FLUIDS; TECTONOPHYSICS; PHYSICS LETTERS A; JOURNAL OF PHYSICS G NUCLEAR AND PARTICLE PHYSICS; BIOCHIMICA ET BIOPHYSICA ACTA (BBA) - BIOENERGETICS; JAPANESE JOURNAL OF APPLIED PHYSICS; PHYSICS OF ATOMIC NUCLEI; ASTRONOMY & GEOPHYSICS; QUARTERLY REVIEWS OF BIOPHYSICS; INTERNATIONAL JOURNAL OF MODERN PHYSICS A; PHYSICS OF THE EARTH AND PLANETARY INTERIORS; PHYSICA B; PHYSICA E; MODERN PHYSICS LETTERS A; ATMOSPHERIC CHEMISTRY AND PHYSICS; PHYSICS AND CHEMISTRY OF LIQUIDS; SOLAR PHYSICS; NATURE PHYSICS; EUROPEAN JOURNAL OF PHYSICS; FOUNDATIONS OF PHYSICS LETTERS; PHYSICS TODAY; BIOCHIMICA ET BIOPHYSICA ACTA (BBA) - BIOMEMBRANES; ASTROPHYSICS AND SPACE SCIENCE; PHYSICS AND CHEMISTRY OF THE EARTH PARTS A/B/C; ACTA CRYSTALLOGRAPHICA SECTION A CRYSTAL PHYSICS DIFFRA; ANNALS OF PHYSICS; BIOCHIMICA ET BIOPHYSICA ACTA (BBA) - MOLECULAR BASIS OF D; EUROPHYSICS LETTERS; AMERICAN JOURNAL OF PHYSICS; JOURNAL OF HIGH ENERGY PHYSICS.]


Contagion?

[Figure: genetics journal titles (HUMAN MOLECULAR GENETICS, TWIN RESEARCH AND HUMAN GENETICS, ANIMAL GENETICS, CLINICAL GENETICS, ANNALS OF HUMAN GENETICS, BEHAVIOR GENETICS, NATURE GENETICS, HUMAN GENETICS, JOURNAL OF GENETICS, GENETICS, and others) plotted against a horizontal axis with ticks from 0 to 336 in steps of 7.]


Relevant papers

Johan Bollen, Herbert Van de Sompel, Aric Hagberg and Ryan Chute. A Principal Component Analysis of 39 Scientific Impact Measures. PLoS ONE, June 2009. URL: http://dx.plos.org/10.1371/journal.pone.0006022.

Michael Kurtz and Johan Bollen. Usage bibliometrics. Annual Review of Information Science and Technology, 2010.

Johan Bollen, Herbert Van de Sompel, Aric Hagberg, Luis Bettencourt, Ryan Chute, Marko A. Rodriguez and Lyudmila Balakireva. Clickstream data yields high-resolution maps of science. PLoS ONE, February 2009.
