Polluted Leftovers: Repository Metrics from the Perspective of a Most Downloaded Item

Jon Wheeler [email protected] | Kenning Arlitsch [email protected]
Patrick OBrien, Montana State University | Jeff Mixter, OCLC Research | Leila Sterman, Montana State University

4/3/17 Coalition for Networked Information meeting, Albuquerque, NM


Measuring Up grant: About

• IMLS-funded grant, 2014-2017
• “Measuring Up: Assessing Accuracy of Reported Use and Impact of Digital Repositories”
• http://scholarworks.montana.edu/xmlui/handle/1/8924
• Partners: MSU, UNM, OCLC Research, ARL

• Main driver
    • Improving accuracy and consistency of reporting

• Research demonstrates under- and over-counting of IR use
• Solution = RAMP (Repository Analytics & Metrics Portal)


Overview

• Problems of establishing consistent, reliable metrics of Institutional Repository (IR) usage

• Current research into systemic over-counting and under-counting

• Mapping IR log data to Google Analytics and Search Console data and characterizing the differences
    • Assumptions underlying analytics services can rule out legitimate user interactions


Problems of IR Reporting

• Over- and undercounting

• Variety of analytics services and methods
    • Log files
    • Software services (page tagging)

• General concerns
    • Bots
    • Variety of configurations == maintenance and continuity questions for a UL, consistency and reliability across IR ecosystem more broadly


DSpace Configuration Options

Option | DSpace Version | DSpace UI Compatibility | Integration with Earlier Stats | Other Notes
Google Analytics (GA) | GA UI since pre-v5, DSpace plugin since v5. “Events” since v5. | XMLUI, Mirage 2 theme only. | NA | DS plugin must be enabled
Elasticsearch (ES) | Since v3, deprecated in v6. | XMLUI only. | Legacy Solr data can be converted. | Must be enabled. Default set of fields is not configurable.
Solr | Since v1.6. | XML and JSPUI, some difference in how facet events are logged. | Legacy system stats can be converted. | Same as ES, plus downloads, workflow, and events.

https://wiki.duraspace.org/display/DSDOC5x/Statistics+and+Metrics


A New Reporting Model

Page Type | Definition | Examples
Citable Content Downloads | Non-HTML scholarly content that may be formally cited in the research process | Publication (.pdf); Presentation (.ppt); Data Sets (.csv)
Item Summary | HTML pages to help user decide to download the full publication | Title & Abstract; Item Metadata
Ancillary | HTML pages that provide general information or navigation | Search Results; Browse by Author; Statistics
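As a rough illustration of the three page types above, the sketch below classifies repository URLs by their path. The rules are assumptions made for illustration only: treating any path that contains `/bitstream/` as a Citable Content Download and any other `/handle/` path as an Item Summary is a simplification (community and collection pages may also sit under `/handle/`), and the `discover` URL is hypothetical.

```python
from urllib.parse import urlparse


def classify_page(url: str) -> str:
    """Assign a URL to one of the three page types (illustrative rules only)."""
    path = urlparse(url).path
    # Assumption: file downloads are served from paths containing /bitstream/.
    if "/bitstream/" in path:
        return "Citable Content Download"
    # Assumption: remaining /handle/ paths are item landing pages.
    if "/handle/" in path:
        return "Item Summary"
    # Everything else (search, browse, statistics) is treated as navigation.
    return "Ancillary"


examples = [
    "http://scholarworks.montana.edu/xmlui/bitstream/1/1091/1/ColemanT1212.pdf",
    "http://scholarworks.montana.edu/xmlui/handle/1/9348",
    "http://scholarworks.montana.edu/xmlui/discover",  # hypothetical URL
]
for url in examples:
    print(classify_page(url), url)
```

A production classifier would likely need repository-specific rules, since URL layouts vary across IR platforms and configurations.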


Two Classes of Web Analytics

1. Log Files
2. Page Tagging (JavaScript)

[Diagram labels: HTML; Analytics Service (SaaS)]


Non-HTML

Page tagging methods do not track non-HTML Citable Content Downloads.

Search Analytics does!


GA Ancillary PV and Item Summary PV vs CCD

134-day period in Spring 2016


GA HTML versus GSC CCD tracking

Page Type | Analytics: Pages | Analytics: Events | Search Console: Search Analytics
Citable Content Downloads | - | 26,355 | 562,933
Item Summary | 284,303 | - | -
Ancillary | 201,793 | - | -

CCD Tracking Improvement: +2,000%
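The improvement figure can be checked with a short calculation. This is a minimal sketch that assumes the percentage compares the Search Console count against the Google Analytics event count for Citable Content Downloads; the function name is illustrative.

```python
def percent_improvement(baseline: int, improved: int) -> float:
    """Relative gain of `improved` over `baseline`, expressed in percent."""
    return (improved - baseline) / baseline * 100.0


ga_events = 26_355              # CCD recorded as Analytics events
gsc_search_analytics = 562_933  # CCD recorded by Search Console

gain = percent_improvement(ga_events, gsc_search_analytics)
print(f"CCD tracking improvement: +{gain:,.0f}%")
```

Under that assumption the result is a little over 2,000 percent, consistent with the rounded figure on the slide.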


Montana Method Challenges

• Missing non-Google direct link CCD events
    • Yahoo
    • Bing
    • Email
    • Facebook & social media

• GSC limits time and access
    • Moving 90-day window
    • Granular data = programming skills to access API


RAMP: Repository Analytics & Metrics Portal

• ramp.montana.edu

• A benchmarking tool

• Prototype application
    • No local installation or configuration required
    • Integrated reporting from Google APIs


RAMP IR as of March 27, 2017

• 8 IR registered
• Tracking over 250,000 digital items
• Capturing over 20,000 CCD per day that were previously invisible through GA.


RAMP


RAMP


Sample Rows from Data Set

Citable Content | Click Through | URL | Country | Device | Position | Date | Impressions | Clicks
No | 0 | http://scholarworks.montana.edu/xmlui/handle/1/9348 | hrv | DESKTOP | 31 | 3/8/17 | 1 | 0
Yes | 0 | http://scholarworks.montana.edu/xmlui/bitstream/handle/1/8705/WhitenS0814.pdf;sequence=1 | pan | MOBILE | 6 | 3/8/17 | 1 | 0
Yes | 0 | http://scholarworks.montana.edu/xmlui/bitstream/handle/1/3670/31762001131281.pdf;sequence=1 | fra | DESKTOP | 24 | 3/8/17 | 1 | 0
Yes | 0 | http://scholarworks.montana.edu/xmlui/bitstream/handle/1/7215/31762101989810.pdf?sequence=1 | chn | DESKTOP | 13 | 3/8/17 | 2 | 0
Yes | 0 | http://scholarworks.montana.edu/xmlui/bitstream/handle/1/11518/15-002_Surface-attached_cells_biofilms_A1b.pdf?sequence=1 | gbr | DESKTOP | 10 | 3/8/17 | 1 | 0
Yes | 0 | http://scholarworks.montana.edu/xmlui/bitstream/1/1091/1/ColemanT1212.pdf | kwt | MOBILE | 3 | 3/8/17 | 1 | 0
No | 0 | http://scholarworks.montana.edu/xmlui/handle/1/9049 | gbr | DESKTOP | 9 | 3/8/17 | 1 | 0
No | 0 | http://scholarworks.montana.edu/xmlui/handle/1/2567 | egy | DESKTOP | 44 | 3/8/17 | 1 | 0
Yes | 0 | http://scholarworks.montana.edu/xmlui/bitstream/handle/1/7546/31762102468723.pdf;sequence=1 | twn | DESKTOP | 14 | 3/8/17 | 1 | 0
No | 0 | http://scholarworks.montana.edu/xmlui/handle/1/1854 | tur | DESKTOP | 128 | 3/8/17 | 1 | 0


Polluted Leftovers

• How much DSpace Solr data is not also captured by Google Analytics (GA) or Google Search Console (GSC) data?

• How much of this activity is human and how much is bot? What kinds of human activity are not being captured by third party services?

• By focusing on GA/GSC data are we missing a significant percentage of citable content downloads? A fraction? A fraction of a fraction?


Looking for Stories in the Data


Processing Solr Logs

• Query stats core for citable content downloads (CCD)
    • BundleName = ORIGINAL
    • isBot = False
    • StatisticsType = view
    • Type = 0

• Query search core for metadata
    • search.resourcetype = 2 (bitstreams)

• Map CCD log events to metadata using metadata.search.resourceid = log.owningItem

• Consolidate logged events per Handle per day
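The Solr steps above can be sketched roughly as follows. This is not the presenters' code: the field names are copied as written on the slide and may differ in spelling or casing from an actual DSpace statistics core, the `rows` and `wt` parameters are ordinary Solr query parameters chosen for illustration, and the sample events (with an already-mapped `handle` and a `time` stamp) are hypothetical.

```python
from collections import Counter
from urllib.parse import urlencode

# Filters as written on the slide; real schema field names may differ.
CCD_FILTERS = {
    "BundleName": "ORIGINAL",
    "isBot": "false",
    "StatisticsType": "view",
    "Type": "0",
}


def build_ccd_query(filters: dict, rows: int = 1000) -> str:
    """Encode a match-all Solr query restricted by one fq clause per filter."""
    params = [("q", "*:*"), ("rows", str(rows)), ("wt", "json")]
    params += [("fq", f"{field}:{value}") for field, value in filters.items()]
    return urlencode(params)


def consolidate_per_handle_per_day(events: list) -> Counter:
    """Count logged events per Handle per calendar day (key: Handle_date)."""
    counts = Counter()
    for event in events:
        day = event["time"][:10]  # keep YYYY-MM-DD from an ISO timestamp
        counts[f"{event['handle']}_{day}"] += 1
    return counts


# Hypothetical events, assumed to be already mapped from owningItem to Handle.
events = [
    {"handle": "1/8705", "time": "2017-01-05T10:22:31Z"},
    {"handle": "1/8705", "time": "2017-01-05T18:40:02Z"},
    {"handle": "1/1091", "time": "2017-01-06T02:15:00Z"},
]
print(build_ccd_query(CCD_FILTERS))
print(consolidate_per_handle_per_day(events))
```

Keying the daily counts on `Handle_date` gives the same combined key that the GA/GSC data is later joined on.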


Processing GA/GSC Data

• Harvest data dimensions via API
    • Event category
    • Page
    • Unique Events

• Filter for CCD
    • Page has “bitstream” in URL
    • Clicks > 0

• Add a column for Handles extracted from URLs

• Join to Solr data on combined key of Handle_date
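The filter, Handle extraction, and join steps above might look something like the sketch below. It assumes the harvested rows are already available as dictionaries; the regular expression, function names, click counts, and Solr counts are illustrative assumptions, and only the URLs are taken from the sample rows shown earlier in the deck.

```python
import re

# Assumption: a Handle is the "prefix/suffix" number pair that follows
# /handle/, /bitstream/handle/, or /bitstream/ in the URL path.
HANDLE_PATTERN = re.compile(
    r"/(?:bitstream/handle|bitstream|handle)/(\d+/\d+)(?:/|$)"
)


def extract_handle(url: str):
    """Return the Handle found in a repository URL, or None."""
    match = HANDLE_PATTERN.search(url)
    return match.group(1) if match else None


def filter_ccd(rows: list) -> list:
    """Keep rows whose URL contains 'bitstream' and that have clicks > 0."""
    return [r for r in rows if "bitstream" in r["url"] and r["clicks"] > 0]


def join_on_handle_date(google_rows: list, solr_counts: dict) -> list:
    """Attach Solr event counts to Google CCD rows via the Handle_date key."""
    joined = []
    for row in filter_ccd(google_rows):
        handle = extract_handle(row["url"])
        if handle is None:
            continue
        key = f"{handle}_{row['date']}"
        joined.append({
            "key": key,
            "google_clicks": row["clicks"],
            "solr_events": solr_counts.get(key, 0),
        })
    return joined


base = "http://scholarworks.montana.edu/xmlui"
google_rows = [  # click counts are made up for illustration
    {"url": f"{base}/bitstream/handle/1/8705/WhitenS0814.pdf;sequence=1",
     "date": "2017-03-08", "clicks": 2},
    {"url": f"{base}/bitstream/1/1091/1/ColemanT1212.pdf",
     "date": "2017-03-08", "clicks": 0},
    {"url": f"{base}/handle/1/9348", "date": "2017-03-08", "clicks": 1},
]
solr_counts = {"1/8705_2017-03-08": 5}  # hypothetical daily Solr count
print(join_on_handle_date(google_rows, solr_counts))
```

Only the first row survives the filter here: the second has no clicks and the third is an item page rather than a bitstream URL.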


Overview

Table | Solr CCD | Google CCD
Joined Solr statistics and search core data | 854,781 | -
GA/GSC data with Handles | - | 166,199
Solr events with corresponding GA/GSC events | 356,557 | 166,199
Solr events without corresponding GA/GSC events | 498,224 | -


A Closer Look at a Most Downloaded Item


Overview: 12178

Table | Solr CCD | Google CCD
Joined Solr statistics and search core data | 79,925 | -
GA/GSC with Handles | - | 462
Solr events with corresponding GA/GSC events | 74,612 | 462
Solr events without corresponding GA/GSC events | 5,313 | -
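The counts in the two overview tables (all items, then item 12178) can be cross-checked with a few lines of arithmetic. This is a sketch using only the figures reported on the slides; the dictionary keys and labels are illustrative, and "unmatched" is read here as Solr events without corresponding GA/GSC events.

```python
def unmatched_share(solr_total: int, unmatched: int) -> float:
    """Fraction of Solr CCD events with no corresponding GA/GSC events."""
    return unmatched / solr_total


tables = {
    "All items": {"solr_total": 854_781, "matched": 356_557, "unmatched": 498_224},
    "Item 12178": {"solr_total": 79_925, "matched": 74_612, "unmatched": 5_313},
}

for label, t in tables.items():
    # The matched and unmatched rows should add up to the joined Solr total.
    assert t["matched"] + t["unmatched"] == t["solr_total"]
    share = unmatched_share(t["solr_total"], t["unmatched"])
    print(f"{label}: {share:.1%} of Solr CCD events have no GA/GSC counterpart")
```

On these figures the unmatched share is a majority of Solr events across the repository but a small minority for the most downloaded item.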


Comparison of Logged Events

[Charts: GA/GSC; Solr]


January Logged Events

[Charts: GA/GSC; Solr]


January Solr Unique IP and Unique Lat/Long


March Logged Events

[Charts: GA/GSC; Solr]


March Solr Unique IP & Unique Lat/Long


Considerations

• Human vs. Bot Behavior
    • What about organizations?

• Returning Users & Bookmarked Content
    • Visits and devices
    • Difficult to lump events by day because events may be separated by hours

• Page views vs. Downloads

• Access vs. Use
    • When is access not use?


Closing

• Variation among IR platforms and configurations complicates benchmarking internally and across institutions

• Log data over-count usage
    • Google does a much better job of filtering bots

• Page tagging services undercount usage
    • Item level analysis suggests that undercounting results from assumptions which rule out legitimate user interactions


Publications

Published:

Patrick OBrien, Kenning Arlitsch, Jeff Mixter, Jonathan Wheeler, Leila Sterman. “RAMP: Repository Analytics and Metrics Portal: A Prototype Web Service that Accurately Counts Item Downloads from Institutional Repositories,” Library Hi Tech, vol. 35, no. 1, March 2017

Patrick OBrien, Kenning Arlitsch, Leila Sterman, Jeff Mixter, Jonathan Wheeler, and Susan Borda. “Undercounting File Downloads from Institutional Repositories,” Journal of Library Administration, vol. 56, no. 7, 2016

Proposal funded by IMLS: “Measuring Up: Assessing Accuracy of Reported Use and Impact of Digital Repositories” - scholarworks.montana.edu/xmlui/handle/1/8924


Thank you!

Jonathan Wheeler, Data Curation Librarian, [email protected], Dean of the Library, [email protected] @kenning_msu
