
THE SCIENCE THE SEARCH THE SOLUTION www.cabi-publishing.org

DOIs and the Secondary Publisher; a match made in heaven?

Andrea Powell

Product Development Director

CABI Publishing

It is a truth universally acknowledged….

….that a secondary database in possession of millions of bibliographic references, must be in want of a linking solution

(with apologies to Jane Austen)

A bit about CABI Publishing

• First publication in 1912

• Applied life sciences publisher

• Database products at the heart of our publishing business (CAB Abstracts and Global Health)

• Primary journals and books now account for 30% of turnover

• Total turnover approx. £12 million

Some facts and figures

• CAB Abstracts (1973-2004) contains 4.5 million bibliographic references

• Our Archive (1912-1972) adds a further 2.2 million references

• Our acquisitions database lists 9000 active publishers from whom we receive content

• We receive about 7500 serials in any one year, from over 125 countries in over 50 languages

Oh, and not forgetting...

… we also cover books, conference proceedings, technical bulletins, “grey literature”, websites, annual reports, theses… (approx. 18% of the total)

So what do we do?

• Create a consistently indexed, standardised, searchable database to enable the discovery of this rich content

• And then link the user to the full-text as seamlessly as possible

DOIs and CrossRef - a heaven-sent solution?

• Universal, multi-publisher protocol

• Cost-effective, although there were early concerns about escalating look-up fees

• Hurray!

Adding DOIs to the database

• Creation of a new DOI field within the Production Database

• Development and implementation of new workflows to collect DOIs at the most appropriate stage of our process

• Matching our serials list against the CrossRef Metadata Database
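The slide does not say how the serials list was matched. Assuming the match is keyed on ISSN, a minimal sketch might normalise each ISSN, reject any that fail the standard modulus-11 check digit, and intersect the two lists. The function names are hypothetical, and the sample list deliberately includes one made-up ISSN (1234-5678) with a bad check digit.

```python
import re
from typing import Iterable, List, Optional


def normalise_issn(raw: str) -> Optional[str]:
    """Return an ISSN as 'NNNN-NNNC', or None if malformed or the check digit fails."""
    chars = re.sub(r"[^0-9Xx]", "", raw).upper()
    # Eight characters; only the final check character may be 'X'
    if len(chars) != 8 or not chars[:7].isdigit():
        return None
    # ISSN check digit: weights 8..2 over the first seven digits, modulus 11
    total = sum(int(d) * w for d, w in zip(chars[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    if chars[7] != expected:
        return None
    return f"{chars[:4]}-{chars[4:]}"


def match_serials(our_issns: Iterable[str], crossref_issns: Iterable[str]) -> List[str]:
    """Return the valid ISSNs that appear in both lists, sorted."""
    ours = {n for n in map(normalise_issn, our_issns) if n}
    theirs = {n for n in map(normalise_issn, crossref_issns) if n}
    return sorted(ours & theirs)


# Sample data: two valid ISSNs and one (1234-5678) with a bad check digit
print(match_serials(["03785955", "2049-3630", "1234-5678"], ["0378-5955"]))  # ['0378-5955']
```

Validating before matching means a mistyped ISSN shows up as a rejected entry rather than as a silent non-match.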

Looking up DOIs - the early days

• In early 2002, we achieved a match rate of only 4% (it ought to have been 18%)

• Reasons for the poor match rate:
  - timing of deposits
  - poor-quality data
  - rigid matching algorithm
  - mismatch between our records and the retrieved metadata

Our DOI look-up and implementation process

• Two methods:
  - weekly look-up
  - twice-yearly batch look-up

Weekly look-up

• Automated system built into our weekly mechanism for transferring records from our production system to our live database

• Manual option to re-run this stage is also available if necessary

• Records with an ISSN but no DOI value are selected and extracted into a processing list
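A minimal sketch of that selection step, assuming each production record exposes its ISSN, its DOI (if any) and the PA/PAN identifier. The `Record` type and its field names are hypothetical, not CABI's actual schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    pan: str                    # CABI unique identifier (the "PA"/PAN field)
    issn: Optional[str]         # serial ISSN, if the record has one
    doi: Optional[str] = None   # empty until a DOI has been found


def build_processing_list(records: List[Record]) -> List[Record]:
    """Select records that have an ISSN but no DOI yet."""
    return [r for r in records if r.issn and not r.doi]


batch = [
    Record("P1", "0378-5955"),
    Record("P2", None),                        # no ISSN: skipped
    Record("P3", "2049-3630", "10.1000/xyz"),  # already has a DOI: skipped
]
print([r.pan for r in build_processing_list(batch)])  # ['P1']
```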

Weekly look-up

• Each field is processed to replace CABI-specific formatting with URL-safe encoding

• A single query string is constructed from the data of 50 records

• Each new query is added to the string, separated from its neighbour by a URL-encoded line feed (“%0A”)

• Approximately 3800 look-ups per week
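The batching can be sketched as follows. The slides do not describe CABI's own formatting codes, so this sketch simply percent-encodes each field value with Python's `urllib.parse.quote` (which also turns any `|` inside a value into `%7C`, so it cannot be mistaken for a separator) and joins up to 50 piped queries with `%0A`. The function names are hypothetical.

```python
from typing import Iterable, Iterator, List
from urllib.parse import quote

BATCH_SIZE = 50  # records per query string, as on the slide


def encode_field(value) -> str:
    """Percent-encode one field; None becomes an empty field."""
    return quote("" if value is None else str(value), safe="")


def piped_query(fields: Iterable) -> str:
    """Encode each field and join them with literal '|' separators."""
    return "|".join(encode_field(f) for f in fields)


def batch_query_strings(queries: List[str], batch_size: int = BATCH_SIZE) -> Iterator[str]:
    """Yield strings of up to batch_size queries separated by a URL-encoded line feed."""
    for start in range(0, len(queries), batch_size):
        yield "%0A".join(queries[start:start + batch_size])


print(piped_query(["0378-5955", "Smith, J.", 12]))  # 0378-5955|Smith%2C%20J.|12
```

At roughly 3800 look-ups a week, batches of 50 come to about 76 query strings per weekly run (3800 / 50 = 76).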

Weekly look-up

• We use a piped format: SN|DO|AU|VL|NO|PP|YR||PA* (*PA is our unique identifier)

• The query string is sent to CrossRef via a web link:
"http://doi.crossref.org/servlet/query?&usr=cabi&pwd=crpw1683&type=q&area=live&fuzzy=true&format=piped&qdata=SN|DO|AU|VL|NO|PP|YR||PA|%0A SN|DO|AU|VL|NO|PP|YR||PA|"
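A sketch of assembling one piped line in the slide's field order and then the full look-up URL. The endpoint and parameter names (`usr`, `pwd`, `type`, `area`, `fuzzy`, `format`, `qdata`) are copied from the slide and reflect CrossRef's interface at the time; the current service may differ. The credentials and record values here are placeholders, and the trailing `|` on each line simply follows the slide's URL.

```python
from typing import Dict, List
from urllib.parse import quote, urlencode

ENDPOINT = "http://doi.crossref.org/servlet/query"  # as shown on the slide
# Field order from the slide; "" is the slot left empty between YR and PA
FIELD_ORDER = ["SN", "DO", "AU", "VL", "NO", "PP", "YR", "", "PA"]


def piped_line(record: Dict[str, str]) -> str:
    """Build one SN|DO|AU|VL|NO|PP|YR||PA| query; missing fields stay empty."""
    cells = [quote(record.get(code, ""), safe="") if code else "" for code in FIELD_ORDER]
    return "|".join(cells) + "|"


def build_query_url(lines: List[str], user: str, password: str) -> str:
    """Combine encoded piped lines into a single look-up URL."""
    params = urlencode({
        "usr": user, "pwd": password, "type": "q",
        "area": "live", "fuzzy": "true", "format": "piped",
    })
    # The lines are already percent-encoded, so they are appended directly
    # (not passed through urlencode again) and joined with %0A, as on the slide
    return f"{ENDPOINT}?{params}&qdata=" + "%0A".join(lines)


rec = {"SN": "0378-5955", "AU": "Smith", "VL": "12", "NO": "3",
       "PP": "45", "YR": "2003", "PA": "P1"}  # hypothetical record
print(piped_line(rec))  # 0378-5955||Smith|12|3|45|2003||P1|
```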

DOI Assignment

• The returned web feed is converted into a text file, which is processed to extract the individual queries

• Each query is then processed to recover the PAN (unique ID) and DOI data

• The PAN is matched back to our database and the DOI data embedded in the record

• BINGO!
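The assignment step might look like this. The slides do not give the layout of CrossRef's reply, so this sketch assumes each returned line echoes the piped query with the DOI filled into a tenth field after the PA value, an empty tenth field meaning no match. The function names and sample DOIs are hypothetical.

```python
from typing import Dict

PAN_FIELD, DOI_FIELD = 8, 9  # assumed positions in each returned piped line


def parse_results(text: str) -> Dict[str, str]:
    """Map PAN -> DOI for every returned line that carries a DOI."""
    found = {}
    for line in text.splitlines():
        parts = line.strip().split("|")
        if len(parts) <= DOI_FIELD:
            continue  # blank or malformed line
        pan, doi = parts[PAN_FIELD].strip(), parts[DOI_FIELD].strip()
        if pan and doi:
            found[pan] = doi
    return found


def embed_dois(records: Dict[str, dict], found: Dict[str, str]) -> int:
    """Write each DOI into the record with the matching PAN; return the count assigned."""
    assigned = 0
    for pan, doi in found.items():
        rec = records.get(pan)
        if rec is not None and not rec.get("doi"):
            rec["doi"] = doi
            assigned += 1
    return assigned


reply = (
    "0378-5955||Smith|12|3|45|2003||P1|10.1000/abc\n"
    "0378-5955||Jones|12|3|60|2003||P2|\n"  # no DOI returned
)
db = {"P1": {"doi": None}, "P2": {"doi": None}}
print(embed_dois(db, parse_results(reply)), db["P1"]["doi"])  # 1 10.1000/abc
```

Keying on the PAN rather than on bibliographic fields means the DOI lands on exactly the record that generated the query, however the returned metadata is formatted.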

Twice-yearly batch look-up

• Entirely manual process, using text files and e-mails

• Look-up process much the same, but a date range is added to the selection process

• Piped query strings are output in batches of 1000 and prefixed with a CABI e-mail address

• Each file of 1000 queries is uploaded via the CrossRef website

• Results are returned via e-mail and processed to extract PAN and DOI
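A sketch of the batch preparation. It assumes records carry the date they were added (a hypothetical `added` field) that the date range filters on, and that the e-mail prefix is simply the first line of each file; the slide does not specify the exact file layout, and all names, including the address, are hypothetical.

```python
from datetime import date
from pathlib import Path
from typing import Dict, List


def select_backfile(records: List[Dict], start: date, end: date) -> List[Dict]:
    """Records with an ISSN, no DOI, and an 'added' date inside [start, end]."""
    return [r for r in records
            if r.get("issn") and not r.get("doi") and start <= r["added"] <= end]


def make_batches(lines: List[str], email: str, size: int = 1000) -> List[str]:
    """Split piped queries into files of up to `size` lines, each headed by the e-mail address."""
    batches = []
    for start in range(0, len(lines), size):
        chunk = lines[start:start + size]
        batches.append("\n".join([email, *chunk]) + "\n")
    return batches


def write_batches(batches: List[str], out_dir: str) -> List[Path]:
    """Save each batch as a numbered text file ready for upload."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, body in enumerate(batches, start=1):
        path = out / f"batch_{i:03d}.txt"
        path.write_text(body, encoding="utf-8")
        paths.append(path)
    return paths


batches = make_batches([f"q{i}" for i in range(2500)], "lookups@example.org")
print(len(batches))  # 3
```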

Looking up DOIs - these days

• Now consistently achieving a 25-30% match rate

• Backfile look-ups are even better, at 40%

• But how frequently should we add DOIs to our backfile - is twice a year enough?

• Not yet querying for books or conference proceedings, but we plan to soon
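As a rough worked example, assume the 25-30% rate applies to the roughly 3800 weekly look-ups quoted earlier (the slides do not state this directly). On that assumption the weekly yield compares with the early-2002 figure like this:

```python
WEEKLY_LOOKUPS = 3800  # approximate figure from the weekly look-up slide


def expected_matches(lookups: int, rate: float) -> int:
    """Expected number of DOIs found, rounded to a whole record."""
    return round(lookups * rate)


for label, rate in [("early 2002", 0.04), ("now, low", 0.25), ("now, high", 0.30)]:
    print(label, expected_matches(WEEKLY_LOOKUPS, rate))
# early 2002 152
# now, low 950
# now, high 1140
```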

Getting DOIs to the customer

• A&I (abstracting and indexing) databases are typically delivered via a number of third parties, e.g. Ovid, ISI, EBSCO, Dialog…

• It’s taken until late 2004 for some vendors to implement DOIs in our database

• Not all vendors use DOIs for linkage, preferring their proprietary systems

Other ways of linking to full-text

• 40% is good, but that still leaves a lot of unmatched references!

• User demand is for more and more full-text linking: the “good enough” generation won’t pursue items that don’t link

• Customers can tailor their own links with Link Resolvers

• CAB Direct provides a default linking solution for subscribers without a Link Resolver
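One way to picture the combined linking logic, as a sketch only: if the customer has registered a link resolver, build an OpenURL 1.0 (Z39.88-2004) query for it, carrying the DOI when one exists; otherwise fall back to the DOI itself via the doi.org proxy. The resolver address, record field names and fallback order are assumptions, not a description of how CAB Direct actually works.

```python
from typing import Dict, Optional
from urllib.parse import quote, urlencode

# Record field -> OpenURL 1.0 KEV key for a journal article
KEV_KEYS = {"issn": "rft.issn", "volume": "rft.volume", "issue": "rft.issue",
            "spage": "rft.spage", "year": "rft.date", "title": "rft.atitle"}


def doi_link(doi: str) -> str:
    """Resolve a DOI through the public doi.org proxy."""
    return "https://doi.org/" + quote(doi, safe="/")


def openurl_link(resolver_base: str, rec: Dict[str, str]) -> str:
    """Build an OpenURL 1.0 (KEV) link for the customer's own resolver."""
    params = {"url_ver": "Z39.88-2004", "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal"}
    for field, key in KEV_KEYS.items():
        if rec.get(field):
            params[key] = rec[field]
    if rec.get("doi"):
        params["rft_id"] = "info:doi/" + rec["doi"]
    return resolver_base + "?" + urlencode(params)


def full_text_link(rec: Dict[str, str], resolver_base: Optional[str] = None) -> Optional[str]:
    """Prefer the customer's resolver; otherwise use the DOI; otherwise no link."""
    if resolver_base:
        return openurl_link(resolver_base, rec)
    if rec.get("doi"):
        return doi_link(rec["doi"])
    return None


art = {"issn": "0378-5955", "volume": "12", "spage": "45", "year": "2003", "doi": "10.1000/abc"}
print(full_text_link(art))  # https://doi.org/10.1000/abc
```

Routing through the customer's resolver first lets a library send users to the copy it actually subscribes to, while the DOI fallback still gives subscribers without a resolver a working link.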

Digital Archives

• Many primary and secondary publishers now digitising their archives

• CAB Abstracts archive adds 2.2 million references, back to 1912

• Full-text linking is more difficult with incomplete references, no ISSNs (pre-database era) and a lack of digital originals

• Issue of timing again writ large!

The bigger picture

• Researchers still use secondary databases heavily in their resource discovery processes

• The amount of material to be indexed increases year by year

• Secondary databases have to keep pace with changes in scholarly communication

• We must put our content where the users are, not the other way round

Thank you

Andrea Powell
