The emerging infrastructure of scholarly communication Steve Hitchcock The Open Citation Project (OpCit), Southampton University These slides prepared for ALPSP International Learned Journals Seminar We can't go on like this: the future of journals, London, 12 April 2002 OpCit is a joint JISC-NSF International Digital Libraries Project 1999-2002

The emerging infrastructure of scholarly communication

Steve Hitchcock

The Open Citation Project (OpCit), Southampton University

These slides prepared for ALPSP International Learned Journals Seminar

We can't go on like this: the future of journals, London, 12 April 2002

OpCit is a joint JISC-NSF International Digital Libraries Project 1999-2002

Emerging infrastructure: the hypothesis …

Scholarly electronic information will be ‘seamless’ and ‘integrated’

Background to the talk

The brief: Seizing control: Self-archiving or Open archiving

… cover the way in which alternative means of publication, possibly bypassing publishers, can be made as valuable (i.e. interoperable) as possible through the use of the Open Archive standard … broader reflections on the role of preprint and 'postprint' archives would also be highly relevant!

Sally Morris, responding to the Budapest Open Access Initiative: "open archiving means you don't have to go to the journal and we believe it could very rapidly undermine the journals without putting anything in their place" (italics added). BBCi News, 25 March 2002 http://news.bbc.co.uk/hi/english/sci/tech/newsid_1885000/1885931.stm

What is “seamless integration”?

From any given document the user might expect to be able to retrieve any related document within one mouse click.

Typically what is related is defined, and linked, by the author or publisher or other service provider, and is constrained by the tools and information services at their disposal.

Longer term the relation may be anything the user might consider to be related.

Achieving seamless integration – Web services

Emerging Web services standards are motivated by the need to connect business processes, especially databases, across the Web. The basic platform for Web services is XML plus HTTP, maintaining the ubiquity and simplicity of the Web. Web services are based on three mechanisms:

• to register a service (e.g. Web Services Description Language, WSDL)

• to find a service (e.g. a registry such as Universal Description, Discovery, and Integration, UDDI)

• to communicate (e.g. Simple Object Access Protocol, SOAP)

http://www.w3.org/2002/ws/

Digital library architectures are evolving to include Web services-like components, and may ultimately migrate to these emerging standards
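
To make the "XML plus HTTP" platform concrete, here is a minimal sketch of the communication step: one call wrapped in a SOAP 1.1 envelope. The service namespace, the operation name `findCitations` and the `doi` parameter are hypothetical, chosen only for illustration; they do not describe any existing service.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "http://example.org/citation-service"      # hypothetical service namespace


def build_soap_request(operation, params):
    """Wrap one operation call in a SOAP envelope and return it as XML bytes."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        # Each parameter becomes a child element of the operation element.
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="utf-8")


# Hypothetical operation and parameter; the bytes would be sent as an HTTP POST body.
request = build_soap_request("findCitations", {"doi": "10.1000/example"})
```

The message is plain XML, so any HTTP client can carry it; the service description (WSDL) and the registry (UDDI) would tell a client which operations exist and where to send them.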

Is seamless integration possible for the refereed journal literature?

For scholarly research papers this prospect raises two subsidiary questions about the ‘seamlessly integrated’ literature:

• Will it be complete (from the viewpoint of every user)?

• Will it be free (or appear to be free)? A work may appear to be free to the user when it is accessed via a library, for example.

The refereed journal literature will need to be complete for everyone, everywhere, if seamless integration, even on a modest scale, is to be achieved.

Towards seamless integration: publishers and libraries

• Site licenses for electronic journals

• Alternative journals, e.g. Scholarly Publishing & Academic Resources Coalition (SPARC), increasing competition in the journal market, facilitating partnerships with publishers and other journal producers

• Linking

– CrossRef

– OpenURL, to link users to these subscription and document services, recognising that this vast new array of electronic content would need to be accessible and navigable by users within the library’s information environment

• Open Archives Initiative, interoperability standards to facilitate the efficient dissemination of content

Making appropriate connections

Library users may have authority to access a paper via one library subscription or another, directly from the publisher or via an aggregator or other agency. This has become known as the ‘appropriate copy’ problem.

OpenURL is a generalized framework for communicating and resolving links and supports software solutions to the appropriate copy problem. OpenURL is described as an ‘interoperability specification’.
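
The appropriate copy problem can be sketched as a lookup that depends on who is asking. The holdings table, institution names and URLs below are all hypothetical; a real resolver would consult the library's own subscription records.

```python
# Hypothetical holdings: institution -> {DOI: URL of the copy its users may access}
HOLDINGS = {
    "uni-a": {"10.1000/example": "https://aggregator.example.org/10.1000/example"},
    "uni-b": {"10.1000/example": "https://publisher.example.com/10.1000/example"},
}

# Hypothetical fallback when the institution holds no copy of its own.
DEFAULT_RESOLVER = "https://doi.example.org/"


def appropriate_copy(doi, institution):
    """Return the copy this institution's users are entitled to, else a default link."""
    local_holdings = HOLDINGS.get(institution, {})
    return local_holdings.get(doi, DEFAULT_RESOLVER + doi)


link_a = appropriate_copy("10.1000/example", "uni-a")
link_b = appropriate_copy("10.1000/example", "uni-b")
link_other = appropriate_copy("10.1000/example", "uni-c")
```

The same reference resolves to a different target for each user community, which is the behaviour a context-sensitive resolver is meant to provide.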

Syntax of OpenURL

http://(who you are, where you are, your institution)/(where you want to go)

where A = http://, B = (who you are, where you are, your institution), C = (where you want to go)

(A) An OpenURL is mediated by the HTTP protocol

(B) BASEURL, data about the user, typically inserted during transport between servers. One interim mechanism is to store the BASEURL as a cookie in the user’s browser. The cookie identifies the resolver that provides context-sensitive services for the user.

(C) QUERY, points to the referenced object, which might be an identifier, e.g.

– Digital Object Identifier (DOI)

– Metadata derived from an authored reference

– Partial metadata - a secondary service identifies the required document

OpenURL has been proposed as a National Information Standards Organization (NISO) standard http://library.caltech.edu/openurl/
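
The BASEURL and QUERY parts can be sketched as follows. The resolver address is hypothetical, and the query keys (`id`, `genre`, `issn`, `volume`, `spage`, `date`) are assumed to follow the draft OpenURL syntax; they should be checked against the specification before use.

```python
from urllib.parse import urlencode, urlsplit, parse_qsl

# Hypothetical BASEURL: the address of the user's institutional resolver.
BASE_URL = "http://resolver.example.edu/menu"


def make_openurl(base_url, query):
    """Join the BASEURL (B) and the encoded QUERY (C) into one HTTP link (A)."""
    return f"{base_url}?{urlencode(query)}"


def split_openurl(openurl):
    """Recover the BASEURL and the QUERY key/value pairs from an OpenURL."""
    parts = urlsplit(openurl)
    base = f"{parts.scheme}://{parts.netloc}{parts.path}"
    return base, dict(parse_qsl(parts.query))


# QUERY as an identifier (a DOI).
by_doi = make_openurl(BASE_URL, {"id": "doi:10.1000/example"})

# QUERY as metadata derived from an authored reference.
by_metadata = make_openurl(
    BASE_URL,
    {"genre": "article", "issn": "1234-5678", "volume": "12", "spage": "1", "date": "2002"},
)
```

Because the BASEURL is supplied separately from the QUERY, the same reference can be sent to a different resolver for each user without changing the description of the referenced object.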

Example OpenURL architecture

OpenURLs might be based on CrossRef–DOI services

(from Beit-Arie et al., 2001, D-Lib Magazine, September) http://www.dlib.org/dlib/september01/caplan/09caplan.html

The Open Archives Initiative (OAI)

The OAI (http://www.openarchives.org/) defines

• A Metadata Harvesting Protocol (MHP), an application-independent interoperability framework that can be used by a variety of communities engaged in publishing content on the Web

• Two classes of participants

– Data providers expose metadata about content

– Service providers issue protocol requests to data providers

cf. Web services: register, find, communicate, mediated by XML and HTTP

OAI is a very simple, low-barrier-to-entry interface, shifting implementation complexity and operational processing load away from the data repositories to the developers of federated search services, repository redistribution services, etc.
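
A rough sketch of the two roles: a service provider builds a protocol request as an HTTP query, and reads metadata out of the XML a data provider returns. The archive address is hypothetical and the sample response is deliberately simplified (it is not a complete protocol response); only the `verb` and `metadataPrefix` arguments and the Dublin Core namespace are taken from the protocol.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace


def harvest_request(base_url, verb, **arguments):
    """Service provider side: build a protocol request URL for a data provider."""
    return f"{base_url}?{urlencode({'verb': verb, **arguments})}"


def titles_from_response(xml_text):
    """Collect every Dublin Core title found in a harvested XML response."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.iter(f"{{{DC_NS}}}title")]


# Simplified stand-in for what a data provider might expose (not a full response).
SAMPLE_RESPONSE = """<response xmlns:dc="http://purl.org/dc/elements/1.1/">
  <record><metadata><dc:title>First eprint</dc:title></metadata></record>
  <record><metadata><dc:title>Second eprint</dc:title></metadata></record>
</response>"""

# Hypothetical archive address.
url = harvest_request("http://archive.example.org/oai", "ListRecords", metadataPrefix="oai_dc")
titles = titles_from_response(SAMPLE_RESPONSE)
```

The data provider only has to answer simple requests like this one; indexing, searching and linking the harvested records is left to the service provider.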

Creating information interfaces: portals

We have to manage the underlying complexity in the form of interfaces. Portals have become important interfaces in the scholarly environment.

Portal strategies

• by publishers (e.g. Elsevier’s ScienceDirect),

• by associated networked information services (e.g. Ingenta),

• by library resource discovery networks (e.g. JISC’s RDN)

have yet to establish a pre-eminent model. This is because all have concentrated on content, mostly owned content. The best next-generation portals will build services on top of content, and for researchers will become the starting point for all lines of enquiry.

Access and interfaces: implications for journals

Digital information, rich in media and resources, formal and informal, mediated by multiple services, presents the user with an array of choices that might answer his or her queries most efficiently.

Those queries might be expressed as input to a search engine, or by selecting a link. Where might these citations come from? Personal emails, discussion lists, open access services such as OAI, eprint archives, newsletters, library services, Z-gateways and academic subject portals, as well as formal research papers and commercial indexing services. There will be many more.

The journal package has traditionally been bound in issues and volumes. With the advent of multiple networked sources mediated by services such as OpenURL, the binding has been unstitched.

What are digital journals for?

Digital journals will be scaled back to the single essential function of quality control, in the form of managed peer review

Access to journal contents will be mediated by multiple interfaces (open access services, portals and other information interfaces), not just by the journal itself.

Journals cannot remain the exclusive provider of peer-reviewed papers

A post-Google information environment

Electronic journals exist in a post-Gutenberg and a post-Google information environment

By March 2001 the Internet Archive had stored 10 billion Web pages (100 terabytes of data)

The ability to locate a specified item of information precisely and instantly among the mass of information available on the Web has profound implications. In the electronic environment the search engine has become the de facto interface to information, rather than the fragmented packages that have migrated from the print world.

A maximising strategy for authors

Results from the Open Citation Project show that authors who self-archive their papers in OAI-compliant institutional or discipline-based eprint archives will:

• Maximise interfaces to their work

• Maximise access to their work

• Maximise impact of their work

Maximising access: arXiv example

Decreasing citation latencies: The latency of the citation peak has been reducing over the period of the archive, i.e. each year papers are cited sooner and more often

Mining the Social Life of an Eprint Archive http://opcit.eprints.org/tdb198/opcit/
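
One way to read "latency of the citation peak" is as the most common delay between a paper being deposited and the papers that cite it being deposited. The sketch below works under that assumption, using invented toy data; it is not the analysis code used by the project.

```python
from collections import Counter


def citation_latencies(deposit_month, citations):
    """Delay in months between a cited paper's deposit and each citing paper's deposit."""
    return [
        deposit_month[citing] - deposit_month[cited]
        for citing, cited in citations
        if citing in deposit_month and cited in deposit_month
    ]


def peak_latency(latencies):
    """Most frequent delay; ties are broken in favour of the shorter delay."""
    counts = Counter(latencies)
    return max(counts, key=lambda lag: (counts[lag], -lag))


# Toy data: month of deposit per paper, and (citing, cited) pairs.
deposit_month = {"A": 0, "B": 3, "C": 3, "D": 12}
citations = [("B", "A"), ("C", "A"), ("D", "A"), ("D", "B")]

latencies = citation_latencies(deposit_month, citations)
peak = peak_latency(latencies)
```

Computing the peak separately for papers deposited in each year would show whether the peak moves earlier over the life of the archive.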

Maximising impact: arXiv example

More highly cited papers show higher and more sustained download frequencies

Mining the Social Life of an Eprint Archive http://opcit.eprints.org/tdb198/opcit/

Maximising interfaces

Measuring arXiv access and impact data. The Open Citation project has mined:

• Usage data from selected arXiv mirror server logs

• Reference lists from 155,000+ arXiv papers to build CiteBase, an open citation database

• CiteBase, a new interface to the scholarly literature http://citebase.eprints.org
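
As an illustration of how reference lists can be turned into a citation database, the sketch below picks out explicit arXiv identifiers of the form archive/YYMMNNN from reference strings and inverts them into a cited-by index. The papers and references are invented, and this is only a toy: it does not describe how CiteBase itself parses references, and it ignores journal-style citations that carry no arXiv identifier.

```python
import re
from collections import defaultdict

# Identifiers such as hep-th/9901001 or math.AG/0101001 (archive/YYMMNNN).
ARXIV_ID = re.compile(r"\b[a-z][a-z-]*(?:\.[A-Z]{2})?/\d{7}\b")


def cited_ids(reference_list):
    """Return every arXiv identifier mentioned in a paper's reference strings."""
    found = []
    for reference in reference_list:
        found.extend(ARXIV_ID.findall(reference))
    return found


def build_citation_index(papers):
    """Invert paper -> references into cited paper -> set of citing papers."""
    cited_by = defaultdict(set)
    for paper_id, references in papers.items():
        for target in cited_ids(references):
            cited_by[target].add(paper_id)
    return cited_by


# Invented example papers and reference strings.
PAPERS = {
    "hep-th/0001001": [
        "A. Author, Nucl. Phys. B (1999), hep-th/9901001",
        "B. Writer, preprint astro-ph/9912345",
    ],
    "hep-th/0002002": ["A. Author, arXiv:hep-th/9901001"],
}

index = build_citation_index(PAPERS)
```

Counting the entries for each cited paper gives a simple citation count, which can then be set alongside download figures from the server logs.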

On the significance of new journal metrics

“I believe publishers such as Elsevier are now focusing on economic prospects that can be derived from archiving, evaluating, and intelligence gathering from real-time usage of a significant fraction of the world literature. It is not difficult to see the intelligence and prospective potential of some scientometric tools being developed. Obviously, Elsevier (and probably other publishers as well) are looking in this direction.”

Jean-Claude Guédon, September98-Forum listserver, 2 April 2002

Guédon is the author of In Oldenburg’s Long Shadow: Librarians, Research Scientists, Publishers, and the Control of Scientific Publishing (ARL Press)

http://www.arl.org/arl/proceedings/138/guedon.html

“A dynamic digital archive”

Scientists and researchers, Nobel Laureates among them, have produced the clearest declaration of their requirement for access to published research papers – a comprehensive collection that can be efficiently indexed, searched, and linked:

“Unimpeded access to these archives and open distribution of their contents will enable researchers to take on the challenge of integrating and interconnecting the fantastically rich, but extremely fragmented and chaotic, scientific literature.”

Roberts et al., Science, 23 March 2001 http://www.sciencemag.org/cgi/content/full/291/5512/2318a

Credits

The Open Citation project is a collaboration between Southampton University, Cornell University and arXiv

• The project leaders are Stevan Harnad and Carl Lagoze

• Technical development at Southampton is directed by Les Carr

• EPrints.org software is being developed by Chris Gutteridge

• CiteBase is produced and managed by Tim Brody

A copy of these slides can be found on the OpCit Web site, http://opcit.eprints.org/ (look for Papers and Presentations)

Contact Steve Hitchcock: [email protected]