Shortstop to First to Third: Collaborating to Access Digital Collections for Biomedical Natural Language Processing (BNLP) Research

Lynne M. Fox & Leslie Williams, Health Sciences Library

Dr. Larry Hunter & Christophe Roeder, Center for Computational Pharmacology, School of Medicine

University of Colorado Anschutz Medical Campus

“Forget goals. Value the process.” Jim Bouton (Author of Ball Four)

This case study examines three attempts to leverage the library’s journal licenses to obtain a large collection of content for research purposes.

Presentation Overview

• The Teams
• The Goal
• The Game Plans
• The Players
• The Pitch
• Instant Replay

The Teams

• University of Colorado Anschutz Medical Campus
  – Center for Computational Pharmacology
  – Health Sciences Library
• Content Producers & Holders
  – PubMed Central
  – Major STM Vendor

Center for Computational Pharmacology

Biomedical Natural Language Processing (BNLP) uses techniques from computer science, artificial intelligence (including machine learning), biology, and linguistics to extract meaningful information from natural language (in this case, biomedical language).

http://hanalyzer.sourceforge.net/

Center for Computational Pharmacology (CCP)

The Goal

To obtain a digital collection of full-text biomedical journal articles, in XML format, for the Center’s BNLP research

Four Game Plans

– Secure open access content from BMC & PLoS

– Download content from PubMed Central

– Leverage researcher networks to obtain content from major STM vendors

– License content from Major STM vendor

The Players

From University of Colorado:
• Dr. Larry Hunter, Principal Investigator
• Chris Roeder, Programmer
• Helen Johnson, Computational Linguist
• Leslie Williams, Acquisitions Librarian
• Lynne Fox, Education Librarian and Center Librarian
• Annalissa Philbin, University Counsel

From Major STEM journal vendor:
• Senior Vice-President of Academic Affairs
• Vice-President for Science & Technology Strategy
• Strategy Analyst
• Senior Account Manager
• Associate Account Manager
• Corporate Counsel

The Pitch

• Price
• License Agreement
• Dataset and Delivery

Price Structure

• New product, new model
  – Annual subscription
  – Per Article Price
  – Volume Discount
  – Subscribed Content Included
  – Library’s Continued Subscription to Vendor’s Product Required
• Grant budgets
  – Span multiple fiscal years
  – Payment process requires additional levels of approvals

Key Elements of License

– Definitions
  • Users
– Subscription
  • How Dataset Can Be Used
– Obligations
  • Agreement Contingent Upon
– Use of Names
  • How Each Party May or May Not Use the Other’s Name
– Other
  • Financial, Term, Etc.

Dataset & Delivery

• Test, Test, Test
• Refine Definitions
  – What is an article?
  – In what format will the article be delivered?
• Refine Delivery and Tracking Mechanisms
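One way the "Test, Test, Test" step could look in practice is a small check run against a sample delivery. The sketch below is only an illustration, not the procedure the team used: the function name `check_delivery`, the file names, and the sample XML are hypothetical. It tallies the root element of each delivered file (a rough proxy for "what is an article?") and lists files that are not well-formed XML.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def check_delivery(xml_documents):
    """Tally root element names and collect names of files that fail to parse.

    xml_documents maps a (hypothetical) file name to its XML text.
    """
    root_tags = Counter()
    malformed = []
    for name, text in xml_documents.items():
        try:
            root = ET.fromstring(text)
        except ET.ParseError:
            # Not well-formed XML: record it for follow-up with the vendor.
            malformed.append(name)
            continue
        root_tags[root.tag] += 1
    return root_tags, malformed


# Hypothetical sample delivery: one valid article, one broken file,
# and one item whose root element is not "article".
sample = {
    "a1.xml": "<article><title>Example</title></article>",
    "a2.xml": "<article><title>Broken</article>",
    "e1.xml": "<editorial><title>Note</title></editorial>",
}
root_tags, malformed = check_delivery(sample)
print(dict(root_tags), malformed)
```

A tally like this can feed the definitions discussion: items whose root element is not the agreed article type are candidates for exclusion from a per-article count.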

Instant Replay: What did we learn so we can improve the process next time?

• Communication is key
• Get the right people involved
• Understand the rights and limitations of the library’s existing licenses
• Research domain finance is different
• Be clear about rights to collections and discoveries
• Be clear about obligations on both sides

There’s always next year . . .

An expected grant wasn’t received, so signing the license is on hold until additional funds can be found.

If we continue to secure content, we would like to work with vendors during renewal negotiations for regular content licenses so that additional XML access rights are included in the license agreement. This would be more efficient and would expand the digital collections content available for research.

References

Hoekman, Anne. “Journal Publishing Technologies: XML.” Accessed May 14, 2012. URL: https://www.msu.edu/~hoekmana/WRA%20420/ISMTE%20article.pdf

Howard J. Technology: Major STEM journal vendor Experiments With Allowing 'Text Mining' of Its Journals - Technology - The Chronicle of Higher Education. The Chronicle of Higher Education. May 6, 2012. Accessed May 8, 2012. URL: http://chronicle.com/article/Hot-Type-Major STEM journal vendor-Experiments/131789/?sid=wc&utm_source=wc&utm_medium=en

“Natural language processing.” Wikipedia. Accessed April 27, 2012. URL: http://en.wikipedia.org/wiki/Natural_language_processing#cite_note-1

SM Leach, H Tipney, W Feng, WA Baumgartner Jr, P Kasliwal, RP Schuyler, T Williams, RA Spritz, and L Hunter. “Biomedical Discovery Acceleration, with Applications to Craniofacial Development.” PLoS Comput Biol 2009, 5(3):e1000215. doi:10.1371/journal.pcbi.1000215. PMID: 19325874

University of Colorado, School of Medicine, Center for Computational Pharmacology. “Hanalyzer: A 3R system for genome-scale discovery: Let knowledge drive your data exploration.” Accessed April 27, 2012. URL: http://hanalyzer.sourceforge.net/

Thank you!

Contact Info:

Lynne M. Fox, [email protected]

Leslie Williams, [email protected]