branden-bond
The Cornell Veterinarian: A Metadata Perspective
The Challenge (Reprise)
Hathi Volume Interface
Hathi Data API
Hathi METS File
Hathi METS File (Continued)
Hathifile Record Elements
Hathi Volume ID: mdp.39015076694507
Access: allow [Notes on mapping for rights attributes where contextual user data would affect access]
Rights: pd [public domain]
HathiTrust record number: 000529434
Enumeration/Chronology: v.33 no.11 1900
Source: MIU
Source institution record number: 000529434
OCLC number: 1554176
Title: The Chicago medical times.
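The record above is one row of a Hathifile. Below is a minimal Python sketch of reading such rows and collecting the volume IDs for one catalog record and volume number, which is the first query in the automation path. It assumes the tab-delimited Hathifile layout whose first twelve columns are htid, access, rights, ht_bib_key, description, source, source_bib_num, oclc_num, isbn, issn, lccn and title; that column order should be checked against the current Hathifiles documentation, and the function names are hypothetical.

```python
import re
from typing import Iterable

# Assumed order of the first twelve Hathifile columns (verify against the
# current Hathifiles field list before relying on it).
HATHIFILE_COLUMNS = [
    "htid", "access", "rights", "ht_bib_key", "description", "source",
    "source_bib_num", "oclc_num", "isbn", "issn", "lccn", "title",
]

# Volume number at the start of the enumeration/chronology, e.g. "v.33 no.11 1900".
VOLUME_RE = re.compile(r"v\.\s*(\d+)")


def parse_hathifile_line(line: str) -> dict[str, str]:
    """Map one tab-delimited Hathifile row onto the assumed column names."""
    values = line.rstrip("\n").split("\t")
    return dict(zip(HATHIFILE_COLUMNS, values))


def volume_ids(lines: Iterable[str], bib_key: str, volume: int) -> list[str]:
    """Return every HathiTrust volume ID in one catalog record for one volume number."""
    ids = []
    for line in lines:
        record = parse_hathifile_line(line)
        if record.get("ht_bib_key") != bib_key:
            continue
        match = VOLUME_RE.match(record.get("description", ""))
        if match and int(match.group(1)) == volume:
            ids.append(record["htid"])
    return ids


# Row assembled from the record elements listed above (ISBN, ISSN, LCCN left empty).
SAMPLE_ROW = "\t".join([
    "mdp.39015076694507", "allow", "pd", "000529434", "v.33 no.11 1900",
    "MIU", "000529434", "1554176", "", "", "", "The Chicago medical times.",
])

print(volume_ids([SAMPLE_ROW], "000529434", 33))  # ['mdp.39015076694507']
```

Because a volume is often split across several issues with separate IDs, the function returns all matches rather than the first one.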
What I [naively] thought was the solution…
1. Use the Hathi Data API to find Table of Contents for each Volume
2. Gather the related OCR
3. Parse out the article citation values from the OCR (hopefully in a mostly automated way)
4. Use the pagination data from the TOC to build links
5. What couldn't be automated could be done manually
Goal: a citation index with HathiTrust URLs that could be used to build an interface or be given to an index like PubMed
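Step 3 depends heavily on OCR quality. As a rough sketch, assuming each table-of-contents line ends in dot leaders or spaces followed by a printed page number, title and start-page pairs can be pulled out with a regular expression. The pattern and the sample text are invented for illustration; real HathiTrust OCR will still need manual review.

```python
import re

# Hypothetical TOC pattern: title, then 2+ dots/spaces (leaders), then a page number.
TOC_LINE = re.compile(r"^(?P<title>.+?)[\s.]{2,}(?P<page>\d{1,4})$")


def parse_toc(ocr_text: str) -> list[tuple[str, int]]:
    """Return (article title, printed start page) pairs found in OCR'd TOC text."""
    entries = []
    for raw in ocr_text.splitlines():
        match = TOC_LINE.match(raw.strip())
        if match:
            title = match.group("title").strip(" .")
            entries.append((title, int(match.group("page"))))
    return entries


sample = """CONTENTS
First article title ........ 1
Second article title . . . . . 17
"""
print(parse_toc(sample))  # [('First article title', 1), ('Second article title', 17)]
```

Lines without a trailing page number (headings, running heads) simply drop out, so anything the pattern misses has to be caught in the manual pass.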
HathiTrust OCR for TOC
PubMed Indexing and API
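On the PubMed side, the citations already indexed for a volume can be listed with NCBI's E-utilities ESearch service. The sketch below only builds the request URL. The journal abbreviation "Cornell Vet" and the [ta] (journal) and [vi] (volume) field tags are assumptions to confirm in the NLM Catalog and PubMed search documentation, and the helper name is hypothetical.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_volume_query(journal_abbrev: str, volume: int, retmax: int = 500) -> str:
    """Build an ESearch URL listing PMIDs for one journal volume."""
    # [ta] = journal title abbreviation, [vi] = volume (assumed field tags).
    term = f'"{journal_abbrev}"[ta] AND {volume}[vi]'
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"


url = pubmed_volume_query("Cornell Vet", 33)
# Fetch `url` (for example with urllib.request.urlopen) and read
# result["esearchresult"]["idlist"] from the JSON response to get the PMIDs.
```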
Path for automation (for citations in PubMed for which HathiTrust has a single volume)
Query: match the PubMed volume AND the Hathi catalog ID against the Hathifile to get all corresponding object IDs for the METS files.
Query: use the METS object IDs AND the PubMed start page for each citation to find the ORDERLABEL, then take the ORDER number from the METS file.
Create each URL: the Hathi METS object ID and ORDER number are used to build the URL, e.g. http://babel.hathitrust.org/cgi/pt?id=coo.31924051143075;view=1up;seq=11
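The last two steps can be sketched with Python's standard ElementTree. This assumes the page-level METS:div elements in the structMap carry ORDER (the sequence number used in the URL) and ORDERLABEL (the printed page number) attributes. The METS fragment and its page numbers are invented for illustration and are not taken from the real volume; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

METS_DIV = "{http://www.loc.gov/METS/}div"


def orderlabel_to_seq(mets_xml: str) -> dict[str, int]:
    """Map printed page labels (ORDERLABEL) to page-turner sequence numbers (ORDER)."""
    root = ET.fromstring(mets_xml)
    pages = {}
    for div in root.iter(METS_DIV):
        label, order = div.get("ORDERLABEL"), div.get("ORDER")
        # Keep the first occurrence if a label repeats (e.g. plates, duplicates).
        if label is not None and order is not None and label not in pages:
            pages[label] = int(order)
    return pages


def citation_url(htid: str, start_page: str, pages: dict[str, int]) -> Optional[str]:
    """Build the page-turner URL for a citation's start page, or None if unmatched."""
    seq = pages.get(start_page)
    if seq is None:
        return None  # no matching ORDERLABEL: flag for manual lookup
    return f"http://babel.hathitrust.org/cgi/pt?id={htid};view=1up;seq={seq}"


SAMPLE_METS = """<METS:mets xmlns:METS="http://www.loc.gov/METS/">
  <METS:structMap TYPE="physical">
    <METS:div TYPE="volume">
      <METS:div ORDER="1" TYPE="page"/>
      <METS:div ORDER="11" ORDERLABEL="1" TYPE="page"/>
      <METS:div ORDER="12" ORDERLABEL="2" TYPE="page"/>
    </METS:div>
  </METS:structMap>
</METS:mets>"""

pages = orderlabel_to_seq(SAMPLE_METS)
print(citation_url("coo.31924051143075", "1", pages))
# http://babel.hathitrust.org/cgi/pt?id=coo.31924051143075;view=1up;seq=11
```

Returning None instead of guessing keeps citations whose start page has no matching label (unnumbered plates, misread labels) in the manual queue.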
The Metadata that Got Away…
Articles not indexed by PubMed (1991-1914)
Supplemental volumes
What we hope to do about it:
Still working to see if we can programmatically create URLs for supplemental volumes
Manually capture citation data and URLs for pre-1945 articles using OCR
PubMed Data Requirements
Linking format (when we’re only contributing URLs):
PubMed IDs and corresponding URLs
Administrative metadata, e.g. access restrictions, contributing source
Required data elements for contributing citations:
Journal ISSN
Journal ID or journal title abbreviation
Journal publisher
Copyright statement, where applicable
Volume/issue/article sequence or pagination
Issue publication date
Article electronic publication date?
AND URLs
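Before citations are sent, each record could be checked against the required elements listed above. A minimal sketch follows. The field names are hypothetical and do not correspond to PubMed's actual submission format, and the optional items (copyright statement, electronic publication date) are left out.

```python
from typing import Mapping

# Each required element maps to the hypothetical field names that can satisfy it.
REQUIREMENTS = {
    "journal ISSN": ("issn",),
    "journal ID or title abbreviation": ("nlm_journal_id", "journal_abbrev"),
    "journal publisher": ("publisher",),
    "volume": ("volume",),
    "issue": ("issue",),
    "article sequence or pagination": ("article_seq", "pages"),
    "issue publication date": ("issue_date",),
    "URL": ("url",),
}


def missing_requirements(citation: Mapping[str, str]) -> list[str]:
    """List the required elements for which the citation has no non-empty value."""
    return [
        name
        for name, fields in REQUIREMENTS.items()
        if not any(citation.get(field) for field in fields)
    ]


draft = {"journal_abbrev": "Cornell Vet", "volume": "33", "pages": "1-10"}
print(missing_requirements(draft))
# ['journal ISSN', 'journal publisher', 'issue', 'issue publication date', 'URL']
```

Grouping alternatives ("ID or abbreviation", "sequence or pagination") lets one record satisfy a requirement through either field.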
What does it all mean?
For the project: The Cornell Veterinarian should soon be available via PubMed for the years already indexed.
We’re still scoping out what it would take to capture the remaining citations manually. If funded, this will be sent to PubMed to complete the backfile.
Larger picture: potential for improved access to other titles currently lacking full-text linking in PubMed [if in HathiTrust]
Consider suggesting improvements to the Hathi workflows.