Presented at the Northern Ohio Technical Services Librarians' meeting, November 22, 2013. Describes why libraries should move toward a linked data future to enable their resources to be discoverable on the open web, and includes lessons learned from developing the eXtensible Catalog at the University of Rochester.
LINKED DATA: WHY BOTHER?
JENNIFER BOWEN, UNIVERSITY OF ROCHESTER
NOTSL MEETING, KENT STATE UNIVERSITY
NOVEMBER 22, 2013
My Topics Today
The “Vision” piece: Why should libraries care about linked data?
A few linked data use cases for libraries
Can libraries achieve their metadata-related goals WITHOUT linked data?
Lessons learned from developing the eXtensible Catalog and what that has to do with linked data
eXtensibleCatalog.org
XC User Research Partners:
Cornell University
Ohio State University
University of Rochester
Yale University
Studying scholars at the UR…
Scholars want to read everything on the topic that they are researching
They want to be in the middle of everything they need, all organized so it is findable and usable
Scholars want their research to be findable and usable by others.
“These other researchers cite MY research…”
Scholars want to connect to people whose work is interesting and useful to them.
Scholars don’t care what the technology is, as long as it helps them do their work
A shift in how people seek and use information
Systems that libraries provide (websites, catalogs, databases) are bypassed
…not just in favor of Google and the Web in general
…but also in favor of tailored desktop, mobile, and web applications
Beyond library finding tools
“Even scholars who continue to use library finding tools are turning to new applications to aggregate and analyze information in ways that extend their scholarship beyond what manual searching and analyzing allows.” -- Nancy Fried Foster
Senior Anthropologist, Ithaka S+R
Vision for how to address this…
Make library resources discoverable on the open web, through applications that potential readers are already using:
Search engines
Mobile apps
Social media
AN EXAMPLE…
An example… Mt. Hope Cemetery
Photo credits: ROCHESTER’S SPEAKING STONES By Th. Emil Homerin; University of Rochester Department of Religion and Classics http://www.rochester.edu/College/REL/faculty/homerin/REL167/reports.htm
An example… Mt. Hope Cemetery
Photo credit: www.findagrav.com/cgi-bin/fg.cgi?page=pv&GRid=31&PIpi=76016
Photo credits: University of Rochester. River Campus Libraries. Department of Rare Books and Special Collections. http://www.lib.rochester.edu/index.cfm?PAGE=4119
What’s the role of linked data?
Tools like this are possible today with dedicated programming.
Linked Data will enable library resources to be included in applications like this by giving application developers access to “…a store of machine-actionable data on which improved services can be built” (Linked Open Data value statement).
THREE INITIATIVES RELATED TO LINKED DATA AND LIBRARIES
Stanford Linked Data Workshop (2011)
http://www.clir.org/pubs/reports/pub152/LinkedDataWorkshop.pdf
Linked Open Data Value Statements
Linked Open Data Value Statements
Linked Open Data (LOD) puts information where people are looking for it: on the web
LOD can expand discoverability of our content
LOD opens opportunities for creative innovation in digital scholarship and participation
LOD allows for open continuous improvement of data
LOD creates a store of machine-actionable data on which improved services can be built
http://www.clir.org/pubs/reports/pub152/LinkedDataWorkshop.pdf
More Linked Open Data Value Statements
Library LOD might facilitate the breakdown of the tyranny of domain silos
LOD can provide direct access to data in ways that are not currently possible, and provides unanticipated benefits that will emerge later as the stores of LOD expand
http://www.clir.org/pubs/reports/pub152/LinkedDataWorkshop.pdf
Another library linked data initiative: BIBFRAME
www.loc.gov/bibframe/
What is BIBFRAME?
Library of Congress-led effort to replace MARC 21 with a new bibliographic model based upon linked data
“Determine a transition path for the MARC 21 exchange format in order to reap the benefits of newer technology while preserving a robust data exchange that has supported resource sharing and cataloging cost savings in recent decades.”
More goals of LC’s BIBFRAME
Differentiate between conceptual content and physical manifestations (works and instances)
Focus on unambiguously identifying information entities (e.g. authorities)
Leverage and expose relationships between and among entities
http://bibframe.org/
Potential Issues with BIBFRAME
Conceptual model doesn’t fully conform to either FRBR or RDA (e.g. no “expression” level) – is this a problem?
Will organizations that have already implemented linked data use BIBFRAME once it is finished?
Do we really need a new serialization of MARC dictated by LC?
http://bibframe.org/
WHAT CAN LIBRARIES ACTUALLY DO WITH LINKED DATA?
Let’s get a little more specific…
“The mission of the Library Linked Data incubator group is to help increase global interoperability of library data on the Web, by bringing together people involved in Semantic Web activities—focusing on Linked Data—in the library community and beyond, building on existing initiatives, and identifying collaboration tracks for the future.”
62 Use Cases for Library Linked Data!
W3C Library Linked Data (LLD) Incubator Group Use Case Areas
Bibliographic data
Authority data
Vocabulary alignment
Archives and heterogeneous data
Citations
Digital objects
Collections
Social and new uses
Source: Library Linked Data Use Cases
W3C Library Linked Data Incubator Group: http://www.w3.org/2005/Incubator/lld/XGR-lld-usecase-20111025/
Some Sample Use Cases
Bibliographic Data Use Case: Deduplication and Unification of Library Records
Enable matching not based upon data from a central provider
More reference data for matching/deduplication would be available openly for any library to use
Non-MARC metadata also need deduplication and unification
Using linked data would result in more trusted matches, more opportunities to automate the matching process
Source: Library Linked Data Use Cases
Deduplication/Merging of Metadata With and Without Linked Data
Deduping Records
Record for Resource A: Match Point 1, Match Point 2
Record for Resource B: Match Point 2
If records match on a designated match point, one record overlays the other or a merge algorithm can keep data from both records
Using Linked Data
Graph for Resource A: URI, URI, URI, URI, URI
Graph for Resource B: URI, URI, URI, URI, URI
Algorithm could look at all URIs representing two resources to determine a “match” and combine all URIs into a single graph
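The linked data approach to matching can be sketched in a few lines of Python. This is only an illustration, not the algorithm of XC or any library system: the `Graph` type, the example.org URIs, and the rule that a single shared URI counts as a match are all assumptions made for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """A resource description reduced to the set of URIs it mentions (hypothetical type)."""
    name: str
    uris: set[str] = field(default_factory=set)


def is_match(a: Graph, b: Graph, min_shared: int = 1) -> bool:
    """Treat two graphs as the same resource if they share enough URIs (assumed rule)."""
    return len(a.uris & b.uris) >= min_shared


def merge(a: Graph, b: Graph) -> Graph:
    """Combine all URIs from both graphs into a single graph."""
    return Graph(name=f"{a.name}+{b.name}", uris=a.uris | b.uris)


# Made-up URIs standing in for the "URI:" boxes on the slide.
graph_a = Graph("A", {"http://example.org/work/1", "http://example.org/person/7"})
graph_b = Graph("B", {"http://example.org/work/1", "http://example.org/place/3"})

merged = merge(graph_a, graph_b) if is_match(graph_a, graph_b) else None
```

Unlike the record-overlay case, nothing is discarded here: the merged graph keeps every URI from both descriptions.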
Authority Data Use Case: Authority Data Enrichment (VIAF)
Enrich already existing authority data with additional information from external data sets by linking instead of copying & merging
Enables VIAF (Virtual International Authority File) to be expanded with huge amounts of data from all over the world
Align different representations of the same real-world resource
Linked data allows the usage of remote data in applications
Source: Library Linked Data Use Cases
Vocabulary Alignment Use Case: Vocabulary Merging
Users expect to be able to search for subjects using their own language and terms in an unambiguous, contextualized manner.
Linked Data technologies could provide the underlying infrastructure by semantic mapping or merging of concepts across vocabularies.
Allow vocabularies defined by different sources to organize (classify, index ...) legacy data to be used together
Source: Library Linked Data Use Cases
http://aep.lib.rochester.edu/home
Keyword = “arrow”
LCSH via id.loc.gov
Vocabulary Merging: Rochester AIDS Posters vs. LCSH
Arrow [URI for UR’s vocabulary AIDS poster terms]
Same as
Arrow (Symbol) id.loc.gov/authorities/subjects/sh2013000524
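A “same as” link of this kind could let a search on the local term also retrieve items indexed with the LCSH heading. The Python sketch below is illustrative only: the local term URI is a placeholder (the slide does not give it), the `http://` scheme on the LCSH URI is added here, and `SAME_AS`, `expand`, and the sample index are hypothetical.

```python
LOCAL_ARROW = "http://example.org/ur-aids-posters/terms/arrow"  # placeholder, not the real URI
LCSH_ARROW = "http://id.loc.gov/authorities/subjects/sh2013000524"

# "Same as" assertions between vocabularies, treated as symmetric.
SAME_AS = {(LOCAL_ARROW, LCSH_ARROW)}


def expand(term_uri: str) -> set[str]:
    """Return the term plus every URI directly asserted to be the same concept."""
    equivalents = {term_uri}
    for left, right in SAME_AS:
        if left == term_uri:
            equivalents.add(right)
        elif right == term_uri:
            equivalents.add(left)
    return equivalents


# Items indexed under either vocabulary (made-up identifiers).
index = {
    "poster-1": LOCAL_ARROW,
    "book-9": LCSH_ARROW,
    "book-2": "http://example.org/unrelated-term",
}

hits = sorted(item for item, subject in index.items() if subject in expand(LOCAL_ARROW))
```

The local vocabulary and LCSH stay separate; only the link between the two terms is recorded.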
Archives and Heterogeneous Data Use Case: Semantic Connections
A group of archives would like to better share information about their holdings. They have separate catalogs and these catalogs do not necessarily use the same data formats.
Exporting and sharing their data in Linked Data format would allow them to make connections between the collections using topics, names, place names, and other information contained in their metadata.
Source: Library Linked Data Use Cases
LCSH: AIDS (Disease)--Prevention
UR Local vocabulary: AIDS Prevention
Rochester AIDS Posters vs. UCLA AIDS Posters: Semantic Connections
AIDS prevention [URI for UR’s vocabulary AIDS prevention]
Same as
AIDS (Disease)--Prevention [URI for LCSH term]
Social and New Uses Use Case: Search Engine Optimization
Make library data searchable through Web search engines by:
Adopting an architecture that is compatible with web crawling by bots, and
Optimizing the available content so that search engines can process it efficiently
Adding structured metadata (e.g. RDFa) to library online catalogs could increase the visibility and accessibility of their data.
Source: Library Linked Data Use Cases
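As a rough idea of what structured metadata in a catalog page might look like, the Python sketch below renders a record as an HTML fragment with RDFa Lite attributes (`vocab`, `typeof`, `property`). The choice of the schema.org vocabulary, the `book_to_rdfa` helper, and the sample values (including the placeholder ISBN) are assumptions for illustration, not something the use case prescribes.

```python
from html import escape


def book_to_rdfa(title: str, author: str, isbn: str) -> str:
    """Render a catalog record as an HTML fragment carrying RDFa Lite attributes."""
    return (
        '<div vocab="https://schema.org/" typeof="Book">\n'
        f'  <span property="name">{escape(title)}</span>\n'
        f'  <span property="author">{escape(author)}</span>\n'
        f'  <span property="isbn">{escape(isbn)}</span>\n'
        "</div>"
    )


# Placeholder ISBN: the value is made up for the example.
fragment = book_to_rdfa("The divine comedy", "Dante Alighieri", "0000000000")
```

A crawler that understands RDFa can read the title, author, and ISBN from the same page a human patron sees.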
“…the entire publicly available version of WorldCat is now available for use by intelligent Web crawlers, like Google and Bing, that can make use of this metadata in search indexes and other applications.”
LET’S TURN EVERYTHING ON ITS HEAD…
Envisioning The Future Without Linked Data
Or, What we learned from developing eXtensible Catalog (XC) software
What is XC software?
eXtensible Catalog (XC) is open source, user-centered, next generation software for libraries.
XC provides a discovery system and a set of tools for libraries to manage metadata and build applications.
eXtensible Catalog Funders and Contributors
Major Funding
Andrew W. Mellon Foundation
Major Contributors
Consortium of Academic and Research Libraries in Illinois (CARLI)
Kyushu University
University of Rochester
Why Did We Build XC?
Empower libraries to have control over their discovery environment
Put results of user research into practice
Extremely customizable user interface
Why Did We Build XC?
Create a new metadata management platform
Implement a FRBR-based record structure
Facilitate RDA implementation
Repurpose MARC 21 records
“FRBRized” MARC records
Parsing MARCXML records into linked FRBR-based XC Schema records
MARCXML Bibliographic → XC Work, XC Expression, XC Manifestation
XC Manifestation → XC Expression (uplink: Expression Manifested)
XC Expression → XC Work (uplink: Work Expressed)
“Uplink” = Record ID of the parent record created during OAI-PMH harvest.
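The linked-record structure can be pictured with a small Python sketch. It is not XC’s implementation or the XC Schema itself: the `XCRecord` class, the `frbrize` function, and the ID suffixes are hypothetical, and only the idea of each record storing the ID of its parent comes from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class XCRecord:
    record_id: str
    level: str  # "work", "expression", or "manifestation"
    uplinks: list[str] = field(default_factory=list)  # record IDs of parent records


def frbrize(bib_id: str) -> dict[str, XCRecord]:
    """Split one bibliographic record into three linked records (IDs are made up)."""
    work = XCRecord(f"{bib_id}-w", "work")
    expression = XCRecord(f"{bib_id}-e", "expression", uplinks=[work.record_id])
    manifestation = XCRecord(f"{bib_id}-m", "manifestation", uplinks=[expression.record_id])
    return {r.record_id: r for r in (work, expression, manifestation)}


records = frbrize("bib42")

# Follow uplinks from the manifestation up to the work.
path = ["bib42-m"]
while records[path[-1]].uplinks:
    path.append(records[path[-1]].uplinks[0])
```

Every uplink is a stored record ID, so any change to a parent record has to be propagated to the records that point to it.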
Facilitating RDA Implementation
XC transforms MARC data into a FRBR-informed “transitional” XML schema
The “XC Schema” uses a subset of RDA elements and roles alongside Dublin Core and some XC data elements
More RDA elements can be added to the schema in the future
Repurposing MARC 21 records
Converts MARC codes to vocabulary values
Removes extraneous data
Normalizes inconsistencies
Maps most MARC fields/subfields and parses them to appropriate FRBR Group 1 entity records
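Two of these steps, converting codes to vocabulary values and normalizing inconsistencies, might look roughly like the Python sketch below. The function names and the exact cleanup rules are assumptions for illustration; XC’s actual transformation rules are not described on the slide. The three language codes are taken from the MARC code list for languages.

```python
# Tiny illustrative subset of the MARC language code list.
MARC_LANGUAGES = {"eng": "English", "ita": "Italian", "fre": "French"}


def language_label(code: str) -> str:
    """Convert a MARC language code to a vocabulary value; unknown codes are kept as-is."""
    return MARC_LANGUAGES.get(code.strip().lower(), code)


def normalize_title(raw: str) -> str:
    """Collapse runs of whitespace and drop trailing ISBD punctuation such as ' /' or '.'."""
    return " ".join(raw.split()).rstrip(" /:;.,")


label = language_label(" ENG ")
title = normalize_title("The divine  comedy / ")
```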
How XC Software Works
(in a nutshell…)
How XC software works
Harvests a copy of metadata records in an existing repository
Processes (cleans up, transforms) those records
Makes records available for use in other applications
Synchronizes records in XC with records in original repositories
…it’s all about metadata records!
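The harvest and synchronize steps rely on OAI-PMH. As a hedged sketch (not XC’s OAI Toolkit code), the Python below builds a `ListRecords` request URL and reads record headers from a response, including the `status="deleted"` marker that lets a harvester stay in sync. The base URL, the `marc21` metadata prefix, and the sample response are made up for the example.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url: str, metadata_prefix: str,
                     resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it replaces metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_headers(response_xml: str) -> list[tuple[Optional[str], bool]]:
    """Return (identifier, is_deleted) for each record header in a response."""
    root = ET.fromstring(response_xml)
    results = []
    for header in root.iterfind(".//oai:header", OAI_NS):
        identifier = header.findtext("oai:identifier", namespaces=OAI_NS)
        results.append((identifier, header.get("status") == "deleted"))
    return results


# Hand-written sample response; a real one would also carry metadata and datestamps.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header status="deleted"><identifier>oai:example.org:2</identifier></header></record>
  </ListRecords>
</OAI-PMH>"""

url = list_records_url("http://example.org/oai", "marc21")
headers = parse_headers(SAMPLE)
```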
eXtensible Catalog Architecture
OAI Toolkit: ILS Connectivity - Synchronize data with XC
NCIP Toolkit: ILS Connectivity - Circ. status - Account info
MST Toolkit: Metadata Services - Cleanup - Format Convert
Drupal Toolkit: User Interface - Search - Browse
ILS
Metadata, Live Circ. Data, User Interface
ILS “Driver”, ILS “Driver”
Digital Repository
eXtensible Catalog Architecture
OAI Toolkit: ILS Connectivity - Synchronize data with XC
NCIP Toolkit: ILS Connectivity - Circ. status - Account info
MST Toolkit: Metadata Services - Cleanup - Format Convert
Drupal Toolkit: User Interface - Search - Browse
ILS
Metadata, Live Circ. Data, User Interface
ILS “Driver”, ILS “Driver”
Digital Repository
Insert your Application with OAI-PMH Harvester here!
What we learned from “FRBRizing” MARC in a live production system
…three issues…
“FRBRizing” MARC records
Parsing MARCXML records into linked FRBR-based XC Schema records
MARCXML Bibliographic → XC Work, XC Expression, XC Manifestation
XC Manifestation → XC Expression (uplink: Expression Manifested)
XC Expression → XC Work (uplink: Work Expressed)
“Uplink” = Record ID of the parent record created during OAI-PMH harvest.
Linked Work, Expression and Manifestation Records in XC
“Uplinks” between FRBR levels
Issue 1: Managing Relationships
Parses MARCXML records into linked FRBR-based records
How many FRBR entity relationships can we support with XC software?
MARCXML Bibliographic → XC Manifestation → XC Expression → XC Work
“Uplink” = Record ID of the parent record created during OAI-PMH harvest.
Issue 1: Managing Relationships
MARCXML Bibliographic → XC Manifestation
XC Manifestation → XC Expression → XC Work
XC Manifestation → XC Expression → XC Work
XC Manifestation → XC Expression → XC Work
MARC bibliographic records can refer to multiple FRBR entities of the same type (analytics that represent multiple works/expressions, e.g. tracks on a CD)
Issue 2: Beyond FRBR Group 1 Entities
MARC “Alternate Graphic Representation” (880 fields) can contain data that belong in records for Group 2 and Group 3 entities
Contributor:
700 1 ‡6 880-08 ‡a Vasil’ev, Maksim.
880 1 ‡6 700-08 ‡a Васильев, Максим.
Subject:
600 10 ‡6 880-06 ‡a Putin, Vladimir Vladimirovich, ‡d 1952-
880 10 ‡6 600-06 ‡a Путин, Владимир Владимирович, ‡d 1952-
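The ‡6 linkage subfield is what ties each romanized field to its 880 counterpart: the 700 field points to 880 occurrence 08, and the matching 880 points back to 700 occurrence 08. A minimal Python sketch of that pairing follows; the tuple layout and the `pair_alternate_scripts` helper are hypothetical, and a real parser would work from full MARC records rather than hand-transcribed tuples.

```python
import re

# (tag, subfield ‡6, subfield ‡a) transcribed from the example fields.
FIELDS = [
    ("700", "880-08", "Vasil’ev, Maksim."),
    ("880", "700-08", "Васильев, Максим."),
    ("600", "880-06", "Putin, Vladimir Vladimirovich,"),
    ("880", "600-06", "Путин, Владимир Владимирович,"),
]

# Linking tag and two-digit occurrence number; anything after them is ignored.
LINKAGE = re.compile(r"^(\d{3})-(\d{2})")


def pair_alternate_scripts(fields):
    """Pair each regular field with the 880 field that carries the same occurrence number."""
    regular, alternate = {}, {}
    for tag, linkage, value in fields:
        linked_tag, occurrence = LINKAGE.match(linkage).groups()
        if tag == "880":
            # Key the 880 by the tag of the field it points back to.
            alternate[(linked_tag, occurrence)] = value
        else:
            regular[(tag, occurrence)] = value
    return {key: (regular[key], alternate[key]) for key in regular if key in alternate}


pairs = pair_alternate_scripts(FIELDS)
```

Pairing the strings is the easy part; the harder question raised here is which record (Contributor or Subject) each pair belongs to.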
Issue 2: Beyond FRBR Group 1 Entities
MARCXML Bibliographic → XC Manifestation → XC Expression → XC Work
If we were to parse this 880 data correctly, we would need to create and link to two additional records for Contributor and Subject that include the alternate scripts
Contributor (alternate forms from 880)
• Contributor in Cyrillic characters
• Contributor in Roman characters
Subject (alternate forms from 880)
• Subject in Cyrillic characters
• Subject in Roman characters
Issue 3: Related Group 1 Entities
Language attribute for a related expression
041 1 ‡a eng ‡h ita
100 0 ‡a Dante Alighieri, ‡d 1265-1321.
240 10 ‡a Divina commedia. ‡l English
245 14 ‡a The divine comedy / ‡c Dante ; a new verse translation by C.H. Sisson.
500 ‡a Translation of: Divina commedia.
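In the 041 field, ‡a carries the language of the text in hand and ‡h the language of the original. A short Python sketch shows how the two could be separated; the `parse_subfields` helper and the plain-string field representation are assumptions for illustration.

```python
def parse_subfields(field: str) -> list[tuple[str, str]]:
    """Split text such as '‡a eng ‡h ita' into (subfield code, value) pairs."""
    pairs = []
    for chunk in field.split("‡")[1:]:
        code, _, value = chunk.partition(" ")
        pairs.append((code, value.strip()))
    return pairs


subfields = parse_subfields("‡a eng ‡h ita")

# ‡a: language of this expression; ‡h: language of the original it is based on.
text_languages = [value for code, value in subfields if code == "a"]
original_languages = [value for code, value in subfields if code == "h"]
```

Here the English expression would need a link to a separate Italian expression, about which the record says almost nothing else.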
Managing Relationships
MARCXML Bibliographic → XC Manifestation → XC Expression → XC Work
If we were to parse the original language from 041 ‡h, we would need to create and link to another “based on” expression record (if we even have enough information to create it)
Contributor (alternate forms from 880)
• Contributor in Cyrillic characters
• Contributor in Roman characters
Subject (alternate forms from 880)
• Subject in Cyrillic characters
• Subject in Roman characters
Based on (Expression) – from 041 ‡h
What XC has taught us about FRBR…
The GOOD news: MARC data is very rich, and contains data about MANY relationships described in FRBR and related data models
There are hundreds of RDA Relationships between FRBR entities!
What XC has taught us about FRBR
XC Manifestation, XC Work, XC Expression
• new records
• changed records
• deleted records
• changed relationships
Maintaining links between separate FRBR entity records in a production environment is likely not scalable if we continue to manipulate records.
What XC has taught us about FRBR…
The GOOD news: MARC data is very rich, and contains data about MANY relationships described in FRBR and related data models
The BAD news: managing all of these relationships in a record-based system is probably not feasible
RDA Implementation Scenario 1 (2007)
XC AND LINKED DATA: OUR “AHA!” MOMENTS!
Our first “Aha! Moment”
It would be much easier to “FRBRize” MARC data using Linked Data than by creating and maintaining links between separate metadata records that have FRBR-related relationships to each other!
A Second “Aha” Moment!
Creating Linked Data triples that refer to FRBR entities would be more meaningful than creating triples that refer to MARC records
XC handles the interim step of converting MARC data to FRBR entities
RDF triple
Subject: This resource
Predicate: has creator
Object: J. K. Rowling
With and without FRBR
Without FRBR:
<MARCBibRecord-number> has_author “J K Rowling”
With FRBR:
<Work-id> has_creator “J K Rowling”
<Expression-id> has_language “English”
<Expression-id> has_parent_work <Work-id>
<Manifestation-id> has_isbn <ISBN-number>
<Manifestation-id> has_parent_expression <Expression-id>
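To make the “with FRBR” case concrete, the Python sketch below stores the same five statements as plain tuples and walks from a manifestation up to its work to find the creator. The identifiers (`work:1`, `isbn:placeholder`, and so on) and the helper functions are made up for the example; a real system would use URIs and an RDF library.

```python
# The "With FRBR" statements as (subject, predicate, object) tuples with made-up IDs.
TRIPLES = [
    ("work:1", "has_creator", "J K Rowling"),
    ("expression:1", "has_language", "English"),
    ("expression:1", "has_parent_work", "work:1"),
    ("manifestation:1", "has_isbn", "isbn:placeholder"),
    ("manifestation:1", "has_parent_expression", "expression:1"),
]


def objects(subject: str, predicate: str) -> list[str]:
    """Return every object asserted for the given subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def creator_of_manifestation(manifestation_id: str) -> list[str]:
    """Walk manifestation -> expression -> work, then read the creator from the work."""
    creators = []
    for expression in objects(manifestation_id, "has_parent_expression"):
        for work in objects(expression, "has_parent_work"):
            creators.extend(objects(work, "has_creator"))
    return creators


creators = creator_of_manifestation("manifestation:1")
```

The creator is stated once, on the work, and every expression and manifestation of that work reaches it by following links.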
Why use FRBR for Linked Data?
User research shows that users want to see the relationships between resources, etc.
With XC, we can explore when/how FRBR might be useful for linked data
Other data models may be more appropriate in some contexts and those can be explored as well.
Another not-quite “AHA! Moment”…
XC can serve as an interim step to create Linked Data because XC’s underlying schema uses elements from registered element sets (i.e. data elements already have URIs)
RDF Triple - Registered Data Elements
Subject: This resource
oai:mst.rochester.edu:MST/MARCToXCTransformation/10081
Predicate: has subject
http://www.extensiblecatalog.info/Elements/subject
Object: Poets, American
http://id.loc.gov/authorities/sh85103735#concept
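Because all three parts of this statement are identifiers, it can be written out as a single N-Triples line. The Python sketch below does that with a small hypothetical helper; the three identifiers are the ones from the slide, with the space inside the OAI identifier removed on the assumption that it is a line-wrap artifact.

```python
SUBJECT = "oai:mst.rochester.edu:MST/MARCToXCTransformation/10081"
PREDICATE = "http://www.extensiblecatalog.info/Elements/subject"
OBJECT = "http://id.loc.gov/authorities/sh85103735#concept"


def to_ntriple(subject: str, predicate: str, obj: str) -> str:
    """Serialize one triple whose three terms are all IRIs as an N-Triples line."""
    return f"<{subject}> <{predicate}> <{obj}> ."


line = to_ntriple(SUBJECT, PREDICATE, OBJECT)
```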
XC Schema Properties
Dublin Core terms (all)
RDA – subset of elements and role designators
XC elements (newly-defined) – when necessary
All properties are from registered element sets and thus already have URIs
XC and Linked Data: What’s Next?
XC facilitates associating metadata with FRBR Group 1 entities using data elements (mostly from RDA and Dublin Core)
Implementing FRBR may help us create more meaningful Linked Data in some situations
How can we make XC actually output Linked Data?
http://estc.bl.uk/
http://estc21.wordpress.com/
eXtensible Catalog Architecture
OAI Toolkit: ILS Connectivity - Synchronize data with XC
NCIP Toolkit: ILS Connectivity - Circ. status - Account info
MST Toolkit: Metadata Services - Cleanup - Format Convert
Drupal Toolkit: User Interface - Search - Browse
ILS
Metadata, Live Circ. Data, User Interface
ILS “Driver”, ILS “Driver”
Digital Repository
New ESTC Interface to be built on Collex software
ESTC Linked Data Benefits
Make data available for computational use
Transform data back to MARC for reuse in library systems
More granularity of data (e.g. date ranges)
Collect new types of information, some not supported by MARC
Incorporate information from other projects (VIAF)
Make ESTC data more amenable to reuse by other projects, including discrete bits of data
http://estc21.wordpress.com/data/
LINKED DATA CHALLENGES (WHY WE SHOULDN’T CREATE LINKED DATA?)
“We won’t be able to control our data!”
Linked OPEN Data?
How much data to make available?
Concerns about jeopardizing future business models
Can we predict now how much data will be needed to fulfill future use cases?
Metadata licensing issues
Rights management
How will we assess quality?
Provenance: where did this data come from?
Should “triples” become “quadruples” so we can tell “who said this”?
Is the data accurate?
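The “quadruple” idea can be sketched by adding a fourth element that records who made each statement. The Python below is a hypothetical illustration: the `Quad` type, the identifiers, and the source names are made up, and RDF tools usually express the same idea with named graphs.

```python
from typing import NamedTuple


class Quad(NamedTuple):
    subject: str
    predicate: str
    obj: str
    source: str  # who asserted the statement


QUADS = [
    Quad("work:1", "has_creator", "J K Rowling", "library-a"),
    Quad("work:1", "has_creator", "Rowling, J. K.", "library-b"),
    Quad("work:1", "has_language", "English", "library-a"),
]


def who_said(subject: str, predicate: str) -> dict[str, str]:
    """Map each asserted value to the source that supplied it."""
    return {q.obj: q.source for q in QUADS if q.subject == subject and q.predicate == predicate}


creator_claims = who_said("work:1", "has_creator")
```

With provenance attached, conflicting values can be compared by source instead of silently overwriting each other.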
How can we maintain/improve quality?
How to manage data coming from multiple sources?
What are best practices for improving it?
Can we take advantage of information in application profiles?
How/when should we aggregate metadata?
What tools will we need?
Next Steps: Continue the discussion!
JENNIFER BOWEN
Additional photo credits:
University of Rochester Photographic Services
www.publicdomainpictures.net/view-image.php?image=54374&picture=running-bulls-12
www.dreamstime.com/stock-photos-group-kids-children-running-image5855523
www.publicdomainpictures.net/view-image.php?image=49200&picture=herd-of-horses
www.publicdomainpictures.net/view-image.php?image=42311&picture=ocean-through-window-frame
www.publicdomainpictures.net/view-image.php?image=10217&picture=golden-star
www.publicdomainpictures.net/view-image.php?image=27317&picture=hand-tools
www.publicdomainpictures.net/view-image.php?image=27274&picture=stair-steps
Thank you!