LINKED DATA: WHY BOTHER? JENNIFER BOWEN, UNIVERSITY OF ROCHESTER NOTSL MEETING, KENT STATE UNIVERSITY NOVEMBER 22, 2013

Linked Data: Why Bother?


DESCRIPTION

Presented at the Northern Ohio Technical Services Librarians' meeting, November 22, 2013. Describes why libraries should move toward a linked data future to enable their resources to be discoverable on the open web, and includes lessons learned from developing the eXtensible Catalog at the University of Rochester.


Page 1: Linked Data:  Why Bother?

LINKED DATA: WHY BOTHER?

JENNIFER BOWEN, UNIVERSITY OF ROCHESTER

NOTSL MEETING, KENT STATE UNIVERSITY

NOVEMBER 22, 2013

Page 2: Linked Data:  Why Bother?

2

My Topics Today

The “Vision” piece: Why should libraries care about linked data?

A few linked data use cases for libraries

Can libraries achieve their metadata-related goals WITHOUT linked data?

Lessons learned from developing the eXtensible Catalog and what that has to do with linked data

Page 3: Linked Data:  Why Bother?

eXtensibleCatalog.org

Page 4: Linked Data:  Why Bother?

XC User Research Partners:

Cornell University

Ohio State University

University of Rochester

Yale University

Studying scholars at the UR…

Page 5: Linked Data:  Why Bother?

Scholars want to read everything on the topic that they are researching

Page 6: Linked Data:  Why Bother?

They want to be in the middle of everything they need, all organized so it is findable and usable

Page 7: Linked Data:  Why Bother?

Scholars want their research to be findable and usable by others.

Page 8: Linked Data:  Why Bother?

8

“These other researchers cite MY research…”

Page 9: Linked Data:  Why Bother?

Scholars want to connect to people whose work is interesting and useful to them.

Page 10: Linked Data:  Why Bother?

Scholars don’t care what the technology is, as long as it helps them do their work

Page 11: Linked Data:  Why Bother?

11

A shift in how people seek and use information

Systems that libraries provide (websites, catalogs, databases) are bypassed

…not just in favor of Google and the Web in general

…but also in favor of tailored desktop, mobile, and web applications

Page 12: Linked Data:  Why Bother?

12

Beyond library finding tools

“Even scholars who continue to use library finding tools are turning to new applications to aggregate and analyze information in ways that extend their scholarship beyond what manual searching and analyzing allows.” -- Nancy Fried Foster

Senior Anthropologist, Ithaka S+R

Page 13: Linked Data:  Why Bother?

13

Vision for how to address this…

Make library resources discoverable on the open web, through applications that potential readers are already using:

Search engines

Mobile apps

Social media

Page 14: Linked Data:  Why Bother?

AN EXAMPLE…

Page 15: Linked Data:  Why Bother?

An example…Mt. Hope Cemetery

Photo credits: ROCHESTER’S SPEAKING STONES By Th. Emil Homerin; University of Rochester Department of Religion and Classics http://www.rochester.edu/College/REL/faculty/homerin/REL167/reports.htm

Page 16: Linked Data:  Why Bother?

16

An example…Mt. Hope Cemetery

Photo credit: www.findagrave.com/cgi-bin/fg.cgi?page=pv&GRid=31&PIpi=76016

Page 17: Linked Data:  Why Bother?

17

Photo credits: University of Rochester. River Campus Libraries. Department of Rare Books and Special Collections. http://www.lib.rochester.edu/index.cfm?PAGE=4119

Page 18: Linked Data:  Why Bother?

18

What’s the role of linked data?

Tools like this are possible today with dedicated programming.

Linked Data will enable library resources to be included in applications like this by allowing application developers access to “…a store of machine-actionable data on which improved services can be built” (Linked Open Data value statement).

Page 19: Linked Data:  Why Bother?

THREE INITIATIVES RELATED TO LINKED DATA AND LIBRARIES

Page 20: Linked Data:  Why Bother?

20

Stanford Linked Data Workshop (2011)

http://www.clir.org/pubs/reports/pub152/LinkedDataWorkshop.pdf

Linked Open Data Value Statements

Page 21: Linked Data:  Why Bother?

21

Linked Open Data Value Statements

Linked Open Data (LOD) puts information where people are looking for it: on the web

LOD can expand discoverability of our content

LOD opens opportunities for creative innovation in digital scholarship and participation

LOD allows for open continuous improvement of data

LOD creates a store of machine-actionable data on which improved services can be built

http://www.clir.org/pubs/reports/pub152/LinkedDataWorkshop.pdf

Page 22: Linked Data:  Why Bother?

22

More Linked Open Data Value Statements

Library LOD might facilitate the breakdown of the tyranny of domain silos

LOD can provide direct access to data in ways that are not currently possible, and provides unanticipated benefits that will emerge later as the stores of LOD expand

http://www.clir.org/pubs/reports/pub152/LinkedDataWorkshop.pdf

Page 23: Linked Data:  Why Bother?

23

Another library linked data initiative: BIBFRAME

www.loc.gov/bibframe/

Page 24: Linked Data:  Why Bother?

24

What is BIBFRAME?

Library of Congress-led effort to replace MARC 21 with a new bibliographic model based upon linked data

“Determine a transition path for the MARC 21 exchange format in order to reap the benefits of newer technology while preserving a robust data exchange that has supported resource sharing and cataloging cost savings in recent decades.”

Page 25: Linked Data:  Why Bother?

25

More goals of LC’s BIBFRAME

Differentiate between conceptual content and physical manifestations (works and instances)

Focus on unambiguously identifying information entities (e.g. authorities)

Leverage and expose relationships between and among entities

http://bibframe.org/

Page 26: Linked Data:  Why Bother?

26

Potential Issues with BIBFRAME

Conceptual model doesn’t fully conform to either FRBR or RDA (e.g. no “expression” level) – is this a problem?

Will organizations that have already implemented linked data use BIBFRAME once it is finished?

Do we really need a new serialization of MARC dictated by LC?

http://bibframe.org/

Page 27: Linked Data:  Why Bother?

WHAT CAN LIBRARIES ACTUALLY DO WITH LINKED DATA?

Let’s get a little more specific…

Page 28: Linked Data:  Why Bother?

28

“The mission of the Library Linked Data incubator group is to help increase global interoperability of library data on the Web, by bringing together people involved in Semantic Web activities—focusing on Linked Data—in the library community and beyond, building on existing initiatives, and identifying collaboration tracks for the future.”

62 Use Cases for Library Linked Data!

Page 29: Linked Data:  Why Bother?

29

W3C Library Linked Data (LLD) Incubator Group Use Case Areas

Bibliographic data

Authority data

Vocabulary alignment

Archives and heterogeneous data

Citations

Digital objects

Collections

Social and new uses

Source: Library Linked Data Use Cases

Page 30: Linked Data:  Why Bother?

30

W3C Library Linked Data Incubator Group: http://www.w3.org/2005/Incubator/lld/XGR-lld-usecase-20111025/

Some Sample Use Cases

Page 31: Linked Data:  Why Bother?

31

Bibliographic Data Use Case: Deduplication and Unification of Library Records

Enable matching not based upon data from a central provider

More reference data for matching/deduplication would be available openly for any library to use

Non-MARC metadata also need deduplication and unification

Using linked data would result in more trusted matches, more opportunities to automate the matching process

Source: Library Linked Data Use Cases

Page 32: Linked Data:  Why Bother?

32

Deduplication/Merging of Metadata With and Without Linked Data

Deduping Records

Record for Resource A: Match Point 1, Match Point 2

Record for Resource B: Match Point 2

If records match on a designated match point, one record overlays the other or a merge algorithm can keep data from both records

Using Linked Data

Graph for Resource A: URI, URI, URI, URI, URI

Graph for Resource B: URI, URI, URI, URI, URI

Algorithm could look at all URIs representing two resources to determine a “match” and combine all URIs into a single graph
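The URI-based matching on this slide can be sketched roughly as follows. This is an illustration only, not XC code: each graph is modeled as a plain Python set of (predicate, URI) pairs, the example.org URIs are placeholders, and the `min_shared` threshold is an assumed matching rule.

```python
# Hypothetical sketch: treat each resource description as a set of
# (predicate, URI) pairs and compare the URIs the two graphs share.

def shared_uris(graph_a, graph_b):
    """Return the URIs that appear in both graphs."""
    return {uri for _, uri in graph_a} & {uri for _, uri in graph_b}

def merge_if_match(graph_a, graph_b, min_shared=2):
    """Combine both graphs into one when enough URIs overlap (assumed rule)."""
    if len(shared_uris(graph_a, graph_b)) >= min_shared:
        return graph_a | graph_b
    return None

# Placeholder URIs, invented for illustration only.
graph_a = {
    ("creator", "http://example.org/person/1"),
    ("subject", "http://example.org/topic/7"),
    ("publisher", "http://example.org/org/3"),
}
graph_b = {
    ("creator", "http://example.org/person/1"),
    ("subject", "http://example.org/topic/7"),
    ("language", "http://example.org/lang/eng"),
}

merged = merge_if_match(graph_a, graph_b)
```

Because the merge is a set union, statements that both graphs already share are kept once, and nothing from either graph is overlaid or lost.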

Page 33: Linked Data:  Why Bother?

33

Authority Data Use Case: Authority Data Enrichment (VIAF)

Enrich already existing authority data with additional information from external data sets by linking instead of copying & merging

Enables VIAF (Virtual International Authority File) to be expanded with huge amounts of data from all over the world

Align different representations of the same real-world resource

Linked data allows the usage of remote data in applications

Source: Library Linked Data Use Cases

Page 34: Linked Data:  Why Bother?

34

Vocabulary Alignment Use Case: Vocabulary Merging

Users expect to be able to search for subjects using their own language and terms in an unambiguous, contextualized manner.

Linked Data technologies could provide the underlying infrastructure by semantic mapping or merging of concepts across vocabularies.

Allow vocabularies defined by different sources to organize (classify, index ...) legacy data to be used together

Source: Library Linked Data Use Cases

Page 35: Linked Data:  Why Bother?

35

http://aep.lib.rochester.edu/home

Page 36: Linked Data:  Why Bother?

36

Page 37: Linked Data:  Why Bother?

37

Keyword = “arrow”

Page 38: Linked Data:  Why Bother?

38

LCSH via id.loc.gov

Page 39: Linked Data:  Why Bother?

39

Vocabulary Merging: Rochester AIDS Posters vs. LCSH

Arrow [URI for UR’s vocabulary AIDS poster terms]

Same as

Arrow (Symbol) id.loc.gov/authorities/subjects/sh2013000524
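A “same as” link like this one could be used to broaden a search, roughly as sketched below. The local URI is a placeholder (the slide does not give it), the `http://` scheme on the LCSH URI is assumed, and `equivalents` is a hypothetical helper rather than part of any library.

```python
# Hypothetical sketch: a "same as" link between a local term and an LCSH URI.
LOCAL_ARROW = "http://example.org/ur-aids-posters/terms/arrow"  # placeholder URI
LCSH_ARROW = "http://id.loc.gov/authorities/subjects/sh2013000524"  # path from the slide

# Each pair states that the two URIs identify the same concept.
same_as = [(LOCAL_ARROW, LCSH_ARROW)]

def equivalents(uri, links):
    """Return uri plus every URI linked to it in either direction."""
    found = {uri}
    for left, right in links:
        if left == uri:
            found.add(right)
        elif right == uri:
            found.add(left)
    return found

# A search on the LCSH heading can now also retrieve items tagged with the local term.
search_terms = equivalents(LCSH_ARROW, same_as)
```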

Page 40: Linked Data:  Why Bother?

40

Archives and Heterogeneous Data Use Case: Semantic Connections

A group of archives would like to better share information about their holdings. They have separate catalogs and these catalogs do not necessarily use the same data formats.

Exporting and sharing their data in Linked Data format would allow them to make connections between the collections using topics, names, place names, and other information contained in their metadata.

Source: Library Linked Data Use Cases

Page 41: Linked Data:  Why Bother?
Page 42: Linked Data:  Why Bother?

LCSH: AIDS (Disease)--Prevention

Page 43: Linked Data:  Why Bother?

43

UR Local vocabulary: AIDS Prevention

Page 44: Linked Data:  Why Bother?

44

Rochester AIDS Posters vs. UCLA AIDS Posters: Semantic Connections

AIDS prevention [URI for UR’s vocabulary AIDS prevention]

Same as AIDS (Disease)--Prevention [URI for LCSH term]

Page 45: Linked Data:  Why Bother?

45

Social and New Uses Use Case: Search Engine Optimization

Make library data searchable through Web search engines by:

Adopting an architecture that is compatible with web crawling by bots, and

Optimizing the available content so that search engines can process it efficiently

Adding structured metadata (e.g. RDFa) to library online catalogs could increase the visibility and accessibility of their data.

Source: Library Linked Data Use Cases
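As a rough illustration of what adding structured metadata to a catalog page might look like, the sketch below renders one record as HTML with RDFa Lite attributes (`vocab`, `typeof`, `property`). The choice of the schema.org vocabulary and the `book_rdfa` function are assumptions made for this example; the use case itself does not prescribe a vocabulary.

```python
from html import escape

def book_rdfa(title, author):
    """Render a catalog record as HTML with RDFa Lite attributes."""
    return (
        '<div vocab="http://schema.org/" typeof="Book">'
        f'<span property="name">{escape(title)}</span> by '
        f'<span property="author">{escape(author)}</span>'
        "</div>"
    )

# Sample record values, used only for illustration.
snippet = book_rdfa("The divine comedy", "Dante Alighieri")
```

A crawler that understands RDFa can read the title and author as typed properties instead of guessing them from the page layout.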

Page 46: Linked Data:  Why Bother?

46

“…the entire publicly available version of WorldCat is now available for use by intelligent Web crawlers, like Google and Bing, that can make use of this metadata in search indexes and other applications.”

Page 47: Linked Data:  Why Bother?

LET’S TURN EVERYTHING ON ITS HEAD…

Page 48: Linked Data:  Why Bother?

48

Envisioning The Future Without Linked Data

Or, What we learned from developing eXtensible Catalog (XC) software

Page 49: Linked Data:  Why Bother?

49

What is XC software?

eXtensible Catalog (XC) is open source, user-centered, next generation software for libraries.

XC provides a discovery system and a set of tools for libraries to manage metadata and build applications.

Page 50: Linked Data:  Why Bother?

50

eXtensible Catalog Funders and Contributors

Major Funding

Andrew W. Mellon Foundation

Major Contributors

Consortium of Academic and Research Libraries in Illinois (CARLI)

Kyushu University

University of Rochester

Page 51: Linked Data:  Why Bother?

51

Why Did We Build XC?

Empower libraries to have control over their discovery environment

Put results of user research into practice

Extremely customizable user interface

Page 52: Linked Data:  Why Bother?

52

Why Did We Build XC?

Create a new metadata management platform

Implement a FRBR-based record structure

Facilitate RDA implementation

Repurpose MARC 21 records

Page 53: Linked Data:  Why Bother?

53

“FRBRized” MARC records

Parsing MARCXML records into linked FRBR-based XC Schema records

[Diagram: a MARCXML Bibliographic record is parsed into XC Work, XC Expression, and XC Manifestation records, linked by “Work Expressed” and “Expression Manifested” relationships]

“Uplink” = Record ID of the parent record created during OAI-PMH harvest.

Page 54: Linked Data:  Why Bother?

Facilitating RDA Implementation

54

XC transforms MARC data into a FRBR-informed “transitional” XML schema

The “XC Schema” uses a subset of RDA elements and roles alongside Dublin Core and some XC data elements

More RDA elements can be added to the schema in the future

Page 55: Linked Data:  Why Bother?

55

Repurposing MARC 21 records

Converts MARC codes to vocabulary values

Removes extraneous data

Normalizes inconsistencies

Maps most MARC fields/subfields and parses them to appropriate FRBR Group 1 entity records
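The first and third steps (codes to vocabulary values, normalizing inconsistencies) might look something like this for language codes. This is a simplified sketch, not the XC transformation itself: the three-entry `LANGUAGE_LABELS` table and the `normalize_language` function are made up for illustration.

```python
# Hypothetical lookup table; a real system would load a complete code list.
LANGUAGE_LABELS = {"eng": "English", "ita": "Italian", "rus": "Russian"}

def normalize_language(code):
    """Trim and lowercase a MARC language code, then map it to a label.

    Unknown codes are passed through (cleaned) rather than dropped.
    """
    cleaned = code.strip().lower()
    return LANGUAGE_LABELS.get(cleaned, cleaned)

labels = [normalize_language(c) for c in [" ENG", "ita ", "xyz"]]
```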

Page 56: Linked Data:  Why Bother?

56

How XC Software Works

(in a nutshell…)

Page 57: Linked Data:  Why Bother?

57

How XC software works

Harvests a copy of metadata records in an existing repository

Processes (cleans up, transforms) those records

Makes records available for use in other applications

Synchronizes records in XC with records in original repositories

…it’s all about metadata records!
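The harvesting step relies on OAI-PMH, where a harvester requests records with a `ListRecords` call. The sketch below only builds such a request URL; the base URL is a placeholder, the `marc21` metadata prefix is an assumed value that depends on what a given repository exposes, and `list_records_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

def list_records_url(base_url, metadata_prefix, from_date=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        # Selective harvesting: only records changed since this datestamp.
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"

# Placeholder repository address and assumed metadata prefix.
url = list_records_url("http://example.org/oai", "marc21", from_date="2013-11-01")
```

Repeating the request with a later `from` date is one way a harvester can pick up new and changed records and stay synchronized with the source repository.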

Page 58: Linked Data:  Why Bother?

58

eXtensible Catalog Architecture

OAI Toolkit: ILS Connectivity (synchronize data with XC)

NCIP Toolkit: ILS Connectivity (circ. status, account info)

MST Toolkit: Metadata Services (cleanup, format convert)

Drupal Toolkit: User Interface (search, browse)

ILS, connected through ILS “Drivers”

Digital Repository

Data flows: Metadata, Live Circ. Data, User Interface

Page 59: Linked Data:  Why Bother?

59

eXtensible Catalog Architecture

OAI Toolkit: ILS Connectivity (synchronize data with XC)

NCIP Toolkit: ILS Connectivity (circ. status, account info)

MST Toolkit: Metadata Services (cleanup, format convert)

Drupal Toolkit: User Interface (search, browse)

ILS, connected through ILS “Drivers”

Digital Repository

Data flows: Metadata, Live Circ. Data, User Interface

Insert your Application with OAI-PMH Harvester here!

Page 60: Linked Data:  Why Bother?

60

What we learned from “FRBRizing” MARC in a live production system

…three issues…

Page 61: Linked Data:  Why Bother?

61

“FRBRizing” MARC records

Parsing MARCXML records into linked FRBR-based XC Schema records

[Diagram: a MARCXML Bibliographic record is parsed into XC Work, XC Expression, and XC Manifestation records, linked by “Work Expressed” and “Expression Manifested” relationships]

“Uplink” = Record ID of the parent record created during OAI-PMH harvest.

Page 62: Linked Data:  Why Bother?

62

Linked Work, Expression and Manifestation Records in XC

Page 63: Linked Data:  Why Bother?

63

Page 64: Linked Data:  Why Bother?

64

“Uplinks” between FRBR levels

Page 65: Linked Data:  Why Bother?

65

Issue 1: Managing Relationships

Parses MARCXML records into linked FRBR-based records

How many FRBR entity relationships can we support with XC software?

[Diagram: MARCXML Bibliographic record parsed into XC Manifestation, XC Expression, and XC Work records]

“Uplink” = Record ID of the parent record created during OAI-PMH harvest.

Page 66: Linked Data:  Why Bother?

66

Issue 1: Managing Relationships

[Diagram: one MARCXML Bibliographic record parsed into one XC Manifestation record and three pairs of XC Work and XC Expression records]

MARC bibliographic records can refer to multiple FRBR entities of the same type (analytics that represent multiple works/expressions, e.g. tracks on a CD)
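To make the record count concrete, here is a small sketch of one bibliographic record fanning out into several linked records. The dictionary layout, the ID scheme, and the `frbrize` function are all hypothetical; only the idea of an “uplink” holding the parent record’s ID comes from the slides.

```python
def frbrize(bib_id, analytic_titles):
    """Create one manifestation plus a work/expression pair per analytic title."""
    manifestation = {"id": f"{bib_id}-m", "type": "manifestation", "uplinks": []}
    records = [manifestation]
    for n, title in enumerate(analytic_titles, start=1):
        work_id = f"{bib_id}-w{n}"
        expr_id = f"{bib_id}-e{n}"
        records.append({"id": work_id, "type": "work", "title": title, "uplinks": []})
        # The expression's uplink holds the ID of its parent work.
        records.append({"id": expr_id, "type": "expression", "uplinks": [work_id]})
        # The manifestation's uplinks hold the IDs of every expression it embodies.
        manifestation["uplinks"].append(expr_id)
    return records

# A CD with three tracks: one MARC record becomes seven linked records.
records = frbrize("bib1", ["Track 1", "Track 2", "Track 3"])
```

Every one of those uplinks has to be kept correct whenever the source MARC record is changed or deleted.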

Page 67: Linked Data:  Why Bother?

67

Issue 2: Beyond FRBR Group 1 Entities

MARC “Alternate Graphic Representation” (880 fields) can contain data that belong in records for Group 2 and Group 3 entities

Contributor:
700 1 ‡6 880-08 ‡a Vasil’ev, Maksim.
880 1 ‡6 700-08 ‡a Васильев, Максим.

Subject:
600 10 ‡6 880-06 ‡a Putin, Vladimir Vladimirovich, ‡d 1952-
880 10 ‡6 600-06 ‡a Путин, Владимир Владимирович, ‡d 1952-
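The ‡6 linkage in these fields is what ties each romanized heading to its alternate-script 880 field. A minimal sketch of that pairing follows. It assumes fields are already parsed into dictionaries and that ‡6 contains only “tag-occurrence” as shown on the slide (real records may carry extra script codes); `pair_alternate_scripts` is a hypothetical helper.

```python
def pair_alternate_scripts(fields):
    """Pair each field with the 880 field that its subfield 6 points to."""
    # Index 880 fields by the (tag, occurrence) they link back to, e.g. ("700", "08").
    alternates = {}
    for f in fields:
        if f["tag"] == "880":
            linked_tag, occurrence = f["6"].split("-")
            alternates[(linked_tag, occurrence)] = f["a"]
    # Look up the alternate form for every regular field that has a linkage.
    pairs = []
    for f in fields:
        if f["tag"] != "880" and "6" in f:
            _, occurrence = f["6"].split("-")
            alt = alternates.get((f["tag"], occurrence))
            if alt is not None:
                pairs.append((f["tag"], f["a"], alt))
    return pairs

# Field data copied from the slide (subfield d omitted for brevity).
fields = [
    {"tag": "700", "6": "880-08", "a": "Vasil’ev, Maksim."},
    {"tag": "880", "6": "700-08", "a": "Васильев, Максим."},
    {"tag": "600", "6": "880-06", "a": "Putin, Vladimir Vladimirovich,"},
    {"tag": "880", "6": "600-06", "a": "Путин, Владимир Владимирович,"},
]
pairs = pair_alternate_scripts(fields)
```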

Page 68: Linked Data:  Why Bother?

68

Issue 2: Beyond FRBR Group 1 Entities

[Diagram: MARCXML Bibliographic record parsed into XC Manifestation, XC Expression, and XC Work records, with additional Contributor and Subject records]

If we were to parse this 880 data correctly, we would need to create and link to two additional records for Contributor and Subject that include the alternate scripts

Contributor (alternate forms from 880)
•Contributor in Cyrillic characters
•Contributor in Roman characters

Subject (alternate forms from 880)
•Subject in Cyrillic characters
•Subject in Roman characters

Page 69: Linked Data:  Why Bother?

69

Issue 3: Related Group 1 Entities

Language attribute for a related expression

041 1 ‡a eng ‡h ita
100 0 ‡a Dante Alighieri, ‡d 1265-1321.
240 10 ‡a Divina commedia. ‡l English
245 14 ‡a The divine comedy / ‡c Dante ; a new verse translation by C.H. Sisson.
500 ‡a Translation of: Divina commedia.
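In the 041 field above, ‡a gives the language of the text in hand and ‡h gives the language of the original. A small sketch of separating the two follows; the input shape (a list of code/value pairs) and the `languages_from_041` function are assumptions for illustration.

```python
def languages_from_041(subfields):
    """Split 041 subfields into this expression's languages and the original's.

    subfields is a list of (code, value) pairs, e.g. [("a", "eng"), ("h", "ita")].
    """
    result = {"expression_languages": [], "based_on_languages": []}
    for code, value in subfields:
        if code == "a":
            result["expression_languages"].append(value)
        elif code == "h":
            # Language of the original: an attribute of a related expression.
            result["based_on_languages"].append(value)
    return result

parsed = languages_from_041([("a", "eng"), ("h", "ita")])
```

The “ita” value describes a different expression from the one this record represents, so it cannot simply be stored as another attribute of the English translation.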

Page 70: Linked Data:  Why Bother?

70

Managing Relationships

[Diagram: MARCXML Bibliographic record parsed into XC Manifestation, XC Expression, and XC Work records, with additional Contributor, Subject, and “Based on” Expression records]

If we were to parse the original language from 041 ‡h, we would need to create and link to another “based on” expression record (if we even have enough information to create it)

Contributor (alternate forms from 880)
•Contributor in Cyrillic characters
•Contributor in Roman characters

Subject (alternate forms from 880)
•Subject in Cyrillic characters
•Subject in Roman characters

Based on (Expression) – from 041 ‡h

Page 71: Linked Data:  Why Bother?

What XC has taught us about FRBR…

The GOOD news: MARC data is very rich, and contains data about MANY relationships described in FRBR and related data models

There are hundreds of RDA Relationships between FRBR entities!

Page 72: Linked Data:  Why Bother?

72

What XC has taught us about FRBR

Maintaining links between separate FRBR entity records in a production environment is likely not scalable if we continue to manipulate records.

•new records
•changed records
•deleted records
•changed relationships

[Diagram: XC Manifestation, XC Expression, and XC Work records]

Page 73: Linked Data:  Why Bother?

73

What XC has taught us about FRBR…

The GOOD news: MARC data is very rich, and contains data about MANY relationships described in FRBR and related data models

The BAD news: managing all of these relationships in a record-based system is probably not feasible

Page 74: Linked Data:  Why Bother?

74

RDA Implementation Scenario 1 (2007)

Page 75: Linked Data:  Why Bother?

XC AND LINKED DATA: OUR “AHA!” MOMENTS!

Page 76: Linked Data:  Why Bother?

76

Our first “Aha! Moment”

It would be much easier to “FRBRize” MARC data using Linked Data than by creating and maintaining links between separate metadata records that have FRBR-related relationships to each other!

Page 77: Linked Data:  Why Bother?

77

A Second “Aha” Moment!

Creating Linked Data triples that refer to FRBR entities would be more meaningful than creating triples that refer to MARC records

XC handles the interim step of converting MARC data to FRBR entities

Page 78: Linked Data:  Why Bother?

78

RDF triple

Subject: This resource

Predicate: has creator

Object: J. K. Rowling

Page 79: Linked Data:  Why Bother?

79

With and without FRBR

Without FRBR:
<MARCBibRecord-number> has_author “J K Rowling”

With FRBR:
<Work-id> has_creator “J K Rowling”
<Expression-id> has_language “English”
<Expression-id> has_parent_work <Work-id>
<Manifestation-id> has_isbn <ISBN-number>
<Manifestation-id> has_parent_expression <Expression-id>
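One way to see what the FRBR version buys is to follow the links. The sketch below stores the slide’s “With FRBR” statements as plain tuples and walks from a manifestation up to the creator of its work. The identifiers are placeholders and the helper functions are hypothetical; a real system would use an RDF library and full URIs.

```python
# The "With FRBR" statements from the slide, with placeholder identifiers.
triples = [
    ("work-1", "has_creator", "J K Rowling"),
    ("expression-1", "has_language", "English"),
    ("expression-1", "has_parent_work", "work-1"),
    ("manifestation-1", "has_isbn", "isbn-placeholder"),
    ("manifestation-1", "has_parent_expression", "expression-1"),
]

def objects(subject, predicate):
    """Return every object of the triples matching subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]

def creators_of_manifestation(manifestation_id):
    """Follow manifestation -> expression -> work, then read the creator."""
    creators = []
    for expression in objects(manifestation_id, "has_parent_expression"):
        for work in objects(expression, "has_parent_work"):
            creators.extend(objects(work, "has_creator"))
    return creators

creators = creators_of_manifestation("manifestation-1")
```

The creator is stated once, at the work level, and every expression and manifestation of that work reaches it through links.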

Page 80: Linked Data:  Why Bother?

80

Why use FRBR for Linked Data?

User research shows that users want to see the relationships between resources, etc.

With XC, we can explore when/how FRBR might be useful for linked data

Other data models may be more appropriate in some contexts and those can be explored as well.

Page 81: Linked Data:  Why Bother?

81

Another not-quite “AHA! Moment”…

XC can serve as an interim step to create Linked Data because XC’s underlying schema uses elements from registered element sets (i.e. data elements already have URIs)

Page 82: Linked Data:  Why Bother?

82

RDF Triple - Registered Data Elements

Subject: This resource
oai:mst.rochester.edu:MST/MARCToXCTransformation/10081

Predicate: has subject
http://www.extensiblecatalog.info/Elements/subject

Object: Poets, American
http://id.loc.gov/authorities/sh85103735#concept
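Written out in a standard syntax, the triple on this slide might look like the N-Triples line produced below. The identifiers are read from the slide; the predicate IRI is assumed to be the element-set address followed by “subject”, and `to_ntriple` is a hypothetical helper that handles only IRIs, not literals.

```python
def to_ntriple(subject, predicate, obj):
    """Serialize one triple of IRIs as an N-Triples line."""
    return f"<{subject}> <{predicate}> <{obj}> ."

line = to_ntriple(
    "oai:mst.rochester.edu:MST/MARCToXCTransformation/10081",  # record identifier
    "http://www.extensiblecatalog.info/Elements/subject",      # assumed predicate IRI
    "http://id.loc.gov/authorities/sh85103735#concept",        # LCSH "Poets, American"
)
```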

Page 83: Linked Data:  Why Bother?

83

XC Schema Properties

Dublin Core terms (all)

RDA – subset of elements and role designators

XC elements (newly-defined) – when necessary

All properties are from registered element sets and thus already have URIs

[Diagram labels: DC, RDA, XC]

Page 84: Linked Data:  Why Bother?

84

XC and Linked Data: What’s Next?

XC facilitates associating metadata with FRBR Group 1 entities using data elements (mostly from RDA and Dublin Core)

Implementing FRBR may help us create more meaningful Linked Data in some situations

How can we make XC actually output Linked Data?

Page 85: Linked Data:  Why Bother?

http://estc.bl.uk/

Page 86: Linked Data:  Why Bother?

http://estc21.wordpress.com/

Page 87: Linked Data:  Why Bother?

87

eXtensible Catalog Architecture

OAI Toolkit: ILS Connectivity (synchronize data with XC)

NCIP Toolkit: ILS Connectivity (circ. status, account info)

MST Toolkit: Metadata Services (cleanup, format convert)

Drupal Toolkit: User Interface (search, browse)

New ESTC Interface to be built on Collex software

ILS, connected through ILS “Drivers”

Digital Repository

Data flows: Metadata, Live Circ. Data, User Interface

Page 88: Linked Data:  Why Bother?

88

ESTC Linked Data Benefits

Make data available for computational use

Transform data back to MARC for reuse in library systems

More granularity of data (e.g. date ranges)

Collect new types of information, some not supported by MARC

Incorporate information from other projects (VIAF)

Make ESTC data more amenable to reuse by other projects, including discrete bits of data

http://estc21.wordpress.com/data/

Page 89: Linked Data:  Why Bother?

LINKED DATA CHALLENGES (WHY WE SHOULDN’T CREATE LINKED DATA?)

Page 90: Linked Data:  Why Bother?

90

“We won’t be able to control our data!”

Page 91: Linked Data:  Why Bother?

91

Linked OPEN Data?

How much data to make available?

Concerns about jeopardizing future business models

Can we predict now how much data will be needed to fulfill future use cases?

Metadata licensing issues

Rights management

Page 92: Linked Data:  Why Bother?

92

How will we assess quality?

Provenance: where did this data come from?

Should “triples” become “quadruples” so we can tell “who said this”?
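The “quadruple” idea can be sketched by adding a fourth slot to each statement that records who asserted it. The `Quad` type, the example.org source addresses, and the sample statements below are all invented for illustration.

```python
from collections import namedtuple

# A triple plus a fourth element naming the source of the statement.
Quad = namedtuple("Quad", ["subject", "predicate", "object", "source"])

# Placeholder statements from two hypothetical contributors.
quads = [
    Quad("work-1", "has_creator", "J K Rowling", "http://example.org/library-a"),
    Quad("work-1", "has_creator", "Rowling, J. K.", "http://example.org/library-b"),
]

def who_said(subject, predicate, obj):
    """Return the sources that asserted the given statement."""
    return [
        q.source
        for q in quads
        if (q.subject, q.predicate, q.object) == (subject, predicate, obj)
    ]

sources = who_said("work-1", "has_creator", "J K Rowling")
```

With the source attached, an application can decide which contributors to trust when two statements disagree.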

Is the data accurate?

Page 93: Linked Data:  Why Bother?

93

How can we maintain/improve quality?

How to manage data coming from multiple sources?

What are best practices for improving it?

Can we take advantage of information in application profiles?

How/when should we aggregate metadata?

What tools will we need?

Page 94: Linked Data:  Why Bother?

94

Next Steps: Continue the discussion!

Page 95: Linked Data:  Why Bother?

JENNIFER BOWEN

[email protected]


Additional photo credits:

University of Rochester Photographic Services

www.publicdomainpictures.net/view-image.php?image=54374&picture=running-bulls-12

www.dreamstime.com/stock-photos-group-kids-children-running-image5855523

www.publicdomainpictures.net/view-image.php?image=49200&picture=herd-of-horses

www.publicdomainpictures.net/view-image.php?image=42311&picture=ocean-through-window-frame

www.publicdomainpictures.net/view-image.php?image=10217&picture=golden-star

www.publicdomainpictures.net/view-image.php?image=27317&picture=hand-tools

www.publicdomainpictures.net/view-image.php?image=27274&picture=stair-steps

Thank you!