Linked Data for NEOSA Workshop
Ian Bigelow, Danoosh Davoodi, Sharon Farnel & Abigail Sparling
Moving Forward with Linked Data at the UAL
Linked data implementation as a strategic priority
“In order to reap the benefits of full participation in the linked open data environment,
UAL should continue to take steps towards complete conversion of existing library
data to linked open data. This would involve a full transition of workflows for resource
description/metadata creation to linked open data, transitioning all library systems for
resource discovery so they work with linked open data formats, and developing new
workflows, both internal and with associated vendors and partners, to support these
steps.”¹
What might this mean for you? … What do you think?
¹ Moving Forward with Linked Data at UAL
Linked Data Overview
The Semantic Web
“The Semantic Web will bring structure to the
meaningful content of web pages. It is not a
separate Web but an extension of the existing one,
in which information is given well-defined
meaning, better enabling computers and people to
work in cooperation”
Berners-Lee, T., Hendler, J., Lassila, O. (2001). The Semantic Web.
ScientificAmerican.com
Image: Linked Open Data cloud, lod-cloud.net (CC BY 4.0, https://creativecommons.org/licenses/by/4.0/)
What is linked data?
“The collection of interrelated datasets on
the Web can be referred to as Linked
Data”.
https://www.w3.org/standards/semanticweb/data.html
Image: Jakov M. Vežić (https://www.facebook.com/photo.php?fbid=10214499563312325&set=gm.1820667984616319&type=3&theater)
Principles of linked data
1. Use URIs (Uniform Resource Identifiers) to
name things
2. Use http URIs so that people and machines
can look up those names
3. When a person or a machine looks up a URI,
provide useful information using Web
standards such as RDF, SPARQL, JSON
4. Include links to other URIs so that a person
or machine can discover other things
5. Use an open license*
* appropriate openness; open data ≠ linked data
Berners-Lee, T. (2006). https://www.w3.org/DesignIssues/LinkedData.html
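Principles 2 and 3 describe a lookup: a machine dereferences an HTTP URI and asks for data in a Web standard format rather than a human-oriented page. A minimal Python sketch of preparing such a request with the standard library, assuming the server supports HTTP content negotiation; the helper name `build_lookup_request` and the list of requested media types are illustrative choices, not part of the principles.

```python
import urllib.request

# Illustrative Accept header: prefer Turtle, then RDF/XML, then JSON-LD.
# The specific types and weights are an assumption, not a requirement.
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9, application/ld+json;q=0.8"

def build_lookup_request(uri: str) -> urllib.request.Request:
    """Prepare (but do not send) a request that dereferences an HTTP URI,
    asking for machine-readable RDF instead of an HTML page."""
    if not uri.startswith(("http://", "https://")):
        # Principle 2: names should be HTTP URIs so they can be looked up.
        raise ValueError("not an HTTP(S) URI: " + uri)
    return urllib.request.Request(uri, headers={"Accept": RDF_ACCEPT})

req = build_lookup_request("https://orcid.org/0000-0003-2474-7929")
print(req.full_url, req.get_header("Accept"))
# Sending it would be urllib.request.urlopen(req), which needs network access.
```

Whether a given server honours the Accept header, and which RDF serializations it offers, varies from service to service.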
Examples of triple patterns
Subject: Danoosh Davoodi (https://orcid.org/0000-0003-2474-7929)
Predicate: knows (foaf:knows)
Object: Ian Bigelow (https://orcid.org/0000-0002-1961-6097)
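The same statement can be written as a machine-readable triple. A small Python sketch that expands the `foaf:` prefix and serializes the triple as one line of N-Triples, where each URI sits in angle brackets and the statement ends with a full stop; pairing each ORCID iD with a name follows the order on the slide, and the helper `to_ntriples` is hypothetical.

```python
# Full namespace URI behind the foaf: prefix.
FOAF = "http://xmlns.com/foaf/0.1/"

triple = (
    "https://orcid.org/0000-0003-2474-7929",  # subject: Danoosh Davoodi
    FOAF + "knows",                           # predicate: knows
    "https://orcid.org/0000-0002-1961-6097",  # object: Ian Bigelow
)

def to_ntriples(s: str, p: str, o: str) -> str:
    """Serialize a triple whose three parts are all URIs as one N-Triples line."""
    return f"<{s}> <{p}> <{o}> ."

print(to_ntriples(*triple))
```

Running it prints `<https://orcid.org/0000-0003-2474-7929> <http://xmlns.com/foaf/0.1/knows> <https://orcid.org/0000-0002-1961-6097> .`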
Google Knowledge Graph
An intelligent model of
entities and relationships
Designed to enhance
search and discovery in
three ways:
1. find the right thing
2. get the best summary
3. go deeper and
broader
Check-In
1. What are the basic building blocks of linked data?
2. How might this data structure benefit library resource
discovery?
Linked Data for Libraries
Linked Data and Libraries
● Many examples of linked data outside of libraries
● Uptake in libraries has been slow and uneven
○ paradigm shift
○ lack of skills and expertise
○ lack of practical starter projects
○ challenge of data conversion
○ changes in workflows
○ lack of system support
● But things are shifting
○ viable alternatives to MARC exist and are being implemented
○ standards and workflows are being rebuilt to facilitate this shift
○ moves in repository communities to linked data for interoperability
○ linked data potential for managing knowledge production lifecycle
What is MARC?
Machine-Readable Cataloguing
● A transition from the catalogue card to working in an online catalogue
● Developed in the 1960s by Henriette Avram
● MARC has been around for a very long time and it is still the primary encoding
format for bibliographic metadata for libraries worldwide … for now
Whither MARC?
● “MARC must die!” - Roy Tennant, October 2002
○ “I wanted librarianship to wake up to the fact that our foundational standard was no
longer serving us like it should”
● What is the future of MARC in an increasingly digital, interconnected
information environment?
○ functions to a point but leaves library records siloed
○ focuses on records that are independently understandable
○ data not easily parsed
“So what has happened over the last 15 years? For starters, no one seems to
think it’s controversial anymore. The Library of Congress has not only
admitted that MARC’s days are numbered, they are actively working to
develop a linked data replacement”. (Roy Tennant, “MARC Must Die” 15 Years On -
http://hangingtogether.org/?p=6221)
Why linked data? Why for libraries?
● facilitate data integration and enable
interconnection of previously disconnected
datasets
● addition of each new dataset increases value
of existing datasets (the network effect)
● browsing through data is easier with URIs
● increased use creates pressure to improve data
quality
● data as a service increases usability
● use of flexible and extensible data models
● compatibility with existing standards
● encourages openness, sharing, and reuse
● enhanced discovery experiences
● make rich library data actionable
● enhanced integration across collections
based on flexible and extensible data models
and shared principles
● opportunities for modeling different
worldviews to enable more contextually
appropriate descriptions
● enable enhanced collaborations across
libraries, archives, museums
● enhanced capabilities for researcher identity
and research output management
● streamlined workflows for metadata creation
and enhancement
Challenges
● Scope and scale of existing library metadata
○ billions of MARC records in the ecosystem; any replacement for MARC must meet numerous use
cases
○ plethora of metadata in flavours of XML and other formats that need careful and appropriate
transformation
● Infrastructure
○ most ILS and discovery systems are not currently tooled for working with linked data
○ many digital asset management and repository platforms incorporate linked data minimally
● Skills and Expertise
○ many staff have been working with existing standards and formats for a very long time
● Has linked data proven its value?
○ still seen to be experimental, lack of practical starter projects
Shifting Standards
Program for Cooperative Cataloging (PCC)
“It is time to move beyond knowledge and skills related to linked data at a theoretical
level and into implementation. Building on the PCC’s strong tradition of providing
training for metadata creators, active experimentation and piloting of linked data
practices will help inform policy decisions, training, and operationalizing such
practices. As we move to a culture of greater data sharing, it is crucial to extend our
community, both by engaging a more diverse range of members in the work of the
PCC and by collaborating with vendors, open source communities, and others.”
(Program for Cooperative Cataloging, 2018)
Program for Cooperative Cataloging (2018). PCC (Program for Cooperative Cataloging) Strategic Directions, January 2018-December 2021. Retrieved from https://www.loc.gov/aba/pcc/about/PCC-Strategic-Directions-2018-2021.pdf
Changing Standards for Resource Description
Resource Description and Access (RDA)
“RDA is a package of data elements, guidelines, and instructions for creating library
and cultural heritage resource metadata that are well-formed according to
international models for user-focussed linked data applications.” (Committee of
Principals for RDA, 2015)
RDA Toolkit, RIMMF, RDA Registry and RDA Vocabulary Server all draw on RDA
Vocabularies published in RDF.
With the implementation of the IFLA Library Reference Model (LRM) in RDA, and in preparation for linked data environments,
the 3R (RDA Toolkit Restructure and Redesign) update will place greater emphasis on relationships than on attributes.
RDA Steering Committee (2015). RDA, Committee of Principals Affirms Commitment to the Internationalisation of RDA. Retrieved from http://www.rda-rsc.org/node/235
Check-In
1. Why are libraries looking to move away from MARC?
2. What is one benefit of linked data for libraries?
3. What is one challenge faced by libraries in implementing
linked data?
BIBFRAME
At the Annual Meeting of the American
Library Association (ALA) in June 2018, LC
confirmed that BIBFRAME will be their
replacement for MARC
BIBFRAME
● Initiative launched in 2011 by the Library of Congress with
community partners and collaborators
● Provides a foundation for the future of
bibliographic description on and of the web
● Based on linked data principles and
standards
● Goes beyond “replacing” MARC
○ different model for expressing and
connecting bibliographic data
● https://www.loc.gov/bibframe
● http://bibframe.org/
● Three core levels of abstraction
○ Work
○ Instance
○ Item
● Additional key concepts
○ Agents
○ Subjects
○ Events
● Consists of RDF classes and properties
○ members of a class share certain
characteristics and may have subclasses
○ properties describe characteristics of
resources as well as relationships among
resources
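The three core levels are themselves connected by triples. A sketch in Python with the graph held as a plain set of (subject, predicate, object) tuples; `bf:hasInstance`/`bf:instanceOf` and `bf:hasItem`/`bf:itemOf` are BIBFRAME 2.0 properties, while the example.org URIs and the helper functions are hypothetical.

```python
BF = "http://id.loc.gov/ontologies/bibframe/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical URIs for one Work, one Instance of it, and one Item (a copy).
work = "http://example.org/work/leaves-of-grass"
instance = "http://example.org/instance/penguin-1976"
item = "http://example.org/item/copy-1"

graph = {
    (work, RDF_TYPE, BF + "Work"),
    (instance, RDF_TYPE, BF + "Instance"),
    (item, RDF_TYPE, BF + "Item"),
    # BIBFRAME defines inverse pairs; this toy graph states both directions.
    (work, BF + "hasInstance", instance),
    (instance, BF + "instanceOf", work),
    (instance, BF + "hasItem", item),
    (item, BF + "itemOf", instance),
}

def objects(g, subject, predicate):
    """All objects of triples matching (subject, predicate, ?)."""
    return {o for s, p, o in g if s == subject and p == predicate}

def holdings(g, w):
    """Walk Work -> Instance(s) -> Item(s), returning {instance: {items}}."""
    return {i: objects(g, i, BF + "hasItem") for i in objects(g, w, BF + "hasInstance")}

print(holdings(graph, work))
```

Because every link is a triple, the same lookup works in either direction: `objects(graph, instance, BF + "instanceOf")` returns the Work.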
BIBFRAME 2.0
bf:Work
● Highest level of abstraction
● Reflects conceptual essence of a resource
● Reflects information such as creator, language, subject
● Roughly FRBR Work and Expression
● Properties include
○ content
○ originPlace
○ musicMedium
○ geographicCoverage
○ eventContentOf
bf:Instance
● Individual, material embodiment of a work
● Reflects information such as publisher, place of publication, format
● Roughly FRBR Manifestation
● Properties include
○ carrier
○ extent
○ fontSize
○ polarity
○ issuedWith
○ provisionActivity
bf:Item
● A copy (digital or physical) of an Instance
● Reflects information such as location (virtual or physical), shelf mark, barcode
● Roughly FRBR Item
● Properties include
○ heldBy
○ shelfMark
○ electronicLocator
○ sublocation
○ enumerationAndChronology
bf:Agent
● People, organizations, jurisdictions associated with a Work or Instance
● Reflects roles such as editor, composer, holding institution
● Roughly FRBR Group 2 entities
● Subclasses include
○ family
○ organization
○ jurisdiction
○ meeting
○ person
bf:Subject
● Captures “aboutness” of a Work
● Roughly FRBR Group 3 entities
● Subjects may include
○ topics
○ places
○ events
○ agents
○ Works
bf:Event
● An occurrence that is recorded and constitutes the content of a Work
● Can include
○ concerts
○ speeches
○ athletic events
BIBFRAME Ontology
● List view
○ entire vocab on a single page
○ lists classes and properties
○ http://id.loc.gov/ontologies/bibframe.html
● Category view
○ all properties sorted into several broad categories such as identifiers, relationships, etc.
○ http://id.loc.gov/ontologies/bibframe-category.html
● RDF
○ full OWL ontology
○ http://id.loc.gov/ontologies/bibframe.rdf
Exercise: Triples in BIBFRAME
You should each have a sheet with a URI or literal. Based on the 2 MARC records
shown on screen, work with others in the room to complete triples present in the
related BIBFRAME data.
● Green yarn - Relationships between Work, Instance, Item
● Blue yarn - Relationships from Work/Instance to agents
● Yellow yarn - anything else (e.g. Instance to bf:provisionActivityStatement)
Use the BIBFRAME vocabulary, or ask your presenters, for guidance
Scissors, yarn, and tape at the front
Two Leaves of Grass

000 00757cam a2200217 i 4500
001 350362A
005 20060912145412.0
008 750829r19751856paua 000 0 eng
035 __ |9 (DLC) 75029108
906 __ |a 7 |b cbc |c orignew |d 1 |e ocip |f 19 |g y-gencatlg
010 __ |a 75029108
020 __ |a 0841494452 |b lib. bdg. : |c $75.00
040 __ |a DLC |c DLC |d DLC
050 00 |a PS3201 |b 1975
082 00 |a 811/.3
100 1_ |a Whitman, Walt, |d 1819-1892.
245 10 |a Leaves of grass / |c by Walt Whitman ; with an introd. by Gay Wilson Allen.
260 __ |a Folcroft, Pa. : |b Folcroft Library Editions, |c 1975.
300 __ |a xxi, 384 p. : |b ill. ; |c 23 cm.
500 __ |a Reprint of the 1856 ed. published by Brooklyn, N.Y.
991 __ |b c-GenColl |h PS3201 1975 |p 00020083785 |t Copy 1 |w BOOKS

000 01213cam a2200313 i 4500
001 350362B
005 20150414141922.0
008 760821t19761959nyuc 000 0 eng
906 __ |a 7 |b cbc |c orignew |d 2 |e ncip |f 19 |g y-gencatlg
010 __ |a 76371718
020 __ |a 0140421998 : |c $1.95
035 __ |9 (DLC) 76371718
040 __ |a DLC |c DLC |d DLC
050 00 |a PS3201 |b 1976b
082 00 |a 811/.3
100 1_ |a Whitman, Walt, |d 1819-1892.
240 10 |a Leaves of grass
245 10 |a Walt Whitman's Leaves of grass / |c edited with an introd. by Malcolm Cowley.
250 __ |a 1st (1855) ed.
260 __ |a New York : |b Penguin Books, |c 1976, c1959.
300 __ |a xxxvii, 145 p. : |b port. ; |c 20 cm.
490 0_ |a The Penguin poets
700 1_ |a Cowley, Malcolm, |d 1898-1989.
740 0_ |a Leaves of grass.
856 42 |3 Contributor biographical information |u http://www.loc.gov/catdir/enhancements/fy1206/76371718-b.html
856 42 |3 Publisher description |u http://www.loc.gov/catdir/enhancements/fy1206/76371718-d.html
856 41 |3 Sample text |u http://www.loc.gov/catdir/enhancements/fy1504/76371718-s.html
991 __ |b c-GenColl |h PS3201 |i 1976b |t Copy 1 |w BOOKS
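As a warm-up for the exercise, the sketch below parses lines in the display format of the records above and derives a handful of triples, tagging each with its yarn colour. It is a deliberately simplified, hypothetical mapping: the property choices are illustrative, `bf:agent` is attached directly to the Work instead of through a `bf:Contribution` node, the `ex:` identifiers are placeholders, and the LC conversion specifications handle far more fields and cases.

```python
import re

BF = "http://id.loc.gov/ontologies/bibframe/"
# One subfield: "|" plus a one-character code, then the value up to the next "|".
SUBFIELD = re.compile(r"\|(\w) ?([^|]*)")

def parse_field(line):
    """Split a display line into (tag, indicators, [(code, value), ...])."""
    tag, _, rest = line.partition(" ")
    if tag < "010":  # leader (000) and control fields 001-009: no subfields
        return tag, None, [(None, rest)]
    indicators, data = rest[:2], rest[2:]
    return tag, indicators, [(c, v.strip()) for c, v in SUBFIELD.findall(data)]

def exercise_triples(lines, work, instance, item):
    """Return (yarn colour, subject, predicate, object) tuples.
    Hypothetical simplification of a real MARC-to-BIBFRAME conversion."""
    triples = [
        ("green", work, BF + "hasInstance", instance),  # Work -> Instance
        ("green", instance, BF + "hasItem", item),      # Instance -> Item
    ]
    for line in lines:
        tag, _, subfields = parse_field(line)
        values = dict(subfields)  # a repeated code keeps its last value
        if tag in ("100", "700") and "a" in values:
            # Simplified: real data goes Work -> bf:contribution -> bf:Contribution -> bf:agent
            triples.append(("blue", work, BF + "agent", values["a"]))
        elif tag == "245" and "c" in values:
            triples.append(("yellow", instance, BF + "responsibilityStatement", values["c"]))
        elif tag == "260":
            statement = " ".join(v for _, v in subfields)
            triples.append(("yellow", instance, BF + "provisionActivityStatement", statement))
    return triples

record = [  # fields from the second record above
    "100 1_ |a Whitman, Walt, |d 1819-1892.",
    "245 10 |a Walt Whitman's Leaves of grass / |c edited with an introd. by Malcolm Cowley.",
    "260 __ |a New York : |b Penguin Books, |c 1976, c1959.",
    "700 1_ |a Cowley, Malcolm, |d 1898-1989.",
]

for colour, s, p, o in exercise_triples(record, "ex:work", "ex:instance", "ex:item"):
    print(colour, s, p.replace(BF, "bf:"), repr(o))
```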
Relating this to current practice: NEOS Standards and BIBFRAME
1. Overview of key cataloguing standards in relation to BIBFRAME
a. BIBCO to BIBFRAME
b. CONSER to BIBFRAME
c. Review of LC conversion specifications
2. Overview of how MARC is converted to BIBFRAME (10)
a. LC comparison tool
b. Example/walkthrough of UAL-LDE
<!-- bf:, bflc:, rdf: and rdfs: prefixes are declared on the (elided) document root -->
<bf:Work rdf:about="http://worldcat.org/entity/work/id/553452714">
  . . .
  <bf:contribution>
    <bf:Contribution>
      <bf:agent>
        <bf:Agent rdf:about="http://id.loc.gov/authorities/names/no2013100927">
          <rdf:type rdf:resource="http://id.loc.gov/ontologies/bibframe/Person" />
          <bflc:name00MatchKey>Maamouri, Mohamed</bflc:name00MatchKey>
          <bflc:name00MarcKey>70010$aMaamouri, Mohamed</bflc:name00MarcKey>
          <rdfs:label>Maamouri, Mohamed</rdfs:label>
          <!-- VIAF identifier for the same person -->
          <bf:identifiedBy>
            <bf:Identifier>
              <rdf:value rdf:resource="http://viaf.org/viaf/305282781" />
            </bf:Identifier>
          </bf:identifiedBy>
        </bf:Agent>
      </bf:agent>
      <!-- MARC relator code ctb = contributor -->
      <bf:role>
        <bf:Role rdf:about="http://id.loc.gov/vocabulary/relators/ctb" />
      </bf:role>
    </bf:Contribution>
  </bf:contribution>
  . . .
</bf:Work>
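Because RDF/XML is ordinary XML, the contribution above can be read with Python's standard `xml.etree.ElementTree`. This is a sketch only: the namespace declarations are assumed (they sit in the elided document root), the snippet is trimmed, and a general RDF parser would be the sturdier choice for real data, since the same triples can be serialized in several different XML shapes. The function name `contributions` is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "bf": "http://id.loc.gov/ontologies/bibframe/",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]

# Trimmed copy of the contribution, wrapped in an rdf:RDF root that
# declares the prefixes (assumed; the slide elides the document root).
SNIPPET = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:bf="http://id.loc.gov/ontologies/bibframe/">
  <bf:Work rdf:about="http://worldcat.org/entity/work/id/553452714">
    <bf:contribution>
      <bf:Contribution>
        <bf:agent>
          <bf:Agent rdf:about="http://id.loc.gov/authorities/names/no2013100927">
            <rdfs:label>Maamouri, Mohamed</rdfs:label>
          </bf:Agent>
        </bf:agent>
        <bf:role>
          <bf:Role rdf:about="http://id.loc.gov/vocabulary/relators/ctb"/>
        </bf:role>
      </bf:Contribution>
    </bf:contribution>
  </bf:Work>
</rdf:RDF>"""

def contributions(xml_text):
    """Yield (agent URI, agent label, role URI) for each bf:Contribution node."""
    root = ET.fromstring(xml_text)
    for contribution in root.iterfind(".//bf:Contribution", NS):
        agent = contribution.find("bf:agent/bf:Agent", NS)
        role = contribution.find("bf:role/bf:Role", NS)
        if agent is None:
            continue  # e.g. an agent given only by reference (rdf:resource)
        yield (
            agent.get(RDF_ABOUT),
            agent.findtext("rdfs:label", default="", namespaces=NS),
            role.get(RDF_ABOUT) if role is not None else None,
        )

for row in contributions(SNIPPET):
    print(row)
```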
Intermission
Current State
Status of BIBFRAME
● Model development and testing ongoing (LC, LD4P, SVDE, other national and
research libraries), and work progressing on implementation
● Community groups such as PCC, CONSER, BIBCO actively engaging
● Numerous organizations experimenting with and testing MARC to BIBFRAME conversions
○ UAlberta testing -
https://github.com/ualbertalib/metadata/tree/master/metadata-wrangling/BIBFRAME
○ MARC/BIBFRAME comparison tool - http://id.loc.gov/tools/bibframe/compare-id/full-ttl
● LD4P2 (Linked Data for Production: Pathway to Implementation)
○ building on LD4L and LD4P to begin implementing the shift to linked data for metadata creation and
sharing
○ https://wiki.duraspace.org/pages/viewpage.action?pageId=104568167
○ exploring what native BIBFRAME workflows, processes, and tools could look like
● Casalini SHARE-VDE
○ pilot virtual discovery environment
○ http://share-vde.org/sharevde/clusters?l=en
○ built on linked data principles and using BIBFRAME
OCLC
Work in National Libraries
Library of Congress (2018). MARC 21 to BIBFRAME 2.0 Conversion specifications. Retrieved from:
https://www.loc.gov/bibframe/mtbf/
MyNewsDesk, National Library of Sweden (2018). KB becomes the first national library to fully transition to linked data. Retrieved from:
http://www.mynewsdesk.com/se/kungliga_biblioteket/pressreleases/kb-becomes-the-first-national-library-to-fully-transition-to-linked-data-2573975
Putting this Work in Context
In her article on the development of BIBFRAME, McCallum (2017) reflects on the current state of
the transition of standards in libraries:
“In the 1960’s and 1970’s the AACR cataloguing rules and MARC format for bibliographic data were
developed. Forty years later we are in the transition to new cataloguing rules and also a new carrier
environment, with RDA and BIBFRAME.” (p. 84)
How far away is this transition for libraries? We can look to the work of the library community:
● LC and other National Libraries
● PCC
● LD4
● SHARE VDE
McCallum, S. (2017). BIBFRAME Development. JLIS.it, Italian Journal of Library,
Archives & Information Science, 8(3), 71-85.
Linked Data for Production (LD4P)
For the past two years, Linked Data for Production has been focusing on:
● developing standards, guidelines, and infrastructure to communally produce
metadata as linked open data
● developing end-to-end workflows to create linked open data in a technical services
production environment
● extending the BIBFRAME ontology to describe library resources in specialized
domains and formats
● engaging the broader library community to ensure a sustainable and extensible
environment
LD4P Phase 2 and the LD4P Cohort
A collaborative project among four institutions (Cornell, Harvard, Stanford, and the University of Iowa) and the
Program for Cooperative Cataloging (PCC), this phase of LD4P will have seven broad goals:
1. The creation of a continuously fed pool of linked data expressed in BIBFRAME-based application profiles.
2. The development of an expanded cohort of libraries (the LD4P Cohort) capable of the creation and reuse
of linked data through the creation of a cloud-based sandbox editing environment.
3. The development of policies, techniques and workflows for the automated enhancement of MARC data
with identifiers to make its conversion to linked data as clean as possible.
4. The development of policies, techniques, and workflows for the creation and reuse of linked data and its
supporting identifiers as libraries’ core metadata.
5. Better integration of library metadata and identifiers with the Web through collaboration with Wikidata.
6. The enhancement of a widely-adopted library discovery environment (Blacklight) with linked-data based
discovery techniques.
7. The orchestration of continued community collaboration through the development of an organizational
framework called LD4.
UAL LD4P Cohort Project Summary
1. Enhancement of conversion, reconciliation and enrichment processes for MARC to BIBFRAME
2. Exploration of new forms of authority control based on URIs - Utilizing MARC and BIBFRAME data
enriched with URIs
3. Conversion of Monographs Team operations - To make optimal use of current staffing and of BIBFRAME's
current level of development, we plan to create original data for monographs in the shared RDF pool.
As a starting point for fuller implementation across other teams, we aim to convert the operations
and workflows of our Monographs Team.
4. Community building:
a. To help foster a wider community of linked data experimentation and implementation in Canada,
UAL will work with other Canadian participants to liaise with the cataloguing community and
standards organizations in Canada (CFLA, CCC, CCM, CLDI)
b. As a member of the NEOS consortium, which includes a shared catalogue and services related to
cataloguing, UAL will engage NEOS members in aspects of this work to transition towards linked
data, so that we can move forward together.
SHARE-VDE is a community-driven initiative to implement linked
data. While its broader aim is to transition traditional GLAM
institution data, thus far the project has focused on moving
from MARC to BIBFRAME.
The process enriches library data with additional information and
relationships, previously unexpressed with MARC, and converts
bibliographic and authority data into linked data.
A virtual discovery platform with a four-layered adaptation of the
BIBFRAME data model was developed to provide a linked data
discovery option.
SHARE Virtual Discovery Environment (SVDE)
The main areas of the SHARE-VDE project:
● Enrichment of MARC records with URIs
● Conversion from MARC to RDF using the BIBFRAME vocabulary (and other additional ontologies as needed)
● Data publication according to the BIBFRAME data model
● Batch/automated data updating procedures
● Batch/automated data dissemination to libraries
● Progressive implementation of further use cases in the priority order defined by the community
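The first of these areas, enriching MARC records with URIs, amounts to reconciling text headings against an authority source and recording the matching URI in the record. A sketch under stated assumptions: the lookup table, its example.org URIs, and the match-key normalization are hypothetical stand-ins for a real authority service, and the URI is written to subfield $1, which MARC 21 defines for Real World Object URIs. This is not a description of the SHARE-VDE enrichment service.

```python
import re

# Subfield pattern for display lines: "|" + one-character code + value.
SUBFIELD = re.compile(r"\|(\w) ?([^|]*)")

# Hypothetical reconciliation table (normalized heading -> URI). A real
# service would query an authority source; these example.org URIs are
# placeholders, not actual authority identifiers.
AUTHORITY_URIS = {
    "whitman, walt, 1819-1892": "http://example.org/authorities/whitman-walt",
    "cowley, malcolm, 1898-1989": "http://example.org/authorities/cowley-malcolm",
}

def match_key(field):
    """Build a lower-cased key from the $a (name) and $d (dates) subfields."""
    parts = [value.strip() for code, value in SUBFIELD.findall(field) if code in ("a", "d")]
    return " ".join(parts).lower().rstrip(".")

def enrich(field):
    """Append a $1 URI when the heading matches; otherwise return the field
    unchanged so it can be reviewed by hand."""
    uri = AUTHORITY_URIS.get(match_key(field))
    return f"{field} |1 {uri}" if uri else field

for field in ["100 1_ |a Whitman, Walt, |d 1819-1892.",
              "700 1_ |a Cowley, Malcolm, |d 1898-1989.",
              "700 1_ |a Allen, Gay Wilson."]:
    print(enrich(field))
```

Unmatched headings pass through untouched, which keeps the automated step conservative and leaves ambiguous cases for cataloguers.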
SHARE-VDE is a collaborative endeavour, based on the requirements and perceptions of libraries, developed by:
- Casalini Libri, provider of bibliographic and authority data as a member of the Program for Cooperative Cataloging
- @Cult, provider of ILS, discovery tools and semantic web solutions for the cultural heritage sector
with input and active participation from an international group of 22 research libraries, and influenced by the vision of the LD4P initiative
The collaborative initiative is steered by the library community
Casalini SHARE VDE (SVDE) Project: Vendor Supported and Community Driven Development
Involvement in phase 1 and/or 2 has included:
● Stanford University
● University of California, Berkeley
● Yale University
● Library of Congress
● University of Chicago
● University of Michigan, Ann Arbor
● Harvard University
● Massachusetts Institute of Technology
● Duke University
● Cornell University
● Columbia University
● University of Pennsylvania
● Pennsylvania State University
● Texas A&M University
● University of Alberta
● University of Toronto
Work thus far has culminated in the creation of an
experimental linked data discovery environment, as
well as the return of the NEOS catalogue both as MARC
enriched with URIs and as BIBFRAME.
Phase 3a will see the implementation of the SVDE
platform with the full UAL/NEOS catalogue with
ongoing updates. This will allow us to continue
with data experimentation and analysis, provide a
training tool to familiarize ourselves with this kind
of data/work, and continue progression towards
linked data implementation.
Participating Institutions
SVDE Full Members
Duke University
New York University
Stanford University
University of Alberta – NEOS consortium
University of Chicago
University of Michigan at Ann Arbor
University of Pennsylvania
Yale University
National Libraries
Library of Congress
National Library of Medicine
National Library of Norway
LD4P Cohort
Cornell University
Frick Art Reference Library
Harry Ransom Center
Harvard University
Northwestern University
Princeton University
UC Davis
UC San Diego
University of Colorado Boulder
University of Minnesota
Texas A&M University
University of Washington
SVDE Transformation Council
«The SHARE-VDE Transformation Council's role is to provide insight and analysis of the MARC to BIBFRAME transformation to make recommendations for improvements based on member library data analysis, and project documentation. Initial recommendations are based on Phase 2 deliverables, but the work of the team will be ongoing into the foreseeable future.»
There are 4 sub-committees focusing on specific areas:
● Work Identification Working Group
● Authority/Identifier Management Services Working Group
● Cluster Knowledge Base Interaction/Editor Working Group
● User experience/User Interface Working Group
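Work identification, the focus of the first working group, is essentially a clustering problem: deciding which records describe the same Work. The sketch below uses a naive match key built from the author heading and the title, preferring the 240 uniform title when present. This is an illustrative assumption, not the SHARE-VDE algorithm; the record dictionaries are trimmed from the two Leaves of Grass records in the exercise, and the function names are hypothetical.

```python
import re
from collections import defaultdict

def normalize(text):
    """Lower-case, turn punctuation into spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())

def work_key(record):
    """Naive work key: author heading (100) plus the 240 uniform title,
    falling back to the 245 title proper. Hypothetical, for illustration."""
    title = record.get("240") or record["245"]
    return normalize(record["100"]) + " / " + normalize(title)

def cluster(records):
    """Group record control numbers (001) under a shared work key."""
    clusters = defaultdict(list)
    for record in records:
        clusters[work_key(record)].append(record["001"])
    return dict(clusters)

records = [  # trimmed from the two Leaves of Grass records in the exercise
    {"001": "350362A", "100": "Whitman, Walt, 1819-1892.", "245": "Leaves of grass /"},
    {"001": "350362B", "100": "Whitman, Walt, 1819-1892.",
     "240": "Leaves of grass", "245": "Walt Whitman's Leaves of grass /"},
]
print(cluster(records))
```

Both records land under one key. Without the 240, the second record's title proper ("Walt Whitman's Leaves of grass") yields a different key, which is one reason controlled forms and identifiers matter for clustering.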
SHARE-VDE Process Overview
Casalini bf:2.0 Data
The Super Work Entity Model
Possemato, T. (2019). “Share Virtual Discovery Environment in Linked Data (SHARE-VDE) Highlight on Data Modeling” 2019 LD4 Conference, Boston, MA
UAL Participation and Next Steps
SVDE is a vendor-supported, community-driven project. For consistency, its transformation tool and
enrichment service will be used by SVDE members and all LD4P2/LD4P Cohort
members.
UAL has been active in the development of the project through participation in
steering meetings, analysis of conversion processes, and now work on the SVDE
Transformation Council and Work Identifier Working Group.
How does NEOS fit?
Sinopia Overview
Sinopia Exercise … not yet
Lookups and profiles are in place, but Sinopia is currently in a transitional state, with a new version
and a need to set up local profiles.
Timelines for a public launch are scheduled to be discussed at ALA Annual in Washington.
LC is providing training for the LD4P Cohort; once this is complete and Sinopia has
launched, the PCC will work with Cohort members to broaden the reach of
Sinopia. Key takeaways:
1. The software is open
2. Sinopia will be made available for wider testing
3. PCC will be working on a wider training strategy
4. Learn more!
Discovery Overview
1. Blacklight, the LD4P2 Cohort, and the UAL Discovery Review
2. SHARE VDE
SHARE VDE Portal Exercise
Implications for NEOS: Discussion
1. What does this mean for the shared database?
2. What work is ahead for NEOS-Tech? How do we ensure our standards are
mapped to this environment?
3. What are the timelines?
a. When will BIBFRAME be here?
b. How long can we continue using MARC?
4. Support for discovery
5. Training
“Never let the future disturb you. You will meet it, if you
have to, with the same weapons of reason which today arm
you against the present.”
Marcus Aurelius, Emperor of Rome, 121-180. (2002). Meditations. London: The Folio Society.