MARC and BIBFRAME: Linking libraries and archives

LIS 551, Dorothea Salo

We built MARC when [image: catalog cards] stood between us and the patron.

Photo: Deborah Fitchett, “Catalogue cards” http://www.flickr.com/photos/deborahfitchett/2970373235/ CC-BY

We built MARC when the world was clearly bounded.

Photo: NASA Goddard Photo and Video, “NASA Blue Marble” http://www.flickr.com/photos/gsfc/4392965590/ CC-BY

These days, [image: a computer desk] stands between us and the patron.

Photo: Declan Jewell, “My Desk” http://www.flickr.com/photos/declanjewell/2743737312 CC-BY

These days, the world’s looking a bit fractal!

Photo: NASA Goddard Photo and Video, “Still centered over the Atlantic” http://www.flickr.com/photos/gsfc/4409800816/ CC-BY

From what you read...

•What difficulties are programmers finding in the MARC/AACR2/ISBD(G) way of doing things?
  •Especially keep in mind what the programmers are trying to accomplish!

•What solutions are they recommending?
  •(yes, I know a lot of what I had you read is pure kvetching)

•How do humans differ from computers? What does that mean for cataloging?

Problems with MARC/AACR2/ISBD (if you’re a networked computer)

•Globally unique identifiers for what’s in our bibliographic universe?
  •And what IS in our bibliographic universe, anyway?

•Interoperability? Who speaks MARC outside libraries?
  •This is a problem on both ends of the pipeline, these days!

•FREE TEXT (for anything not transcribed) MUST DIE.
  •It is the LEAST consistent, internationalizable, and interoperable way to record information on a computer.
  •Put another way: we haven’t controlled all the cataloging practices we usefully could.

http://robotlibrarian.billdueber.com/isbn-parenthetical-notes-bad-marc-data-1/
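To make “atomize the free text” concrete for the ISBN case in the post linked above, here is a minimal Python sketch. It assumes an ISBN field arrives as one string with an optional parenthetical qualifier such as “(pbk.)”; the names `split_isbn` and `isbn_is_valid` and the regular expression are illustrative only, not taken from any cataloging tool, and real-world ISBN data is messier than this.

```python
import re

# Hypothetical pattern: ISBN characters (digits, hyphens, a final X), then an
# optional free-text qualifier in parentheses, e.g. "0-306-40615-2 (pbk.)".
ISBN_NOTE = re.compile(r"^\s*([0-9Xx-]+)\s*(?:\((.*)\))?\s*$")


def isbn_is_valid(isbn):
    """Check the ISBN-10 or ISBN-13 check digit."""
    digits = isbn.replace("-", "").upper()
    if len(digits) == 10 and digits[:9].isdigit() and (digits[9].isdigit() or digits[9] == "X"):
        values = [int(c) for c in digits[:9]] + [10 if digits[9] == "X" else int(digits[9])]
        # ISBN-10: weights 10, 9, ..., 1; the weighted sum must be divisible by 11.
        return sum(w * v for w, v in zip(range(10, 0, -1), values)) % 11 == 0
    if len(digits) == 13 and digits.isdigit():
        # ISBN-13: weights alternate 1, 3, 1, 3, ...; the sum must be divisible by 10.
        return sum((3 if i % 2 else 1) * int(c) for i, c in enumerate(digits)) % 10 == 0
    return False


def split_isbn(field_text):
    """Split one free-text ISBN string into separate, checkable parts."""
    match = ISBN_NOTE.match(field_text)
    if match is None:
        return {"isbn": None, "qualifier": None, "valid": False, "raw": field_text}
    isbn = match.group(1).replace("-", "").upper()
    return {
        "isbn": isbn,
        "qualifier": match.group(2),  # still free text; ideally a controlled term
        "valid": isbn_is_valid(isbn),
        "raw": field_text,
    }


for raw in ["0-306-40615-2 (pbk.)", "9780306406157", "0306406153 (alk. paper)"]:
    print(split_isbn(raw))
```

Once split, the ISBN can be validated and matched exactly, and the qualifier is exposed as the leftover free text it really is.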

Speaking of free text...
•What kinds of problems did you run into as you were working with MODS?
  •Case
  •Enumerations
  •Remembering end-tags!

•What does that suggest to you about humans, text, and accuracy?

•Seriously, people, PARSE AND VALIDATE YOUR XML. Just do it.
  •I have to, too, and I’ve been XMLing for almost a decade and a half.

•Linked-data rallying cry: “Things not strings!”
  •And now you begin to understand why.
  •(Known psychological phenomenon: humans RECOGNIZE text far more accurately than they can PRODUCE or REPRODUCE it. Wouldn’t it be nice if our cataloging and metadata tools respected that?)
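A quick way to catch the case and end-tag mistakes listed above is to run every file through an XML parser before doing anything else with it. This Python sketch uses only the standard library’s `xml.etree.ElementTree`; the sample fragments are hypothetical and omit the MODS namespace for brevity. It checks well-formedness only; validating against the MODS schema needs a schema-aware tool such as `xmllint --schema`.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical MODS-like fragments (namespace omitted for brevity).
SAMPLES = {
    "ok": "<mods><titleInfo><title>Catalogue cards</title></titleInfo></mods>",
    "wrong case": "<mods><titleInfo><title>Catalogue cards</title></titleinfo></mods>",
    "missing end-tag": "<mods><titleInfo><title>Catalogue cards</titleInfo></mods>",
}


def well_formedness_error(xml_text: str) -> Optional[str]:
    """Return None if the text parses as XML, otherwise the parser's message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)
    return None


for name, text in SAMPLES.items():
    error = well_formedness_error(text)
    print(f"{name}: {'well-formed' if error is None else error}")
```

XML tag names are case-sensitive, so `</titleinfo>` does not close `<titleInfo>`; the parser reports both bad samples as mismatched tags, which a human proofreader can easily miss.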

Problems
•The MARC format is old, obscure, and difficult to parse.
  •There are programming libraries in many popular languages that read MARC. If you ever have to work with MARC records programmatically, use them!
  •Despite those, MARC’s obscurity isolates library data from other information communities, especially on the Web.

•Free-text fields/parts in MARC are uncontrolled, and often inconsistent in practice. Fixed fields don’t contain all the info they perhaps should.

•Use of strings (text) rather than identifiers.
  •Review: What makes a good identifier for a computer?

•Errors. Errors! ERRORS!!!!!
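To see why MARC counts as difficult to parse, here is a toy Python sketch of just the ISO 2709 envelope that MARC records travel in: a 24-character leader, a directory of 12-character entries (tag, field length, starting position), then the field data, with special delimiter characters throughout. It assumes ASCII-only data, because real records count lengths in bytes, and fills most leader positions with placeholder values. It is for illustration only; in real work, use a maintained library such as pymarc (Python) or MARC4J (Java).

```python
FT = "\x1e"  # field terminator
RT = "\x1d"  # record terminator
SF = "\x1f"  # subfield delimiter


def build_record(fields):
    """Assemble a toy MARC record from (tag, field data) pairs; ASCII only."""
    directory, data = "", ""
    for tag, value in fields:
        chunk = value + FT
        # Directory entry: 3-char tag, 4-digit field length, 5-digit start offset.
        directory += f"{tag}{len(chunk):04d}{len(data):05d}"
        data += chunk
    directory += FT
    base = 24 + len(directory)    # where the field data begins
    total = base + len(data) + 1  # +1 for the record terminator
    # Leader: 0-4 = record length, 12-16 = base address of data; the rest are placeholders.
    leader = f"{total:05d}nam a22{base:05d} a 4500"
    return leader + directory + data + RT


def parse_record(record):
    """Return (tag, raw field data) pairs by walking the directory."""
    base = int(record[12:17])
    directory = record[24:base - 1]  # drop the directory's own field terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        fields.append((tag, record[base + start:base + start + length - 1]))
    return fields


def split_subfields(value):
    """Data fields: 2 indicator characters, then subfields each led by SF + code."""
    return [(s[0], s[1:]) for s in value[2:].split(SF) if s]


rec = build_record([
    ("001", "toy0001"),
    ("245", "10" + SF + "aMARC and BIBFRAME :" + SF + "bLinking libraries and archives"),
])
for tag, value in parse_record(rec):
    if tag < "010":  # control fields (00X) have no indicators or subfields
        print(tag, value)
    else:
        print(tag, value[:2], split_subfields(value))
```

Even this toy ignores character encoding, repeated fields, and the local quirks real records carry, which is exactly why the slide says to reach for an existing library.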

More problems

•Some MARC tags are ambiguous.
  •What kind of URL is in an 856?

•Sometimes the same information (from a computer’s point of view) is scattered across a bunch of MARC fields.

•Some MARC tags relate to other MARC tags, but are not explicitly correlated in any way.

•Very hard to figure out a resource’s carrier (its physical format).
  •Want to give your patrons a list of DVDs? You are OUT OF LUCK.
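Here is a minimal Python sketch of the DVD question, contrasting a guess made from free-text physical descriptions with a filter on a controlled identifier. The records, field names, and the format URIs under `http://example.org/format/` are hypothetical; the point is only that matching an identifier is exact, while matching strings depends on every cataloger phrasing things the same way.

```python
DVD = "http://example.org/format/dvd"        # hypothetical controlled-format URI
VOLUME = "http://example.org/format/volume"  # hypothetical controlled-format URI

# Free text, as in a physical description field (hypothetical records).
free_text = [
    {"title": "Film A", "extent": "1 videodisc (102 min.) : sd., col. ; 4 3/4 in."},
    {"title": "Film B", "extent": "1 DVD-video (128 min.)"},
    {"title": "Book C", "extent": "xii, 535 p. ; 24 cm."},
]

# The same items, each pointing at a format identifier instead.
controlled = [
    {"title": "Film A", "format": DVD},
    {"title": "Film B", "format": DVD},
    {"title": "Book C", "format": VOLUME},
]


def dvds_from_text(records):
    """Keyword guessing: misses every phrasing we did not anticipate."""
    return [r["title"] for r in records if "dvd" in r["extent"].lower()]


def dvds_from_ids(records):
    """Exact match on an identifier: no guessing about wording."""
    return [r["title"] for r in records if r["format"] == DVD]


print(dvds_from_text(free_text))   # Film A is missed
print(dvds_from_ids(controlled))
```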

Fundamental problems
•“Machine readable” vs. “machine actionable.”
  •This problem breaks down into, um, where data get broken down. With MARC, that happens after record creation, and isn’t reliable. With linked data, it happens up front.

•Every minute a programmer spends coping with non-computer-friendly practices is a minute NOT spent enhancing library experiences for patrons.

•Every minute a cataloger spends creating computer-unfriendly data is a minute wasted.

•We are cataloging for computers now, not just catalog-card-reading humans.
  •Put another way, computers are today’s intermediary between catalogers and patrons, as catalog cards once were. We were card-friendly. We must become computer-friendly.

Have you noticed?
•Programmer: “Trying to solve Patron Problem X. MARC causes Complication Y.”
•Cataloger: “LALALALA MARC LALALA TRADITION LALALALALA I CAN’T HEEEEEEEEEEAR YOOOOOOOOOU!”
  •Wait, where did the patron and her problem go?

•Seriously, librarianship? Seriously? We have to stop this.
  •If programmers demonstrate it’s a problem, IT IS A PROBLEM. I don’t care about the tradition or the standard that caused the problem. Solve the problem!

The “open” in LOD (linked open data)

•Who owns metadata?
•Who thinks they own it?
•If metadata are both ownable and owned, what does that mean for linked data?
•Conversely, can linked data provide a lever against owned metadata?

•How is OCLC treating this issue?
•How is DPLA treating it?

SPARQL
•With XML data, you generally just dump it on the web and let people figure out what (if anything) to do with it.
  •This means a lot of translator-writing and bandwidth cost.
  •(There’s an XML query language called XQuery, but it sees little use outside XML databases.)
  •You can do this with RDF too (and some do), but it’s not really ideal.

•SPARQL: query language for RDF.
  •Looks a LOT like SQL, intentionally so. The hardest thing to get to grips with is namespace declarations (the PREFIX lines), and that’s not really all that hard.
  •“SPARQL endpoint”: a URL for a given set of RDF data that you can send queries to and get answers from.
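To make the endpoint idea concrete, here is a Python sketch using only the standard library. It builds the URL for sending a query over HTTP GET (the SPARQL 1.1 Protocol’s `query` parameter) and flattens a response in the SPARQL 1.1 JSON results format. The endpoint URL and the sample response are hypothetical placeholders.

```python
import json
from urllib.parse import urlencode

ENDPOINT = "https://example.org/sparql"  # hypothetical SPARQL endpoint

# The PREFIX line is the namespace declaration; the rest reads much like SQL.
QUERY = """PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label
WHERE { ?concept skos:prefLabel ?label . }
LIMIT 10"""


def query_url(endpoint, query):
    """URL for sending a query via HTTP GET, per the SPARQL 1.1 Protocol."""
    return endpoint + "?" + urlencode({"query": query})


def rows(results_json):
    """Flatten SPARQL JSON results into {variable: value} dicts."""
    data = json.loads(results_json)
    names = data["head"]["vars"]
    return [{n: b[n]["value"] for n in names if n in b}
            for b in data["results"]["bindings"]]


# Hand-written response in the SPARQL JSON results format (illustrative only).
SAMPLE = json.dumps({
    "head": {"vars": ["concept", "label"]},
    "results": {"bindings": [{
        "concept": {"type": "uri", "value": "http://example.org/vocab/cats"},
        "label": {"type": "literal", "xml:lang": "en", "value": "Cats"},
    }]},
})

print(query_url(ENDPOINT, QUERY))
print(rows(SAMPLE))
```

Against a live endpoint, a client would fetch that URL with an `Accept: application/sparql-results+json` header (for example via `urllib.request.Request`) and pass the response body to `rows`.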

How?

•If linked data is where the world is moving...
•If data need to be open to be linked...
•If libraries and archives are sitting on a mass of unlinked and possibly unlinkable data...
•How do we get there from here?

•And where’s “there” anyway?

Linked Data principles (http://www.w3.org/DesignIssues/LinkedData.html)

•use URIs as names for things
•use HTTP URIs (aka URLs) so that people can look up those things
  •(this is one of Linked Data’s concessions to pragmatism, compared to the original SemWebbers)

•when someone looks up a URI, provide useful information, using the standards

•include links to other URIs so that they can discover more things
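The four principles above can be sketched in a few lines of Python that emit N-Triples. The base URI `http://example.org/id/`, the local identifiers, and the external authority URI are all hypothetical; `rdfs:label`, `dcterms:creator`, and `owl:sameAs` are real, widely reused properties. Actually serving these triples when someone looks up one of the URIs (principle 3) is left to the web server.

```python
BASE = "http://example.org/id/"  # hypothetical URI space the library controls

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
DCT_CREATOR = "http://purl.org/dc/terms/creator"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def uri(local_id):
    """Principles 1 and 2: every thing gets an HTTP URI as its name."""
    return BASE + local_id


def ref(u):
    return f"<{u}>"


def literal(text, lang="en"):
    """Language-tagged literal; escapes only backslashes and quotes (enough here)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def triple(s, p, o):
    """One N-Triples statement; o is already formatted as a URI reference or literal."""
    return f"{ref(s)} {ref(p)} {o} ."


work, person = uri("work/marc-and-bibframe"), uri("person/salo-dorothea")
TRIPLES = [
    # Principle 3: something useful to return when the URI is looked up.
    triple(work, RDFS_LABEL, literal("MARC and BIBFRAME")),
    triple(work, DCT_CREATOR, ref(person)),
    triple(person, RDFS_LABEL, literal("Dorothea Salo")),
    # Principle 4: link out (hypothetical external authority URI).
    triple(person, OWL_SAME_AS, ref("http://authority.example.net/person/12345")),
]
print("\n".join(TRIPLES))
```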

Review: the five stars of linked data (Tim Berners-Lee)

The road ahead
1. Model our universe of things in a linked-data-friendly fashion.
  •This is what Coyle and Hillmann, BIBFRAME, SKOS, various national-library efforts, VIAF, Dublin Core, EAC-CPF, and to some extent RDA are working on.
  •(“Things” doesn’t mean only physical objects. For linked data, people, places, and subjects are also things.)

2. Atomize our existing data as best we can.
3. Assign URL identifiers to everything in sight.
4. Publish, link out, and link up!
5. (Squelch OCLC’s ownership claims. We can’t have that if we want LOD or even just LD.)
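Steps 2 and 3 can be sketched in Python: pull the strings out of a flat record, then mint a stable HTTP URI for each thing they name. The base URI, the normalization rules, the sample record, and the hash-based minting scheme are all assumptions chosen for illustration. Hashing a name string cannot tell apart two different people who share that string, so real projects reconcile against authorities such as VIAF before minting.

```python
import hashlib
import unicodedata

BASE = "http://example.org/id/"  # hypothetical URI space the library controls


def normalize(label):
    """Crude normalization: NFC, case-fold, collapse spaces, trim trailing punctuation."""
    text = unicodedata.normalize("NFC", label).casefold()
    return " ".join(text.split()).rstrip(" .,;:")


def mint(kind, label):
    """Same kind + same normalized label -> same URI, every time it is run."""
    key = f"{kind}|{normalize(label)}".encode("utf-8")
    return f"{BASE}{kind}/{hashlib.sha1(key).hexdigest()[:12]}"


# Step 2, atomize: a flat record whose values are strings (hypothetical sample).
record = {"author": "Salo, Dorothea.", "subject": "Linked data"}

# Step 3, identify: one URI per thing named, keeping the string as its label.
things = {field: {"uri": mint(kind, record[field]), "label": record[field]}
          for field, kind in [("author", "person"), ("subject", "concept")]}
print(things)
```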

Modeling: “What are the thingies in my neighborhood?”

BIBFRAME
•LoC got tired of all the waffling about how to replace MARC.
  •10/31/2011: “We’re going to just DO THIS. Join in or don’t.”
  •NISO (keeper of Z39.2, the standard behind MARC’s record structure), ILS vendors, catalogers: *have kittens*
  •LoC: “Cope.” (You can tell where my sympathies lie, yes?)

•Not the first or the only; the British Library has been working on its own LD infrastructure for a couple of years now.
  •Spain’s national library has an interesting RDFized FRBRish implementation.

First-cut data model

•Look familiar? What’s changed?

Eric Miller, “BIBFRAME Transition Update,” http://www.slideshare.net/zepheiraorg/bibliographic-14207718

First-cut data model

Eric Miller, “BIBFRAME Transition Update,” http://www.slideshare.net/zepheiraorg/bibliographic-14207718

SKOS

•Simple Knowledge Organization System
  •from our friends at the W3C; builds on prior work
  •http://www.w3.org/2004/02/skos/

•Representation of common controlled-vocabulary structures in RDF syntax

•Review: What does a thesaurus entry look like?

Things in SKOS
•Concepts (we know them as “terms”)

•And “Concept Schemes,” which represent controlled vocabularies (CVs) as we’re used to thinking of them

•Labels
  •Review: why are these distinct from Concepts?

•Relationships among concepts
  •Broader/narrower
  •Associative (“see also”)
  •Equivalencies and near-equivalencies (here there be dragons)

•Notes (of various kinds)
  •Really pretty straightforward, for RDF!

•And has made inroads outside libraries for that very reason
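As a review of what a thesaurus entry looks like once it is in SKOS, here is a Python sketch that turns a hypothetical entry (preferred term, used-for terms, broader and related terms, scope note) into Turtle. The concept-scheme URI and the sample terms are assumptions; the `skos:` properties are the real ones from the W3C vocabulary. Labels are plain strings attached to a concept, which is why one concept can carry several of them.

```python
SCHEME = "http://example.org/vocab/"  # hypothetical concept scheme URI

# A thesaurus entry as catalogers know it (hypothetical sample terms).
ENTRY = {
    "id": "cats",
    "pref": "Cats",                        # preferred term
    "alt": ["Felis catus", "House cats"],  # UF (used for)
    "broader": ["mammals"],                # BT
    "related": ["pets"],                   # RT ("see also")
    "scope_note": "Domestic cats only.",   # SN
}


def to_turtle(entry):
    """Serialize one entry as a skos:Concept in Turtle (labels assumed quote-free)."""
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        "",
        f"<{SCHEME}{entry['id']}> a skos:Concept ;",
        f"    skos:inScheme <{SCHEME}> ;",
        f"    skos:prefLabel \"{entry['pref']}\"@en ;",
    ]
    lines += [f"    skos:altLabel \"{alt}\"@en ;" for alt in entry["alt"]]
    lines += [f"    skos:broader <{SCHEME}{b}> ;" for b in entry["broader"]]
    lines += [f"    skos:related <{SCHEME}{r}> ;" for r in entry["related"]]
    lines.append(f"    skos:scopeNote \"{entry['scope_note']}\"@en .")
    return "\n".join(lines)


print(to_turtle(ENTRY))
```

SKOS defines `skos:narrower` as the inverse of `skos:broader`, so a publisher may state either direction; stating both helps consumers that do no inference.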

EAD

•Attempt to model EAD “things”
  •http://archiveshub.ac.uk/locah/2010/09/28/model-a-first-cut/

•Things
  •Unit of Description
  •Archival Finding Aid
  •Repository (an Agent)
  •Origination (an Agent)
  •“Things” (access points, index terms). Review: What are archival access points?

•Best I can tell, this work hasn’t been taken up yet.

LOCAH Project, “The ‘things’ in EAD” http://archiveshub.ac.uk/locah/2010/09/28/model-a-first-cut/

RDFizing RDA
•What does RDA actually talk about?
  •FRBR model: Group 1, 2, and 3 entities
    •(though Group 1 is still kind of squidgy, really, and some application developers are questioning its usefulness)
  •DCMI model (because life can NEVER be simple)
  •Relationships among entities

•What do we want to say about them?
  •Are there existing ways to say these things that are good enough for our purposes? Can we reuse them, or at least map to them?
  •When there aren’t, how do we say what we need to in ways that are most useful for the rest of the world?

•Assigning URIs to it all
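The reuse-or-map questions above can be sketched as a lookup: use an existing property when one is good enough; otherwise mint a local one and, where possible, declare it an `rdfs:subPropertyOf` something that already exists. In this Python sketch the local element names, the `http://example.org/terms/` namespace, and the particular mappings are hypothetical; `dcterms:title`, `dcterms:creator`, `dcterms:issued`, `dcterms:description`, and `rdfs:subPropertyOf` are real terms.

```python
DCTERMS = "http://purl.org/dc/terms/"
RDFS_SUBPROPERTY_OF = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"
LOCAL = "http://example.org/terms/"  # hypothetical namespace for our own properties

# Existing properties judged "good enough" for hypothetical local element names.
REUSE = {
    "title": DCTERMS + "title",
    "creator": DCTERMS + "creator",
    "date_issued": DCTERMS + "issued",
}

# Local properties that are narrower versions of an existing one.
MAP_TO = {"scale_note": DCTERMS + "description"}


def property_for(element):
    """Return (property URI, mapping triples to publish alongside it)."""
    if element in REUSE:
        return REUSE[element], []  # reuse: nothing new to explain
    local = LOCAL + element        # mint: say what we need to say
    if element in MAP_TO:          # ...but at least map to what exists
        return local, [f"<{local}> <{RDFS_SUBPROPERTY_OF}> <{MAP_TO[element]}> ."]
    return local, []


for name in ["title", "scale_note", "binding_quirk"]:
    print(name, property_for(name))
```

Publishing the subproperty triple lets a consumer that only knows Dublin Core still treat the local scale note as a description.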

Model friction
•FRBR: entity-relationship model

•... like relational databases, which is nice
  •not entirely RDFish, which is not quite so nice and is causing head-scratching
  •But head-scratching is normal in this space! Modeling is hard!

•FRBR does give us some abstractions to model and assign URIs to.
  •And IFLA was supposed to do that... but they haven’t.
  •So the RDA folks have provisionally done it: FRBRoo.
  •Should IFLA get back in the game, formal equivalences will be defined and published between FRBRoo and whatever IFLA comes up with.

•FRBR isn’t perfect. (Gasp. I know, right?)
  •So sticking strictly to FRBR as we model (relationships particularly) causes problems for music and multimedia catalogers, among others.

RDA Vocabularies
•Hillmann, Dunsire, et al. try to work through RDFizing RDA.
•It’s crazy complicated. And weird. So I’m not even trying to explain it here.
  •If you REALLY WANT TO KNOW: http://www.dlib.org/dlib/january10/hillmann/01hillmann.html
  •OR Karen Coyle’s Library Technology Report “RDA Vocabularies for a Twenty-First-Century Data Environment”

•Would a cataloger need to know this?
  •Probably not. It’s plumbing, really.

URLizing things

We’ve seen some already...

•URLized Dublin Core concepts
•MODS accepting URLs
•VIAF
•id.loc.gov
•One more I won’t talk about today: Dewey Decimal (dewey.info)

Archivists!

•Why isn’t VIAF enough for you?

Science librarians!

•Why isn’t VIAF enough for you?

EAC-CPF and SNAC
•EAC-CPF: Encoded Archival Context – Corporate Bodies, Persons, and Families
  •Archival authority standard, from the kindly folks who brought us EAD!
  •Not linked-data-ized. Yet.

•SNAC: The Social Networks and Archival Context Project

•Using EAC-CPF to link people across archival collections
  •Aha! Starting to sound more linked-data! (It’s not RDF yet, but should be relatively easy to RDFize later.)
  •http://socialarchive.iath.virginia.edu/

•Moral: You don’t have to use or even know RDF to start getting ready for linked data!

ORCID and ISNI

•Open Researcher and Contributor ID
  •Authority control for scholars who don’t write books
  •http://orcid.org/

•International Standard Name Identifier
  •Thinks of itself as a superset of VIAF, ORCID, etc.
  •Remains to be seen whether they can pull this off, of course...
  •http://isni.org/

•Review: what does linked data do about two URLs identifying the same thing?
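For review: linked data usually answers with `owl:sameAs`, which is symmetric and transitive, so pairwise links add up to clusters of URIs for one entity. This Python sketch groups URIs with a small union-find and also checks an ORCID iD’s final character, which ORCID computes with the ISO 7064 MOD 11-2 algorithm (ISNI uses the same scheme). The iD `0000-0002-1825-0097` is a sample that appears in ORCID’s own documentation; every other URI is a hypothetical placeholder.

```python
def mod11_2_check(base_digits):
    """ISO 7064 MOD 11-2 check character over the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def orcid_is_valid(orcid):
    digits = orcid.replace("-", "").upper()
    return (len(digits) == 16 and digits[:15].isdigit()
            and mod11_2_check(digits[:15]) == digits[15])


def same_as_clusters(pairs):
    """Group URIs linked by owl:sameAs (symmetric + transitive) using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        parent[root_a] = root_b
    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


PAIRS = [
    ("https://orcid.org/0000-0002-1825-0097", "http://example.org/isni/A"),
    ("http://example.org/viaf/B", "http://example.org/isni/A"),
    ("http://example.org/local/42", "http://example.org/local/43"),
]
print(orcid_is_valid("0000-0002-1825-0097"))
print(same_as_clusters(PAIRS))
```

The flip side of transitivity: one careless `owl:sameAs` between two different people merges their entire clusters.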

Putting it together

Europeana

•Review: what is Europeana?
•Built a “Europeana Semantic Elements” set
  •Dublin Core Application Profile
  •Kludgy, as anything with Dublin Core inevitably is

•Moving to “Europeana Data Model”
  •Publishing linked open data at data.europeana.eu
  •For more: http://www.niso.org/publications/isq/2012/v24no2-3/isaac/

Missouri History Museum

•Needed to create a search portal across disparate metadata silos
  •Gee, where have we heard THAT before?

•Decided to crosswalk to RDF.
  •Work in progress, but initial results encouraging.

•http://www.museumsandtheweb.com/mw2012/papers/using_an_rdf_data_pipeline_to_implement_cross_

Had enough? Okay.

•Copyright 2013 by Dorothea Salo.
•This lecture and slide deck are licensed under a Creative Commons Attribution 3.0 United States License.

•Several diagrams reproduced under fair use.