Breaking the catalog

Peter Brantley, Internet Archive, The Presidio / Dallas, Texas / 01.2012

I have a book.

It’s really a database in a book.

Doesn’t exist on the web.

The catalog entry is not useful.

It does not even give you a hint of the awesomeness of it.

All bibliographic data underperforms in this way, no matter how we describe it.

And it can’t do much for discovery.

Discovery is a lot more than a better index of metadata.

Discovery is metadata, contextualized by user desire.

Which means:

what’s relevant to me, right now, right here.

One of linked data’s challenges is contributing to discovery.

Consider: Small Demons

(smalldemons.com)

Literature through Freebase, with Zemanta entity extraction and matching

Very nice enhanced browse capacity for ebook discovery.

APIs could engender a range of new services.

But … data for recommendations is limited to known attributes and UGC (user-generated content).

Cool when it works; will work better with more aggregation.

Lesson: Information format is often divorced from its utility

and even more importantly …

most open culture search is absolutely ignorant of the context of my desires.

remember “Lincoln”

at the time of writing, there’s one newly released and top-selling book.

chances are, at the time of writing, it’s that book that I want.

Amazon can figure that out.

Because they are selling a shitload of them.

Simple: increase relevancy by incorporating bias toward most recent retrievals.
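A minimal sketch of that idea, assuming each candidate record carries a text-match score and the ages (in days) of its recent retrievals or sales. The class and function names, the 30-day half-life, and the blending weight `alpha` are illustrative assumptions, not any vendor's actual ranking.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Candidate:
    title: str
    text_score: float  # how well the record's metadata matches the query (0..1)
    retrieval_ages_days: list[float] = field(default_factory=list)  # age of each recent retrieval/sale


def recency_weight(ages_days: list[float], half_life_days: float = 30.0) -> float:
    """Sum of exponentially decayed events: a retrieval half_life_days old counts 0.5."""
    return sum(0.5 ** (age / half_life_days) for age in ages_days)


def rank(candidates: list[Candidate], alpha: float = 0.5,
         half_life_days: float = 30.0) -> list[Candidate]:
    """Order candidates by metadata match, boosted by log-damped recent demand."""
    def score(c: Candidate) -> float:
        demand = math.log1p(recency_weight(c.retrieval_ages_days, half_life_days))
        return c.text_score * (1.0 + alpha * demand)
    return sorted(candidates, key=score, reverse=True)


if __name__ == "__main__":
    # Hypothetical "Lincoln" query: an older, slightly better metadata match
    # versus a new release retrieved heavily in the last day.
    older = Candidate("Lincoln (older biography)", 0.9, [400.0, 500.0, 700.0])
    newer = Candidate("Lincoln (new release)", 0.8, [1.0] * 50)
    for c in rank([older, newer]):
        print(c.title)
```

The heavily retrieved new release outranks the older record even though its metadata match is weaker; the logarithm keeps a runaway bestseller from swamping relevance entirely. This is exactly the signal that needs circulation or sales data to compute.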

Easy for Amazon: they have sales data.

Library (ebook) circulation is increasingly meaningless, or more accurately, unavailable.

The book is online. But digitally off-site.

Optimizing discovery is hard.

Segue: Consider relationship modeling.

Mozart’s Don Giovanni and José Zorrilla’s Don Juan Tenorio

via Tirso de Molina’s El burlador de Sevilla
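The chain above can be written as subject-predicate-object triples, the basic shape of linked data. A minimal sketch in plain Python (no RDF library), assuming a hypothetical `derivedFrom` predicate; a real LOD vocabulary would use URIs for both the works and the relationship.

```python
from __future__ import annotations

# Hypothetical predicate; a production LOD store would use a vocabulary URI.
DERIVED_FROM = "derivedFrom"

# Each triple is (subject, predicate, object), following the slide's example.
TRIPLES = {
    ("Don Giovanni (Mozart)", DERIVED_FROM, "El burlador de Sevilla (Tirso de Molina)"),
    ("Don Juan Tenorio (Zorrilla)", DERIVED_FROM, "El burlador de Sevilla (Tirso de Molina)"),
}


def sources_of(work: str, triples: set[tuple[str, str, str]]) -> set[str]:
    """Works that `work` derives from."""
    return {o for s, p, o in triples if s == work and p == DERIVED_FROM}


def siblings(work: str, triples: set[tuple[str, str, str]]) -> set[str]:
    """Other works derived from any of the same sources as `work`."""
    srcs = sources_of(work, triples)
    return {s for s, p, o in triples if p == DERIVED_FROM and o in srcs and s != work}


print(siblings("Don Giovanni (Mozart)", TRIPLES))
```

Once the two `derivedFrom` links are recorded, the connection between the Mozart opera and the Zorrilla play falls out of a query; no one has to model it a second time.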

per the Library Loon ...

“relationship modeling only need be done once”

which in real world terms means centralizing this modeling

duplicating the best of Flickr etc. – for a LOD repository

crowd sourced resource modeling
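One way a centralized, crowd-sourced store might keep "modeled once" from becoming "modeled badly once" is to publish a contributed relationship only after several independent contributors assert it. The class name and the two-confirmation threshold below are hypothetical, a deliberately simple quality gate rather than a recommended design.

```python
from __future__ import annotations

from collections import defaultdict

Triple = tuple[str, str, str]


class CentralRelationStore:
    """Hypothetical shared store where each relationship is modeled once, for everyone.

    A triple is published only after `min_confirmations` distinct contributors
    have asserted it.
    """

    def __init__(self, min_confirmations: int = 2) -> None:
        self.min_confirmations = min_confirmations
        self._votes: defaultdict[Triple, set[str]] = defaultdict(set)

    def contribute(self, contributor: str, subject: str, predicate: str, obj: str) -> None:
        # Contributor ids go into a set, so repeat assertions by one person count once.
        self._votes[(subject, predicate, obj)].add(contributor)

    def published(self) -> set[Triple]:
        return {t for t, who in self._votes.items() if len(who) >= self.min_confirmations}


store = CentralRelationStore()
store.contribute("alice", "Don Giovanni", "derivedFrom", "El burlador de Sevilla")
store.contribute("alice", "Don Giovanni", "derivedFrom", "El burlador de Sevilla")
print(store.published())  # set(): alice's repeat assertion counts once
store.contribute("bob", "Don Giovanni", "derivedFrom", "El burlador de Sevilla")
print(store.published())
```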

Enables interesting approaches to book recommending, browsing algorithms
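For browsing, the same relationships can be walked as a graph. A minimal sketch, assuming an adjacency map of related works (treated as undirected) and a hypothetical `max_hops` limit; nearer works would be shown or recommended first.

```python
from __future__ import annotations

from collections import deque


def browse(graph: dict[str, list[str]], start: str, max_hops: int = 2) -> dict[str, int]:
    """Breadth-first walk from `start`; returns each reachable work with its hop distance."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    del dist[start]
    return dict(sorted(dist.items(), key=lambda kv: kv[1]))  # nearest first


RELATED = {
    "Don Giovanni": ["El burlador de Sevilla"],
    "El burlador de Sevilla": ["Don Giovanni", "Don Juan Tenorio"],
    "Don Juan Tenorio": ["El burlador de Sevilla"],
}

print(browse(RELATED, "Don Giovanni"))  # {'El burlador de Sevilla': 1, 'Don Juan Tenorio': 2}
```

Breadth-first order gives a natural "closest relatives first" browse; a recommender could weight each hop by usage data instead of treating all links equally.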

Linked data makes for nice CS experiments and gets digital librarians excited.

No one thinks linked data is a panacea.

It’s a tool that can help in some contexts.

Yet not so much in others.

I will argue …

The most compelling uses of LD in repositories may be intra-catalog.

Think of the catalog as a database, like Amazon’s.
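To make the "catalog as database" point concrete: once relationship links and usage data live in the same store, one query can answer "what is related to this, and what do people actually use?". A minimal sketch using Python's built-in `sqlite3`; the three-table schema and the usage counts are invented for illustration.

```python
import sqlite3


def build_catalog() -> sqlite3.Connection:
    """In-memory catalog with works, intra-catalog links, and usage (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE work (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
        CREATE TABLE relation (src INTEGER REFERENCES work(id),
                               rel TEXT NOT NULL,
                               dst INTEGER REFERENCES work(id));
        CREATE TABLE usage (work_id INTEGER REFERENCES work(id), uses INTEGER NOT NULL);
    """)
    db.executemany("INSERT INTO work VALUES (?, ?)", [
        (1, "El burlador de Sevilla"), (2, "Don Giovanni"), (3, "Don Juan Tenorio")])
    db.executemany("INSERT INTO relation VALUES (?, ?, ?)", [
        (2, "derivedFrom", 1), (3, "derivedFrom", 1)])
    db.executemany("INSERT INTO usage VALUES (?, ?)", [(1, 5), (2, 40), (3, 12)])  # made-up counts
    return db


def related_by_use(db: sqlite3.Connection, title: str) -> list:
    """Works sharing a source with `title`, most-used first."""
    return db.execute("""
        SELECT w2.title, u.uses
        FROM work w1
        JOIN relation r1 ON r1.src = w1.id
        JOIN relation r2 ON r2.dst = r1.dst AND r2.src != w1.id
        JOIN work w2 ON w2.id = r2.src
        JOIN usage u ON u.work_id = w2.id
        WHERE w1.title = ?
        ORDER BY u.uses DESC
    """, (title,)).fetchall()


print(related_by_use(build_catalog(), "Don Giovanni"))  # [('Don Juan Tenorio', 12)]
```

The join is only possible because links and usage sit on one platform; split across separate open endpoints, the same question needs federation and still lacks the usage data.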

If I just want bib info (metadata), go yonder to OCLC or Open Library.

If I want to find out what to watch or read, I want to go to the largest possible aggregation of user data and metadata.

Might be Amazon. (Or could be DPLA … ).

Library LOD has to be network scale, on a single platform, to be end-user attractive (like Amazon).

I think that’s a kinda funny conundrum.

Because in a way, linked open data is about a web of open data.

However, unless you are in the business of providing open data, there’s more utility in …

structured data on a restricted platform – linked closed data (so to speak)

From a business perspective, I’d be a real fan of linked closed data.

If I offered cloud data services, I’d be happy to host any useful linked open data.

(Because being too open to ingest, too polygamous, can poison data stores)

As long as I (a platform) could retain an unrestricted copy of your data.

There’s a (copyleft) rights issue here too … (e.g. CC-SA and derivatives)

LOD domains assume unbounded sharing

But rights might be quite granular or restricted downstream

Europeana requires downstream commercial rights to encourage new enterprise

But LAMs (libraries, archives, and museums) might not possess those rights, restricting the size of the data market.
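The rights problem can be seen as a filter an aggregator applies at ingest. A minimal sketch, assuming records carry SPDX-style Creative Commons license identifiers and a hypothetical policy that admits only licenses allowing commercial reuse; share-alike (copyleft) records are admitted but flagged, since derivatives inherit the license.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed policy table, using SPDX identifiers for Creative Commons licenses.
COMMERCIAL_REUSE_OK = {"CC0-1.0", "CC-BY-4.0", "CC-BY-SA-4.0"}
SHARE_ALIKE = {"CC-BY-SA-4.0"}  # copyleft: derivatives must carry the same license


@dataclass(frozen=True)
class Record:
    record_id: str
    license: str


def partition_for_aggregator(
    records: list[Record],
) -> tuple[list[Record], list[Record], list[Record]]:
    """Split records into (permissive, share-alike, excluded) under the assumed policy."""
    permissive, share_alike, excluded = [], [], []
    for r in records:
        if r.license not in COMMERCIAL_REUSE_OK:
            excluded.append(r)  # e.g. NonCommercial licenses, or unknown rights
        elif r.license in SHARE_ALIKE:
            share_alike.append(r)
        else:
            permissive.append(r)
    return permissive, share_alike, excluded


recs = [Record("a", "CC0-1.0"), Record("b", "CC-BY-SA-4.0"), Record("c", "CC-BY-NC-4.0")]
permissive, share_alike, excluded = partition_for_aggregator(recs)
print([r.record_id for r in excluded])  # ['c']
```

Every record that lands in the excluded pile is data the aggregated market never sees.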

If we want linked open data to work well

We need to aggregate and hold data on a single network platform to the greatest possible extent.

Because that will drive use, and capture information about user intent.

And that data will help ultimately to contextualize metadata with desire.

Therefore from the user perspective …

I’d like to see us build out a common open platform for LOD.

The most powerful opportunity for LOD may be in building central repositories.

Peter Brantley

Director, BookServer Project, Internet Archive, San Francisco, CA

@naypinya (twitter)
