Linked Data Workshop, Stanford University, June 27 – July 1, 2011


Presented by Jerry Persons at Linked Data and Libraries 2011, London 14th July 2011


who

CLIR (Council on Library and Information Resources)

Research Libraries
• Bibliotheca Alexandrina
• California
• Emory
• Michigan
• Stanford
• Virginia

National Libraries
• Bibliothèque nationale de France
• British Library
• Deutsche Nationalbibliothek
• Kongelige Bibliotek (Denmark)
• Library of Congress

HighWire Press
LOCKSS / CLOCKSS
Metaweb / Freebase (Google)
Research Center for Informatics, National Institute for Informatics, Japan
sameAs.org, Seme4 and University of Southampton
Semantic Computing Research Group (SeCo), Aalto University, Finland

what

The Stanford Workshop focused on crafting fundable plans for creating tools, processes, and vehicles to expedite a disruptive paradigm shift in the workflows, data stores, and interfaces used for managing, discovering, and navigating the knowledge and information resources that fuel scholarship and research.

The goal was to identify knowledge management capabilities and to specify designs for requisite new components, mechanisms, environments, and communities that will:

1. move beyond current metadata practices based on discrete, distributed, and replicated database records;

2. precipitate a new family of methods and tools to replace today’s metadata records with an array of emergent, open, link-driven metaservices;


3. rapidly expand the breadth, density, and reliability of well-curated identifiers and links associated with the publications, data, manuscripts, documents, artifacts, and other resources available via the services and holdings of the world’s national and research libraries, museums, archives, and other science, social science, and cultural heritage institutions; and

4. provide for continuous improvement in the quality and density of link-driven navigation and discovery capabilities through provision of open, managed feedback and annotation by individuals and communities who seek, gather, consume, and build content in the course of their reading, teaching, learning, scholarship, research, and other knowledge-based activities.


context

I’ve liked to characterize the current moment as a circle of libraries, museums, archives, universities, journalists, publishers, broadcasters and a number of others in the culture industries standing around, eyeing each other and the space in between them while wondering how they need to reconfigure for a world of digitally networked knowledge.

Josh Greenberg, Moving a handful of blocks north …, April, 2010.

Whichever organizations do an excellent job of providing context and coherent linkages will be the go-to ones for data consumers. As we have seen to date, merely publishing linked data triples does not meet this test.

Mike Bergman, I have yet to metadata I didn’t like, 2010


The biggest problem we face right now is a way to ‘link’ information that comes from different sources that can scale to hundreds of millions of statements (and hundreds of thousands of equivalences). Equivalences and subclasses are the only things that we have ever needed of OWL and RDFS, we want to ‘connect’ dots that otherwise would be unconnected.

Stefano Mazzocchi, Darkness is relative, I guess,  January, 2007.

comment: for every one of these questions, I know multiple librarians who would know the answers off the top of their heads

rejoinder: can I have copies of those librarians?

anonymized from the IRC back channel at a Code4Lib meeting
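
The Mazzocchi quote above asks for a way to link information through equivalences at scale. As a minimal sketch only, assuming equivalence statements arrive as pairs of URIs, a union-find (disjoint-set) index is one way to group identifiers that have been declared equivalent. The technique is an illustrative choice and is not prescribed by the quote or the workshop; the class name and the example.org URIs are hypothetical.

```python
class EquivalenceIndex:
    """Groups URIs that have been declared equivalent (sameAs-style links).

    Hypothetical helper for illustration; uses union-find with path compression.
    """

    def __init__(self):
        self._parent = {}  # uri -> parent uri; a root is its own parent

    def _find(self, uri):
        # Register unseen URIs as singleton groups.
        self._parent.setdefault(uri, uri)
        root = uri
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited node straight at the root.
        while self._parent[uri] != root:
            self._parent[uri], uri = root, self._parent[uri]
        return root

    def add_equivalence(self, a, b):
        """Record that a and b identify the same thing."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same(self, a, b):
        """True if a and b are connected by any chain of equivalences."""
        return self._find(a) == self._find(b)

    def bundle(self, uri):
        """All known URIs equivalent to uri, sorted for stable output."""
        root = self._find(uri)
        return sorted(u for u in list(self._parent) if self._find(u) == root)


# Usage with placeholder URIs (assumptions, not real identifiers).
index = EquivalenceIndex()
index.add_equivalence("http://example.org/a/1", "http://example.org/b/7")
index.add_equivalence("http://example.org/b/7", "http://example.org/c/3")
connected = index.same("http://example.org/a/1", "http://example.org/c/3")
```

Equivalence is treated here as symmetric and transitive, which is a simplification; real co-reference data often needs weaker or contextual links.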

issues ... snapshot at mid-point of workshop

1. co-referencing, reconciliation – across formats, disciplines ...
2. use of extant, well curated metadata – including authority files, ...
3. killer apps – via GLAM communities? ... emergent via web?
4. provenance – attribution / origin / authority
5. staff training; creating, deriving, publishing URIs, making links, using links in discovery environments
6. usability of data -- “reifiable”
7. QC – immediate and over time – across language boundaries
8. standards for URIs – versioning
9. data curation – i.e. linked data and its various components
10. distribution of responsibility – e.g. preserve metadata, content
11. feedback, reporting, reward systems, metrics, contribute linkable data (filling gaps), contribute URIs (SEO issues)

12. marketing / outreach – user seduction & training
13. workflow
14. scalability [an indicator of success, fixes exist]
15. indexing – how to get data once you have the link
16. use of ontologies
17. licensing – focused on metadata at this juncture, content later
18. annotation – linked data extended / improved by its consumers
19. relationship to e-scholarship (esp. e-science) & e-learning
20. cultural diversity (languages, character sets) – existing schema adequate?
21. search engine optimization
22. social networking (Facebook, Google+, ...)

vectors: 1. workflow / pipeline

[Diagram; labels only: extant metadata; transcode; reconcile; + newly minted; reconcile; revise; published canon; WWW fabric of linked data (via algorithm, via people); killer app(s)]

vectors: 2. projects issues

Bring issues to bear in project plans for a real-life project

1. Use cases [3. killer apps]
   a. put yourself in role of linked-data developer and/or consumer
      - what’s needed, what will foster new/better capabilities
   b. what are relationships between this and other data
      - what vocabularies, schema, URIs, and models are in play
   c. components (the test case is journals) [2. extant authorities]
      - names, journal & article titles, date ranges, citations, publishers, ISSN, language, topics/classification
   d. effect of proposed project [19. relationship to e-scholarship, etc.]

2. Output data representation / model
   a. [17. licensing] for the metadata
   b. schema / vocabulary selection
      - [8. standards for URIs]
      - [6. usability] and [20. cultural / language issues]

3. Production [13. workflows]
   a. [5. staff training ...]
   b. [1. co-referencing & reconciliation]
   c. massive conversion from strings to URIs typical w/ extant data

4. Maintenance
   - production systems vs. new mgmt requirements for linked data
   - where are updates & revisions applied?
   - [9. data curation] and [7. QC, immediate & over time]
   - [10. shared responsibilities, e.g. metadata preservation]

5. Distribution
   - [12. marketing/outreach, user seduction]
   - [14. scalability]
   - [21. SEO]
   - [22. social networking (Facebook, Google+, etc.)]
   - [15. indexing] and [18. annotation]

6. Metrics [11. feedback, reporting, reward systems, ...]
   - value added: linked-data consumers
   - value accrued: metadata producers
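
The production item on converting strings to URIs can be pictured with a small sketch. It assumes a flat record is a dictionary of string fields and that an authority file can be reduced to a lookup from normalized labels to URIs. The authority table, field names, and example.org URIs are all hypothetical; real reconciliation would need far more than exact matching on normalized labels.

```python
import unicodedata

# Hypothetical authority file: normalized label -> URI (placeholder URIs).
AUTHORITY = {
    "nature": "http://example.org/journal/nature",
    "nature publishing group": "http://example.org/org/npg",
}


def normalize(label):
    """Fold case, strip accents, and collapse whitespace before matching."""
    decomposed = unicodedata.normalize("NFKD", label)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.casefold().split())


def record_to_triples(subject_uri, record, authority=AUTHORITY):
    """Turn a flat record into (subject, field, object) triples.

    Strings found in the authority table become URIs. Everything else is
    kept as a literal and reported so it can be reconciled later.
    """
    triples, unresolved = [], []
    for field_name, value in record.items():
        uri = authority.get(normalize(value))
        if uri is not None:
            triples.append((subject_uri, field_name, uri))
        else:
            triples.append((subject_uri, field_name, value))  # still a string
            unresolved.append((field_name, value))
    return triples, unresolved


# Usage with a made-up journal article record.
record = {
    "journal": "  NATURE ",
    "publisher": "Nature Publishing Group",
    "language": "English",
}
triples, unresolved = record_to_triples("http://example.org/article/1", record)
```

Returning the unresolved strings alongside the triples keeps the remaining reconciliation workload visible instead of hiding it.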

vectors: 3. cookbook issues

reference implementations

maturity: novice, journeyman, master

• value statements, use cases
• ingestion of data
• confidence of data, provenance
• publishing data
• providing / engendering services
• education / outreach, user seduction

elephants in the room

URIs, not strings
• must not underestimate the amount of effort required to transform large subsets of GLAM metadata from flat records into linked data replete with URIs

reconciliation provenance
• need plans for mgmt of co-references emerging from large swaths of newly minted GLAM linked data, e.g.
  -- norms / vehicles for provenance that track and record reconciliation events, agents, criteria, etc.
  -- means to track negative co-reference decisions

feedback, reporting, reward systems, metrics
• need persuasive justifications for building and supporting linked-data systems for the cultural heritage community
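
The reconciliation provenance item above calls for recording reconciliation events, agents, and criteria, and for tracking negative co-reference decisions. One possible shape for such a record is sketched below, assuming an append-only, in-memory log. The class and field names, agents, and URIs are hypothetical and do not represent a norm proposed by the workshop.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ReconciliationEvent:
    """One decision about whether two URIs refer to the same thing."""
    uri_a: str
    uri_b: str
    same: bool       # False records a negative co-reference decision
    agent: str       # person or program that made the decision
    criteria: str    # why the decision was made
    decided_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


class ReconciliationLog:
    """Append-only log; earlier decisions are kept, never overwritten."""

    def __init__(self):
        self._events = []

    def record(self, uri_a, uri_b, same, agent, criteria):
        event = ReconciliationEvent(uri_a, uri_b, same, agent, criteria)
        self._events.append(event)
        return event

    def history(self, uri_a, uri_b):
        """All events for the unordered pair, oldest first."""
        pair = frozenset((uri_a, uri_b))
        return [e for e in self._events
                if frozenset((e.uri_a, e.uri_b)) == pair]

    def latest_decision(self, uri_a, uri_b):
        """Most recent verdict for the pair, or None if never assessed."""
        events = self.history(uri_a, uri_b)
        return events[-1].same if events else None


# Usage: an automated match later overturned by a human reviewer.
log = ReconciliationLog()
log.record("http://example.org/a/1", "http://example.org/b/7",
           True, "matcher-v1", "exact ISSN match")
log.record("http://example.org/b/7", "http://example.org/a/1",
           False, "cataloger:jdoe", "different publishers")
```

Keeping the full history, instead of only the latest verdict, is what allows a rejected match to stay rejected when the same pair is proposed again by another process.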

Page 28: Linked Data Workshop Stanford University

28http://blog.okfn.org/2011/06/24/notes-from-open-metadata-workshop-hague-15th-june-2011/

Notes from Open Metadata Workshop [Europeana] The Hague, 15th June 2011

Posted on June 24, 2011 by Jonathan Gray e.g.


caveats

mgmt of co-references needs to be a bottom-up process
• funders will pressure to impose standards
• risk is that top-down approach will capsize the effort
• need to let things grow organically

build systems that accept the way the world is, not what you would like it to be

focus on changing current practices (in the long run), not only on reconciling data (in the short run)

preventing problems is better than solving them

stay tuned

CLIR linked-data survey

Workshop documents
• introductory presentations
• agendas as they evolved
• reports from the work groups
• summaries

Proposals for work
• specific projects
• communities of practice
• opportunities to collaborate & contribute


questions / thoughts ?

Thank you for your time and attention

Jerry Persons