1/ Is Linked Data something for me? Christophe Guéret, Clément Levallois eHumanities group meeting, November 22, 2012

Is linked data something for me?


DESCRIPTION

Slides prepared with Clement Levallois for the tutorial held at the Meertens institute. The presentation goes over the need for using Linked Data to make data machine readable. The hands-on part is focused on the annotation of a profile page with RDFa.


Page 1: Is linked data something for me?

1/

Is Linked Data something for me?

Christophe Guéret, Clément Levallois eHumanities group meeting, November 22, 2012


2/

Get ready!

Goals for today:

Learn about Linked Data

See if it is something interesting for your activities


3/

Hands-on tutorial

Make groups, one per table

Pick a famous person of your choice per group

Grab the material on http://bit.ly/ehg_tutorial or catch a USB stick


4/

Big data, but how to get it?

Can't always gather all the information manually


5/

Big data, but how to get it?

Data scattered in different information systems


6/

Big data, but how to get it?

Data in different formats


7/

What if we could?

If all data were “readable”, connections between datasets could be made. We would simply know more than we do today.

“Linked data” is an attempt to do that


8/

Why is it so hard?

Machines cannot read the text and extract data

What is the name of that person?


9/

Ouch!

You just faced the same problem as machines: you can't read the document and extract the data

Linked Data is a solution to this problem

Note: in the following we take the example of data “buried” in webpages (html documents), but the same logic applies to other kinds of docs (csv files, databases, your collection of pictures…)


10/

Use case for the hands-on


11/

What we will do...

Take the webpage of a researcher (one page per group!)

Explain why the data in this page is “buried”

Solve the issue by introducing some linked data sweetness in the webpage

Show what we gained: now, we can connect the researchers!


12/

Template 1

The name is in the title

The city is ambiguous


13/

Template 2

The name is not visible on the page

The city is ambiguous


14/

Template 3

The name is in the description

The city is ambiguous


15/

Hands-on: check out the templates

Open the templates in a web browser and look at their HTML source code


16/

Hands-on: check out the templates

Change “William Smith” into a name of your own (one name per group)

Change and pick another name!


17/

First part of the hands-on


18/

In what sense do we mean that the name of this researcher is buried in this web page?

There is no way for software reading this page to guess:

Is there a name on this page?

If so, what is this name?

What does this name represent? What does it relate to?

But wait, my Internet browser can read html pages, why can’t it figure out the name of the researcher?

Because the html code gives info about how to display the page, but no info about what the content means!


19/

Two roads from there…

We could design software that understands English

This is the approach of natural language processing, statistics, etc.

We can add extra code that tells the software directly what the data means

This is the linked data approach! This extra code in HTML pages is called “RDFa”


20/

Annotate the data

We use a VOCABULARY for these annotations

foaf:name


21/

Wait! What is that “foaf:name”?

It is a term from a vocabulary

foaf:name comes from the vocabulary FOAF and is used to annotate the name of a person

Vocabulary = set of unambiguous, consensual terms used to annotate pages with data

Vocabularies are:

An agreement between data publishers and consumers

Generally focused on particular topics

Key concept!!!


22/

Annotate the page with the data


23/

Hands-on: annotate with foaf:name

Add the “foaf:name” annotation to the three templates

Step 1: declare the vocabulary FOAF

<html xmlns:foaf="http://xmlns.com/foaf/0.1/">

Step 2: annotate the data

<span property="foaf:name">William Smith</span>

Template 2 does not display the name, so we use a meta element:

<meta property="foaf:name" content="William Smith"/>
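
To make concrete what an extractor does with these annotations, here is a minimal Python sketch. It is not the RDFaParser tool used in the workshop, and it is not a full RDFa processor: it only reads the `property` attribute, and the names `PropertyExtractor` and `extract` are made up for this illustration. It shows that the visible `span` form and the hidden `meta` form yield the same piece of data.

```python
from html.parser import HTMLParser


class PropertyExtractor(HTMLParser):
    """Collect (property, value) pairs from `property` attributes.

    Simplified on purpose: nested elements inside an annotated element
    are not handled, and prefixes are not expanded.
    """

    def __init__(self):
        super().__init__()
        self.found = []       # extracted (property, value) pairs
        self._pending = None  # property whose element text is being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("property")
        if prop is None:
            return
        if "content" in attrs:
            # <meta property="..." content="..."/>: the value is in the attribute
            self.found.append((prop, attrs["content"]))
        else:
            # <span property="...">text</span>: the value is the element text
            self._pending = prop
            self._text = []

    def handle_data(self, data):
        if self._pending is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._pending is not None:
            self.found.append((self._pending, "".join(self._text).strip()))
            self._pending = None


def extract(html):
    """Return the list of (property, value) pairs found in an HTML string."""
    parser = PropertyExtractor()
    parser.feed(html)
    parser.close()
    return parser.found


# Two hypothetical page fragments: name visible in the page vs. hidden in a meta
visible = '<p>Name: <span property="foaf:name">William Smith</span></p>'
hidden = '<head><meta property="foaf:name" content="William Smith"/></head>'

print(extract(visible))
print(extract(hidden))
```

Both calls print the same pair, `foaf:name` with the value `William Smith`, which is the point of the annotation: the layout of the page no longer matters to the software reading it.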


24/

Hands-on: extract annotations

Use the RDFa extractor at http://bit.ly/RDFaParser to get the annotations from the three templates

Command line tool:

java -jar RDFaParser-0.0.6.jar template1.html

java -jar RDFaParser-0.0.6.jar template2.html

java -jar RDFaParser-0.0.6.jar template3.html

All three return the same result!


25/

Bingo!

We get exactly the same result for the three templates

foaf:name = William Smith


26/

How this should look now

(here showing template 1)


27/

How to choose a vocabulary?

Vocabulary => consensus

Therefore, it is better to:

Avoid obscure vocabularies nobody knows

Focus on well organised and maintained vocabularies

Why did we use FOAF?

Specialised for personal profiles and widely accepted

W3C support & recommended for use by EU members

http://joinup.ec.europa.eu/asset/core_person/description


28/

What vocabularies are available?

Many are well established: FOAF, SIOC, Dublin Core, BIBO, …

Creating vocabularies is doable but beware that:

New vocabularies won't necessarily gain adoption

You need to maintain the vocabulary

You need to host it on the Web

A vocabulary can borrow terms from other vocabularies.


29/

EU initiative

“Core Vocabularies” from the ISA programme

Combine existing terms and new ones


30/

Google/Bing/Yahoo/Yandex initiative

Vocabulary: Schema.org

Used by search engines to extract pages' data


31/

Facebook initiative

Vocabulary: Open Graph protocol

Used to put the “Like” buttons on pages


32/

How to use a vocabulary?

Look at the documentation, e.g. http://xmlns.com/foaf/spec/

Map your concepts to terms from the vocabulary:

Naam → foaf:name

Voornaam → foaf:firstName

Achternaam → foaf:lastName

Werklocatie → foaf:based_near
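
The mapping above can be written down as a small lookup table. The following Python sketch is only an illustration: the function names `expand` and `annotate` and the sample record are invented here, and the only fact taken from the slides is the mapping itself plus the FOAF namespace `http://xmlns.com/foaf/0.1/`.

```python
FOAF = "http://xmlns.com/foaf/0.1/"

# Mapping from the slide: local (Dutch) field name -> FOAF term
FIELD_TO_TERM = {
    "Naam": "foaf:name",
    "Voornaam": "foaf:firstName",
    "Achternaam": "foaf:lastName",
    "Werklocatie": "foaf:based_near",
}


def expand(term, prefixes=None):
    """Turn a prefixed term such as 'foaf:name' into its full URI."""
    prefixes = prefixes or {"foaf": FOAF}
    prefix, _, local = term.partition(":")
    return prefixes[prefix] + local


def annotate(record):
    """Re-key a record from local field names to full vocabulary URIs.

    Fields without a mapping are dropped: no agreed term, no annotation.
    """
    return {
        expand(FIELD_TO_TERM[field]): value
        for field, value in record.items()
        if field in FIELD_TO_TERM
    }


# Hypothetical record; "Telefoon" has no mapping in the table above
record = {"Naam": "William Smith", "Telefoon": "unknown"}
print(annotate(record))
```

Keeping the mapping in one table makes the consensus explicit: anyone who knows FOAF can read the output without knowing the local field names.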


33/

Triples and subjects

Remember, we created this annotation:

foaf:name "William Smith"

But what entity has “William Smith” for a name?

<template1.html> foaf:name "William Smith"

This is a “triple” made of a subject, a predicate and an object

Subject = <template1.html>

Predicate = foaf:name

Object = "William Smith"

Meaning: this document has “William Smith” for a name
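
As a rough illustration of the subject / predicate / object structure, a triple can be modelled as a three-field record. This is a sketch in Python, not how any particular RDF library stores data; `Triple` and `to_statement` are names chosen for this example.

```python
from collections import namedtuple

# A triple is just three slots, always in this order
Triple = namedtuple("Triple", ["subject", "predicate", "object"])


def to_statement(triple):
    """Write a triple on one line, in the style used on the slide."""
    return f"{triple.subject} {triple.predicate} {triple.object} ."


# The triple from the slide: the page itself is the subject
page_name = Triple("<template1.html>", "foaf:name", '"William Smith"')

print(page_name.subject)
print(to_statement(page_name))
```

Reading the fields back in order gives the sentence from the slide: the subject `<template1.html>` has, for `foaf:name`, the value "William Smith".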


34/

We did not declare a subject

This says that this is the foaf:name but does not define a subject → Use the page name by default

foaf:name


35/

Why does this matter?

Subjects can be used as objects to create links

Need a common subject to group annotations

[Diagram: a person resource with foaf:name “William Smith” and foaf:based_near “Durham”, linked through foaf:knows to a second resource that has its own foaf:name]


36/

Picking a resource

Resources need to be stable, web accessible and re-used

Consensus again, for example:

Amsterdam: http://dbpedia.org/resource/Amsterdam

TBL (Tim Berners-Lee): http://www.w3.org/People/Berners-Lee/card#i

The <C:/MyDirectory/templateX.html> subjects are not valid: they are not Web based, so we need to change that
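
A quick way to see why a local file path does not qualify is to check the form of the identifier. The Python sketch below only tests that a string looks like an http(s) URI with a host; it cannot tell whether the resource is stable or re-used by others, which remain human judgments. The function name `is_web_resource` is an assumption of this example.

```python
from urllib.parse import urlparse


def is_web_resource(uri):
    """Return True if `uri` has an http(s) scheme and a host name.

    This is a check on the form only; it does not fetch anything.
    """
    parts = urlparse(uri)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


candidates = [
    "http://dbpedia.org/resource/Amsterdam",
    "http://www.w3.org/People/Berners-Lee/card#i",
    "C:/MyDirectory/templateX.html",  # local path from the templates
]

for uri in candidates:
    print(uri, is_web_resource(uri))
```

The two Web identifiers pass the check and the local path does not, because nobody else can look it up.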


37/

Hands-on: set the subject

Step 1: decide on a resource for the person, for example:

http://example.org/william_smith

http://myurl.com/john_doe

Step 2: add the resource with an “about” attribute in the same span as the foaf:name

Example: you had:

<span property="foaf:name">

It becomes:

<span about="http://example.org/william_smith" property="foaf:name">
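
The effect of the “about” attribute can be sketched in a few lines of Python. This is a simplification under stated assumptions, not the RDFa processing rules: only an “about” on the same element as the `property` is considered, the page identifier is passed in by hand, and `SubjectAwareExtractor` and `triples_from` are names invented for this example.

```python
from html.parser import HTMLParser


class SubjectAwareExtractor(HTMLParser):
    """Turn `property` annotations into (subject, predicate, object) triples."""

    def __init__(self, page_uri):
        super().__init__()
        self.page_uri = page_uri  # fallback subject when no `about` is given
        self.triples = []
        self._open = None         # (subject, predicate) of the element being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "property" in attrs:
            # Use the declared subject if present, else default to the page
            subject = attrs.get("about", self.page_uri)
            self._open = (subject, attrs["property"])
            self._text = []

    def handle_data(self, data):
        if self._open is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._open is not None:
            subject, predicate = self._open
            value = "".join(self._text).strip()
            self.triples.append((subject, predicate, value))
            self._open = None


def triples_from(html, page_uri):
    """Return the triples found in `html`, using `page_uri` as default subject."""
    parser = SubjectAwareExtractor(page_uri)
    parser.feed(html)
    parser.close()
    return parser.triples


before = '<span property="foaf:name">William Smith</span>'
after = ('<span about="http://example.org/william_smith" '
         'property="foaf:name">William Smith</span>')

print(triples_from(before, "template1.html"))
print(triples_from(after, "template1.html"))
```

Without “about”, the name is attached to the page; with it, the name is attached to the person's resource, which is what other pages can then link to.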


38/

5-star Linked Data

Rules (see http://5stardata.info/ ):

Resources are valid URIs

Machine-readable data is associated with the resource

The data contains links to other resources

Example http://dbpedia.org/resource/Amsterdam


39/

Great! We're done now!

We added this structured piece of data to all the templates:

<http://example.org/william_smith> foaf:name "William Smith"

This data can be extracted by software

We can build an application that fetches persons' names, but there are still no links between them :-/


40/

An example of the new code

All the annotated templates have their name suffixed with “_with_name_and_subject”


41/

Second part of the hands-on

Create some links


42/

Creating links

Links are used to connect two resources

Example: William Smith knows Tim Berners-Lee

<http://example.org/william_smith> foaf:knows <http://www.w3.org/People/Berners-Lee/card#i>

Two usages:

Create (social) networks by connecting resources

Disambiguate text by pointing to the exact resource


43/

Hands-on: getting social

Step 1: ask 3 other groups in this workshop for their subject (remember, a subject is the value of “about”, as in <span about="http://example.org/william_smith" property="foaf:name">)

Step 2: use the 3 subjects you got to annotate the links. Example:

I know <span rel="foaf:knows" resource="http://example.org/john_doe">John Doe</span>, and <span rel="foaf:knows" resource="http://myUrl.com/nchomsky">Noam Chomsky</span>, and also <span rel="foaf:knows" resource="http://ehumanities.knaw.nl/sally_wyatt">Sally Wyatt</span>
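
To see what these link annotations give to software, the Python sketch below collects every element that carries both a `rel` and a `resource` attribute and turns it into a triple. It is a simplified illustration, not an RDFa processor: the subject is passed in by hand (or read from an “about” on the same element), and `LinkExtractor` and `links_from` are hypothetical names.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (subject, relation, target) triples from rel/resource pairs."""

    def __init__(self, subject):
        super().__init__()
        self.subject = subject  # who is speaking on this page
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Require both attributes so ordinary <link rel="stylesheet"> is skipped
        if "rel" in attrs and "resource" in attrs:
            subject = attrs.get("about", self.subject)
            self.links.append((subject, attrs["rel"], attrs["resource"]))


def links_from(html, subject):
    """Return the link triples found in `html` for the given default subject."""
    parser = LinkExtractor(subject)
    parser.feed(html)
    parser.close()
    return parser.links


page = (
    'I know <span rel="foaf:knows" '
    'resource="http://example.org/john_doe">John Doe</span>, and '
    '<span rel="foaf:knows" '
    'resource="http://myUrl.com/nchomsky">Noam Chomsky</span>'
)

for triple in links_from(page, "http://example.org/william_smith"):
    print(triple)
```

The visible text (“John Doe”) is for human readers; the `resource` value is what the software keeps, so two people with the same name are never confused.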


44/

Let's make some links


45/

Remember, there are two Durhams

One in the US, one in the UK, of similar importance

Which one is the “Durham” on the profile?

http://sws.geonames.org/4464368

http://sws.geonames.org/2650628


46/

Finding a resource on Geonames

Search by name, follow the RDF link, strip out the “/about.rdf” part
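
The last step, stripping the “/about.rdf” part, is a plain string operation. A minimal Python sketch, with `resource_from_rdf_link` as a made-up helper name and the input assumed to be the RDF link copied from the Geonames page:

```python
def resource_from_rdf_link(rdf_link, suffix="/about.rdf"):
    """Drop the trailing '/about.rdf' to get the resource identifier.

    Links that do not end with the suffix are returned unchanged.
    """
    if rdf_link.endswith(suffix):
        return rdf_link[: -len(suffix)]
    return rdf_link


print(resource_from_rdf_link("http://sws.geonames.org/4464368/about.rdf"))
```

The result is the identifier of the place itself, while the link with “/about.rdf” points to a document describing that place.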


47/

Hands-on: disambiguate Durham

Annotate “Durham” with a link to the exact resource

Step 1: decide on which Durham to use

Step 2: annotate Durham with the link

<span rel="foaf:based_near" about="http://example.org/william_smith" resource="http://sws.geonames.org/4464368">Durham</span>


48/

Hands-on: extract annotations

Use the RDFa extractor at http://bit.ly/RDFaParser to get the annotations from the three templates

Command line tool:

java -jar RDFaParser-0.0.6.jar template1.html

java -jar RDFaParser-0.0.6.jar template2.html

java -jar RDFaParser-0.0.6.jar template3.html

All three return the same result!


49/

Hands-on: extract a network!

Now use the small program provided in the Dropbox folder
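
The program handed out at the workshop is not reproduced here. As a rough idea of what extracting a network means, the Python sketch below groups foaf:knows triples by subject; the function name `build_network` and the sample triples are assumptions made for this illustration.

```python
from collections import defaultdict


def build_network(triples, predicate="foaf:knows"):
    """Group the targets of `predicate` links by subject.

    Returns a dict mapping each person to the set of people they know.
    Triples with another predicate (names, locations) are ignored.
    """
    network = defaultdict(set)
    for subject, pred, obj in triples:
        if pred == predicate:
            network[subject].add(obj)
    return dict(network)


# Hypothetical triples, as they could come out of several annotated pages
triples = [
    ("http://example.org/william_smith", "foaf:name", "William Smith"),
    ("http://example.org/william_smith", "foaf:knows",
     "http://example.org/john_doe"),
    ("http://example.org/william_smith", "foaf:knows",
     "http://www.w3.org/People/Berners-Lee/card#i"),
    ("http://example.org/john_doe", "foaf:knows",
     "http://example.org/william_smith"),
]

network = build_network(triples)
for person, friends in network.items():
    print(person, "knows", len(friends), "resource(s)")
```

Because every group used shared resource identifiers, triples coming from different pages connect to each other without any extra matching step.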


50/

That's all for now!

(but there is more to discover: ontologies, reasoning, SPARQL, ...)