Fun with the Semantic Web Peter Mika Yahoo! Research Barcelona [email protected]

Hack U Barcelona 2011

DESCRIPTION

Very brief intro to Semantic Web and BOSS for a Yahoo! Hack U event at UPC in Barcelona, Spain.

Page 1: Hack U Barcelona 2011

Fun with the Semantic Web

Peter Mika

Yahoo! Research Barcelona

[email protected]

Page 2: Hack U Barcelona 2011

Vague, but exciting… Berners-Lee and the dawn of the Web

Page 3: Hack U Barcelona 2011

Semantic Web

• Publish data on the Web
– Linked Data: linking data similar to how we link documents on the Web
– Query databases over the Web

• Architectural challenges
– A common format for sharing data
– Sharing the meaning of data
– Infrastructure

• Semantic Web standards from W3C
– Data and schema languages (RDF, OWL, RIF)
– Document formats (RDF/XML, RDFa)
– Protocols (SPARQL, HTTP)

• Semantic Web research into knowledge representation and reasoning, data integration, data quality and many other topics

• Community effort (Linked Data movement)

Page 4: Hack U Barcelona 2011

RDF (Resource Description Framework)

• The basic data model of the Semantic Web
– A universal model to capture all sorts of data: networks, relational, object-oriented…

• Basic unit of information is a triple
– A tuple of (subject, predicate, object)
– Example: (Joe, loves, Mary)
– Each triple gives the value of a property for a given resource or relates two objects to one another
– Object is either a resource or a literal

• An RDF model is a set of triples
– Ordering of statements in an RDF document is irrelevant (unlike XML)
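
The triple model can be sketched in a few lines of Python. This is only an illustration with plain tuples and a built-in set, not an RDF library; the second statement and the helper name `objects` are made up for the example.

```python
# A triple is a (subject, predicate, object) tuple; an RDF model is a set of them.
model_a = {
    ("Joe", "loves", "Mary"),
    ("Mary", "name", "Mary B."),  # hypothetical second statement
}

# The same statements in a different order, with one statement repeated.
model_b = set([
    ("Mary", "name", "Mary B."),
    ("Joe", "loves", "Mary"),
    ("Joe", "loves", "Mary"),
])

# Sets ignore ordering and duplicates, so both models compare equal.
same_model = model_a == model_b


def objects(model, subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return {o for (s, p, o) in model if s == subject and p == predicate}


print(same_model)
print(objects(model_a, "Joe", "loves"))
```

Using a set mirrors the point that statement order carries no meaning in RDF.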

Page 5: Hack U Barcelona 2011

Resources vs. literals

• Resources are identified by a URI; resources without a URI are called blank nodes
– URIs are a generalization of URLs
– Notation: <http://www.example.org/Person> or ex:Person

• Literals have an optional language tag or datatype (string, integer etc.)
– Literals cannot be subjects of statements
– Datatypes are identified by URIs, e.g. XML Schema datatypes
– Two literals are the same if their components are the same
– Notation: “Joe B.”, “Joe”@en or “Joe”^^<http://…#string>
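
The rule that two literals are the same when their components are the same can be sketched with a small value type. The class name `Literal` and its field names are hypothetical, and the XML Schema string URI is used here only as an example datatype.

```python
from typing import NamedTuple, Optional


class Literal(NamedTuple):
    """A literal: lexical value plus optional language tag and datatype URI."""
    value: str
    language: Optional[str] = None
    datatype: Optional[str] = None


# Example datatype URI (XML Schema string), chosen for illustration.
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"

plain = Literal("Joe B.")
english = Literal("Joe", language="en")
typed = Literal("Joe", datatype=XSD_STRING)

# NamedTuple equality compares component by component.
same_components = english == Literal("Joe", language="en")
different_components = english == typed

print(same_components, different_components)
```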

Page 6: Hack U Barcelona 2011

Graphical and textual notation

• A number of ways to serialize an RDF model into an RDF document
– RDF/XML, Turtle, N3, N-Triples
– Example: http://www.cs.vu.nl/~pmika/foaf.rdf

[Figure: RDF graph in which the node my:Joe has a "name" edge to the literal “Joe A.” and a "type" edge to foaf:Person]
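
As a rough sketch of what serialization means, the two statements about my:Joe can be written out as N-Triples by hand. Several things here are assumptions: the namespace behind the my: prefix is invented, the "name" edge is taken to be foaf:name, and the escaping covers only backslashes and double quotes rather than the full N-Triples rules.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"
MY = "http://example.org/my#"  # hypothetical namespace for the my: prefix


def uri(value):
    """Mark a term as a URI resource."""
    return ("uri", value)


def literal(value):
    """Mark a term as a plain literal."""
    return ("literal", value)


def term_to_nt(term):
    """Render one term: <...> for URIs, "..." for plain literals."""
    kind, value = term
    if kind == "uri":
        return "<%s>" % value
    # Simplified escaping: backslash and double quote only.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"%s"' % escaped


def to_ntriples(triples):
    """Render each triple as one 'subject predicate object .' line."""
    lines = [" ".join(term_to_nt(t) for t in triple) + " ." for triple in triples]
    return "\n".join(sorted(lines))  # sorted only to make the output stable


graph = {
    (uri(MY + "Joe"), uri(FOAF + "name"), literal("Joe A.")),
    (uri(MY + "Joe"), uri(RDF_TYPE), uri(FOAF + "Person")),
}

print(to_ntriples(graph))
```

In practice an RDF library would handle parsing and serialization; the point is only that the same set of triples can be written down in different syntaxes.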

Page 7: Hack U Barcelona 2011

RDF is designed for the Web

• URIs provide web-wide global identification across datasets
– A resource may be described by multiple documents
– We know it’s the same resource because the same URI is used or through reasoning (advanced topic…)
– URIs are intended to be reused
– Unique, but not single identifiers: two URIs may denote the same thing

• URIs can be retrieved from the Web
– A well-behaved URI returns a description of the resource
– Provides authority: the definition of foaf:Person lives at that URI

• Ontologies can be looked up as well
– Typically at the root of the URIs, also known as the namespace
– Example: http://xmlns.com/foaf/0.1/Person redirects to the specification
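
The relation between a prefixed name such as foaf:Person, its full URI and its namespace can be sketched as follows. The function names and the prefix table are hypothetical, and splitting at the last '#' or '/' is a common heuristic rather than a rule of RDF.

```python
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "ex": "http://www.example.org/",
}


def expand(prefixed_name, prefixes=PREFIXES):
    """Turn a prefixed name such as foaf:Person into a full URI."""
    prefix, _, local = prefixed_name.partition(":")
    return prefixes[prefix] + local


def split_namespace(full_uri):
    """Split a URI into (namespace, local name) at the last '#' or '/'."""
    cut = max(full_uri.rfind("#"), full_uri.rfind("/")) + 1
    return full_uri[:cut], full_uri[cut:]


person_uri = expand("foaf:Person")
namespace, local_name = split_namespace(person_uri)

print(person_uri)
print(namespace, local_name)
```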

Page 8: Hack U Barcelona 2011

URIs implicitly link data together

Joe’s homepage:
(#joe, #name, “Joe A.”)
(#joe, #email, mailto:[email protected])

Mary’s homepage:
(#mary, #name, “Mary B.”)
(#mary, #gender, “female”)

A dating site:
(#joe, #loves, #mary)

Schema doc:
(#name, #type, #Property)
(#name, #domain, #Person)

Page 9: Hack U Barcelona 2011

Put together, triples form a single ‘global’ graph

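[Figure: the merged graph. #joe has a #name edge to “Joe A.”, an #email edge to [email protected] and a #loves edge to #mary; #mary has a #name edge to “Mary B.” and a #gender edge to “female”]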
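
A minimal sketch of how triples from separate sources combine into one graph, treating the short identifiers from the slides as if they were shared global URIs. The email address and the helper `names_loved_by` are hypothetical.

```python
# Triples published by three independent sources.
joes_homepage = {
    ("#joe", "#name", "Joe A."),
    ("#joe", "#email", "mailto:joe@example.org"),  # hypothetical address
}
marys_homepage = {
    ("#mary", "#name", "Mary B."),
    ("#mary", "#gender", "female"),
}
dating_site = {
    ("#joe", "#loves", "#mary"),
}

# Merging is plain set union: shared identifiers link the data implicitly.
global_graph = joes_homepage | marys_homepage | dating_site


def names_loved_by(graph, person):
    """Follow #loves edges from person, then read the #name of each target."""
    loved = {o for (s, p, o) in graph if s == person and p == "#loves"}
    return {o for (s, p, o) in graph if s in loved and p == "#name"}


print(names_loved_by(dating_site, "#joe"))   # no names known to the dating site alone
print(names_loved_by(global_graph, "#joe"))  # answerable only on the merged graph
```

The question "what is the name of the person Joe loves?" needs data from two sources, which is why the merge matters.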

Page 10: Hack U Barcelona 2011

Linked Data

• Open your data
• Publish it in RDF, the lingua franca of the data web
• Data first, schema second
– Worry about linking, data integration later… someone else can do it for you!

• Optionally, provide query access using the SPARQL query language and protocol
– Powerful, SQL-like query language
– HTTP or SOAP protocol to communicate with SPARQL servers

Page 11: Hack U Barcelona 2011

Linked Data cloud: interlinked RDF datasets on the Web

• http://linkeddata.org/

Page 12: Hack U Barcelona 2011

DBpedia

• DBpedia is a dataset that contains much of the structured data in Wikipedia
– Data from the info-boxes
– Links between Wikipedia pages
– Categories
– Disambiguation and redirect pages

• Links to other datasets

Page 13: Hack U Barcelona 2011

Fetching individual resources

• Use your web browser
– http://dbpedia.org/resource/Yahoo redirects to http://dbpedia.org/page/Yahoo
– You can plug this URI into other Linked Data browsers

• HTTP GET to fetch data
– Using curl: add Accept: application/rdf+xml for RDF and enable redirects
– curl -L -H 'Accept: application/rdf+xml' 'http://dbpedia.org/resource/Berlin'

• Data dumps
– http://wiki.dbpedia.org/Datasets
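
The curl call can be approximated with Python's standard library. This is a sketch under the assumption that the server still answers at this address and honours the Accept header; the function names are made up for the example, and only the request construction runs when the snippet is executed.

```python
import urllib.request


def build_rdf_request(resource_uri):
    """Build a GET request that asks for RDF/XML via content negotiation."""
    return urllib.request.Request(
        resource_uri,
        headers={"Accept": "application/rdf+xml"},
        method="GET",
    )


def fetch_rdf(resource_uri, timeout=10):
    """Send the request and return the raw response body as bytes.

    urlopen follows HTTP redirects by default, similar to curl -L.
    """
    request = build_rdf_request(resource_uri)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


request = build_rdf_request("http://dbpedia.org/resource/Berlin")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

Calling `fetch_rdf("http://dbpedia.org/resource/Berlin")` would perform the actual network request.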

Page 14: Hack U Barcelona 2011

Querying using SPARQL

• Interactive query builders
– SPARQL Explorer: http://dbpedia.org/snorql/
– Examples at: http://wiki.dbpedia.org/OnlineAccess

• Using HTTP GET
– GET /sparql/?query=EncodedQuery HTTP/1.1
– Example:

    SELECT ?film ?x WHERE {
      ?film <http://dbpedia.org/ontology/language> <http://dbpedia.org/resource/French_language> .
      ?film <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Film>
    }

– curl 'http://dbpedia.org/sparql?query=EncodedQuery'
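
The "EncodedQuery" part means the query text must be URL-encoded before it goes into the query string. A minimal sketch with the standard library follows; the helper name `sparql_get_url` is hypothetical, and the snippet only builds the URL without contacting the endpoint.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://dbpedia.org/sparql"

# Query text taken from the slide.
QUERY = """
SELECT ?film ?x WHERE {
  ?film <http://dbpedia.org/ontology/language> <http://dbpedia.org/resource/French_language> .
  ?film <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Film>
}
""".strip()


def sparql_get_url(endpoint, query):
    """Return the endpoint URL with the query passed as an encoded parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(ENDPOINT, QUERY)

# Decoding the query string again recovers the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]

print(url)
print(decoded == QUERY)
```

The resulting URL can be passed to curl or fetched with any HTTP client.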

Page 15: Hack U Barcelona 2011

More data

• New York Times
– http://data.nytimes.com/
– Example URI: http://data.nytimes.com/60694995023816375851
– Also supports JSON: append .json or set Accept: text/javascript

• Freebase
– http://freebase.com
– Example URI: http://rdf.freebase.com/rdf/en.tron_legacy
– Data dump: http://download.freebase.com

Page 16: Hack U Barcelona 2011

And more data…

• Geonames: open geo data
– Geonames.org
– http://sws.geonames.org/5130561/
– Download: http://www.geonames.org/export/

• Open Government data efforts
– Data.gov: see apps, e.g. http://flyontime.us
– Data.gov.uk: http://data.gov.uk/sparql

Page 17: Hack U Barcelona 2011

Spanish open gov’t data and linked data efforts

• Spanish open data efforts
– La Asociación Española de Linked Data (AELID): http://aelid.es/
– Proyecto Aporta: aporta.es
– Regional/local efforts
  • risp.asturias.es (RDF, SPARQL)
  • datos.zaragoza.es (RDF, SPARQL)
  • opendata.euskadi.net (RDF)
  • dadesobertes.gencat.cat (RDF)
– Competition AbreDatos 2010: abredatos.es

Page 18: Hack U Barcelona 2011

More info

• Segaran et al.: Programming the Semantic Web, O’Reilly, 2010.

• linkeddata.org
• W3C Semantic Web Activity
– Presentations, guides etc.

• RDF Primer
– http://www.w3.org/TR/2004/REC-rdf-primer-20040210/

• SPARQL query language and protocol specs
– http://www.w3.org/TR/rdf-sparql-protocol/
– http://www.w3.org/TR/rdf-sparql-query/

• Search SlideShare etc. for more intro material

Page 19: Hack U Barcelona 2011

Build your Own Search Service (BOSS)

Peter Mika

Yahoo! Research Barcelona

[email protected]

Page 20: Hack U Barcelona 2011

Innovate with Search!

• It’s really simple…

• Example:
– Pay $0.0008 for a query, earn $0.01 per query
– 100,000 users a day, each making 1 query a day
– Earn $920 a day!
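
The arithmetic behind the example, worked through with the numbers from the slide. The variable names are made up; `Decimal` is used so that the cents come out exactly.

```python
from decimal import Decimal

cost_per_query = Decimal("0.0008")    # what you pay per query
revenue_per_query = Decimal("0.01")   # what you earn per query
users_per_day = 100_000
queries_per_user = 1

queries_per_day = users_per_day * queries_per_user
daily_cost = queries_per_day * cost_per_query
daily_revenue = queries_per_day * revenue_per_query
daily_profit = daily_revenue - daily_cost

print(f"cost ${daily_cost:.2f}, revenue ${daily_revenue:.2f}, profit ${daily_profit:.2f}")
```

So the $920 is the daily revenue of $1,000 minus $80 in query costs.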

Page 21: Hack U Barcelona 2011

Reminds me of the underpants gnomes from South Park

• http://en.wikipedia.org/wiki/Underpants_Gnomes

Page 22: Hack U Barcelona 2011

Yahoo BOSS: Yahoo’s Search API

• Ability to re-order results and blend in additional content
• No restrictions on presentation
• No branding or attribution
• Access to multiple verticals (web search, image, news)
• Spelling suggestions
• 40+ supported language and region pairs
• Pricing (BOSS)
– 10,000 free queries a day
– Pay for more queries
– Serve any ads you want

• For more info, http://developer.yahoo.com/search/boss/
• New in BOSS v2
– Powered by Bing
– Retrieve ads from Yahoo! and earn money ;)

Page 24: Hack U Barcelona 2011

Queries you can play with

• Yahoo!’s WebScope program
– Data sharing with universities and research institutions
– Some of the most exciting data that we have!
– Request access online: http://webscope.sandbox.yahoo.com/
– Requires approval by Department Chair

• For HackU, you can sign up here for access to a dataset containing real world user queries
– Yahoo! Search Tiny Sample v1.0: a set of 4,500 queries
– Ideal for testing and demonstrating your search-based apps
– Can you really show something interesting for all these users?