Hacking with Semantic Web
Tom Praison
Developer @ Yahoo!
http://twitter.com/tompraison
What’s in here?
• Evolution of the web
• Poorly Solved Information Needs
• Semantic Web Technologies
• Linked Data
• Demo of confhopper.in, a site built using open datasets
• Some techniques for getting structured information from the Web
• Demo of Yahoo! Contextual Analysis Platform and Open Dapper
Tim Berners-Lee – Inventor of the WWW
I just had to take the hypertext idea and connect it to the Transmission Control Protocol and domain name system ideas and—ta-da!—the World Wide Web.
Few Content Creators! Majority Consumers!
WEB 1.0
http://www.flickr.com/photos/leandrociuffo/3665883373/
WEB 2.0
Web as a platform
http://www.flickr.com/photos/lambertwm/4737580179/
WEB 1.0 vs WEB 2.0
Ofoto → Flickr
Personal Website → Blogging
Britannica Online → Wikipedia
Directories (taxonomy) → Tagging (“folksonomy”)
Content Management Systems → Wikis
WEB 3.0
http://www.flickr.com/photos/markhillary/337685031
Which direction will it take?
Semantic Web
Pervasive Web
Artificial Intelligence
Personalization
Virtual Web
WEB 3.0
Could be anything!
A Web of Documents rather than Data!
Today’s Web
Poorly Solved Information Needs
• Multiple interpretations
  – Apple
• Long-tail queries
  – Roja (I meant the South Indian actress)
• Imprecise or overly precise searches
  – jim hendler
  – pictures of strong adventures people
• Searches for descriptions
  – countries in africa
  – 25 year old computer engineer living in Bangalore
  – Reliable smart phone under 15,000 rupees
THE SOLUTION
Semantic Web
Publish data on the Web
• Linked Data: linking data similar to how we link documents on the Web
• Query databases over the Web
Architectural Challenges
• A common format for sharing data
• Sharing the meaning of data
• Infrastructure
Semantic Web standards from W3C
• Data and schema languages (RDF, OWL, RIF)
• Document formats (RDF/XML, RDFa)
• Protocols (SPARQL, HTTP)
Current Research & Other Efforts
• Semantic Web research into knowledge representation and reasoning, data integration, data quality and many other topics
• Community effort (Linked Data movement)
RDF (Resource Description Framework)
• The basic data model of the Semantic Web
  – A universal model to capture all sorts of data: networks, relational, object-oriented…
• Basic unit of information is a triple
  – A tuple of (subject, predicate, object)
  – Example: (Joe, loves, Mary)
  – Each triple gives the value of a property for a given resource or relates two resources to one another
• Object is either a resource or a literal
• An RDF model is a set of triples
  – Ordering of statements in an RDF document is irrelevant (unlike XML)
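To make "an RDF model is a set of triples" concrete, here is a minimal Python sketch. It assumes plain strings can stand in for URIs and literals (a real RDF library keeps them distinct), and the names `doc_a`, `doc_b` and `values` are illustrative only. Because the model is a set, two documents listing the same statements in a different order describe the same model.

```python
# An RDF model sketched as a Python set of (subject, predicate, object) tuples.
# Plain strings stand in for URIs and literals (a simplifying assumption).
Triple = tuple[str, str, str]

doc_a: set[Triple] = {
    ("Joe", "loves", "Mary"),
    ("Joe", "name", "Joe A."),
}
# The same statements, written in the opposite order
doc_b: set[Triple] = {
    ("Joe", "name", "Joe A."),
    ("Joe", "loves", "Mary"),
}

def values(model: set[Triple], subject: str, predicate: str) -> set[str]:
    """Return every object o such that (subject, predicate, o) is in the model."""
    return {o for s, p, o in model if s == subject and p == predicate}

print(doc_a == doc_b)                 # statement order is irrelevant: True
print(values(doc_a, "Joe", "loves"))  # {'Mary'}
```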
Graphical and textual notation
A number of ways to serialize an RDF model into an RDF document: RDF/XML, Turtle, N3, N-Triples
(Figure: graph with the node my:Joe, a “name” edge to the literal “Joe A.” and a “type” edge to foaf:Person)
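N-Triples is the simplest of these serializations: one `<subject> <predicate> <object> .` line per triple, with URIs in angle brackets and literals in double quotes. The sketch below writes the my:Joe graph that way. It assumes the `my:` prefix expands to the hypothetical namespace `http://example.org/my#` and that the "name" edge is `foaf:name`; the `IRI`/`Literal` classes are illustrative, and literals carry no language tag or datatype.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IRI:
    value: str

@dataclass(frozen=True)
class Literal:
    value: str  # plain string literal (no language tag or datatype in this sketch)

def term(t) -> str:
    """Render one RDF term in N-Triples syntax."""
    if isinstance(t, IRI):
        return f"<{t.value}>"
    # Escape characters N-Triples does not allow raw inside a string literal
    escaped = (t.value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'

def to_ntriples(triples) -> str:
    """Emit one 'subject predicate object .' line per triple."""
    return "".join(f"{term(s)} {term(p)} {term(o)} .\n" for s, p, o in triples)

MY = "http://example.org/my#"  # hypothetical expansion of the my: prefix
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

triples = [
    (IRI(MY + "Joe"), IRI(FOAF + "name"), Literal("Joe A.")),  # assumes "name" is foaf:name
    (IRI(MY + "Joe"), IRI(RDF_TYPE), IRI(FOAF + "Person")),
]
print(to_ntriples(triples), end="")
```

The other formats (Turtle, RDF/XML) encode exactly the same two triples; they differ only in syntax, such as Turtle's prefix declarations.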
RDF is designed for the Web
• URIs provide web-wide global identification across datasets
  – A resource may be described by multiple documents
  – URIs are intended to be reused
  – Unique, but not single identifiers: two URIs may denote the same thing
RDF is designed for the Web
• URIs can be retrieved from the Web
  – A well-behaved URI returns a description of the resource
  – Provides authority: the definition of foaf:Person lives at that URI
• Ontologies can be looked up as well
  – Typically at the root of the URIs, also known as the namespace
  – Example: http://xmlns.com/foaf/0.1/Person redirects to the specification
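Prefixed names such as foaf:Person are shorthand for full URIs, and the namespace part of a URI is where the ontology is typically published. A minimal sketch of both directions follows; the `foaf` namespace is the real FOAF one, while the `my` prefix and the helper names `expand` and `split_uri` are hypothetical. Splitting at the last `#` or `/` is a common heuristic, not a rule every vocabulary follows.

```python
# Prefix map: foaf is the FOAF namespace; "my" is a hypothetical namespace for illustration
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "my": "http://example.org/my#",
}

def expand(curie: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Turn a prefixed name like 'foaf:Person' into a full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {curie!r}")
    return prefixes[prefix] + local

def split_uri(uri: str) -> tuple[str, str]:
    """Split a URI into (namespace, local name) at the last '#' or '/'."""
    cut = max(uri.rfind("#"), uri.rfind("/")) + 1
    return uri[:cut], uri[cut:]

print(expand("foaf:Person"))                          # http://xmlns.com/foaf/0.1/Person
print(split_uri("http://xmlns.com/foaf/0.1/Person"))  # ('http://xmlns.com/foaf/0.1/', 'Person')
```

Dereferencing the namespace returned by `split_uri` is how a client would locate the vocabulary that defines a term.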
URIs implicitly link data together
Joe’s homepage: (#joe, #name, “Joe A.”), (#joe, #email, mailto:joe@joe.com)
Mary’s homepage: (#mary, #name, “Mary B.”), (#mary, #gender, “female”)
A social networking site: (#joe, #loves, #mary)
Schema doc: (#name, #type, #Property), (#name, #domain, #Person)
Put together, triples form a single ‘global’ graph
(Figure: #joe linked by #name to “Joe A.”, by an email edge to “joe@joe.com” and by #loves to #mary; #mary linked by #name to “Mary B.” and by #gender to “female”)
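Because every document reuses the same URIs, merging them needs no join logic: the global graph is simply the union of the triple sets. The sketch below assumes the grouping of triples by document from the "URIs implicitly link data together" slide, keeps the `#...` names as stand-ins for full URIs, and uses an illustrative `outgoing` helper to follow edges across document boundaries.

```python
from collections import defaultdict

# Triples as published in separate documents; "#..." names stand in for full URIs
joes_homepage = {("#joe", "#name", "Joe A."), ("#joe", "#email", "mailto:joe@joe.com")}
marys_homepage = {("#mary", "#name", "Mary B."), ("#mary", "#gender", "female")}
social_site = {("#joe", "#loves", "#mary")}
schema_doc = {("#name", "#type", "#Property"), ("#name", "#domain", "#Person")}

# Shared URIs mean that merging is just set union
graph = joes_homepage | marys_homepage | social_site | schema_doc

def outgoing(graph, node):
    """Adjacency view for one subject: predicate -> set of objects."""
    edges = defaultdict(set)
    for s, p, o in graph:
        if s == node:
            edges[p].add(o)
    return dict(edges)

# Follow #joe -> #loves -> #mary (social site), then read Mary's name (her homepage)
for friend in outgoing(graph, "#joe")["#loves"]:
    print(outgoing(graph, friend)["#name"])  # {'Mary B.'}
```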
RDF Example
Linked Data cloud: interlinked RDF datasets on the Web
http://linkeddata.org/
DBpedia
• DBpedia is a dataset that contains much of the structured data in Wikipedia
  – Data from the infoboxes
  – Links between Wikipedia pages
  – Categories
  – Disambiguation and redirect pages
• Links to other datasets
Fetching individual resources
• Use your web browser
  – http://dbpedia.org/resource/Yahoo redirects to http://dbpedia.org/page/Yahoo
  – You can plug this URI into other Linked Data browsers
• HTTP GET to fetch data
  – Using curl: add Accept: application/rdf+xml to get RDF, and enable redirects
  – curl -L -H 'Accept: application/rdf+xml' 'http://dbpedia.org/resource/Berlin'
• Data dumps
  – http://wiki.dbpedia.org/Datasets
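The curl call above can be reproduced with Python's standard library: the `Accept` header does the content negotiation, and `urllib.request.urlopen` follows redirects by default (the equivalent of `-L`), carrying the header along. This is a minimal sketch. The helper names are hypothetical, and it assumes the resource name needs no percent-encoding.

```python
import urllib.request

def rdf_request(resource: str) -> urllib.request.Request:
    """Build a GET request for a DBpedia resource that asks for RDF/XML
    instead of the HTML page a browser would be redirected to."""
    return urllib.request.Request(
        "http://dbpedia.org/resource/" + resource,
        headers={"Accept": "application/rdf+xml"},
    )

def fetch_rdf(resource: str, timeout: float = 30.0) -> bytes:
    """Perform the request; urlopen follows HTTP redirects by default, like curl -L."""
    with urllib.request.urlopen(rdf_request(resource), timeout=timeout) as resp:
        return resp.read()

req = rdf_request("Berlin")
print(req.get_method(), req.full_url, req.get_header("Accept"))
# fetch_rdf("Berlin") would make the actual network call
```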
Querying using SPARQL
• Interactive query builders
  – SPARQL Explorer: http://dbpedia.org/snorql/
  – Examples at: http://wiki.dbpedia.org/OnlineAccess
• Using HTTP GET
  – GET /sparql?query=EncodedQuery HTTP/1.1
  – Example query:
    SELECT ?film WHERE {
      ?film <http://dbpedia.org/ontology/language> <http://dbpedia.org/resource/French_language> .
      ?film <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Film>
    }
  – curl 'http://dbpedia.org/sparql?query=EncodedQuery'
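"EncodedQuery" above is the SPARQL text percent-encoded into the `query` parameter. Here is a minimal sketch of that step, plus a reader for the standard SPARQL 1.1 JSON results format (`results` → `bindings` → variable → `value`), which an endpoint returns when asked with `Accept: application/sparql-results+json`. The `LIMIT 10` clause and the helper names are assumptions added for illustration.

```python
import json
import urllib.parse

ENDPOINT = "http://dbpedia.org/sparql"

# The French-film query, with LIMIT added (an assumption) to keep results small
QUERY = """SELECT ?film WHERE {
  ?film <http://dbpedia.org/ontology/language> <http://dbpedia.org/resource/French_language> .
  ?film <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Film>
} LIMIT 10"""

def query_url(query: str, endpoint: str = ENDPOINT) -> str:
    """Percent-encode the query into the ?query= parameter for an HTTP GET."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})

def binding_values(results_json: str, var: str) -> list[str]:
    """Read one variable's values from a SPARQL 1.1 Query Results JSON document."""
    data = json.loads(results_json)
    return [row[var]["value"] for row in data["results"]["bindings"] if var in row]

print(query_url(QUERY))
```

Fetching that URL (for example with `urllib.request` and the `Accept` header above) and passing the response body to `binding_values(body, "film")` yields the film URIs.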
ConfHopper.in
• Award-winning app in the WWW2012 Metadata Challenge.
• ConfHopper.in is an HTML5-based desktop and mobile application designed for conference attendees.
• Built with the help of open datasets from http://data.semanticweb.org/ and various other sources.
Some Techniques for getting Structured Information from the Web
• Semantic Markup
• NER (Named Entity Recognition)
• Extraction Tools (Dapper)
Semantic Markup
• Microdata (Schema.org)
• RDFa
• Open Graph Protocol (ogp.me)
• Example: http://getschema.org/microdataextractor?url=http://www.tompraison.com&out=json
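Microdata places the structure directly in HTML attributes: `itemscope`/`itemtype` mark an item and `itemprop` names each property, whose value comes from a `content` attribute or from the element's text. Below is a minimal extractor sketch built on Python's `html.parser`. It is not the getschema.org tool, and it deliberately ignores nested item boundaries, `itemref`, and URL-valued properties (`href`/`src`). The class and function names are hypothetical.

```python
from html.parser import HTMLParser

# HTML void elements never get an end tag, so they are never pushed on the stack
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}

class MicrodataSketch(HTMLParser):
    """Collects itemtype URLs and (itemprop, value) pairs (simplified microdata)."""

    def __init__(self):
        super().__init__()
        self.types = []   # itemtype URLs in document order
        self.props = []   # (name, value) pairs, in the order each value is completed
        self._stack = []  # open elements: (tag, itemprop name or None, text parts)

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if a.get("itemtype"):
            self.types.append(a["itemtype"])
        name = a.get("itemprop")
        if name is not None and "content" in a:
            self.props.append((name, a["content"]))
            name = None  # value already known; no text to collect
        if tag not in VOID:
            self._stack.append((tag, name, []))

    def handle_data(self, data):
        # Text belongs to every enclosing itemprop element still waiting for a value
        for _, name, parts in self._stack:
            if name is not None:
                parts.append(data)

    def handle_endtag(self, tag):
        if tag in VOID or all(t != tag for t, _, _ in self._stack):
            return
        # Pop back to the matching open tag (tolerates unclosed children)
        while self._stack:
            open_tag, name, parts = self._stack.pop()
            if name is not None:
                self.props.append((name, " ".join("".join(parts).split())))
            if open_tag == tag:
                break

def extract(html: str):
    parser = MicrodataSketch()
    parser.feed(html)
    parser.close()
    return parser.types, parser.props

sample = '''<div itemscope itemtype="https://schema.org/Person">
  <span itemprop="name">Joe A.</span>
  <meta itemprop="email" content="joe@joe.com">
</div>'''
print(extract(sample))  # (['https://schema.org/Person'], [('name', 'Joe A.'), ('email', 'joe@joe.com')])
```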
NER – Named Entity Recognition
• Yahoo! Content Analysis API
• http://developer.yahoo.com/contentanalysis/
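To illustrate what NER output looks like when it is tied back to Linked Data, here is a deliberately simple dictionary-based (gazetteer) tagger. This is a swapped-in toy technique, not how the Yahoo! Content Analysis API works. The gazetteer entries are hypothetical, with URIs following the http://dbpedia.org/resource/… pattern used earlier. A real service also has to handle ambiguity (the "Apple" problem), which a plain lookup cannot.

```python
import re

# Hypothetical toy gazetteer: surface form -> (entity type, DBpedia-style URI)
GAZETTEER = {
    "yahoo!": ("Organization", "http://dbpedia.org/resource/Yahoo"),
    "berlin": ("Place", "http://dbpedia.org/resource/Berlin"),
    "french": ("Language", "http://dbpedia.org/resource/French_language"),
}

def tag_entities(text: str, gazetteer: dict = GAZETTEER):
    """Return (start, end, surface form, type, uri) for each gazetteer hit."""
    # Longest names first so a longer entry wins at the same position; the
    # lookarounds reject matches inside larger words (\b would fail after "!").
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(n) for n in names) + r")(?!\w)",
        re.IGNORECASE,
    )
    hits = []
    for m in pattern.finditer(text):
        etype, uri = gazetteer[m.group(0).lower()]
        hits.append((m.start(), m.end(), m.group(0), etype, uri))
    return hits

for hit in tag_entities("Joe works at Yahoo! and lives in Berlin."):
    print(hit[2], hit[3], hit[4])
```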
Dapper
http://open.dapper.net
Dapper is a tool that enables users to create update feeds for their favorite sites and website owners to optimize and distribute their content in new ways.