Linked Open Data: a new resource for eResearch
Dr Anne Cregan
eResearch Analyst, Intersect and ANDS
[email protected]
Slide 2
What this talk will cover
Open data
The web of data
RDF triples
RDF graphs
The Linked Open Data project
Publishing to the web of data
Consuming the web of data
Slide 3
Open data
The philosophy and practice of making data freely available to everyone, without restrictions from copyright, patents or other mechanisms of control.
Slide 4
Why make data open?
Public money was used to fund the work, so it should be available to the public.
Facts cannot legally be copyrighted.
Sponsors of research do not get full value for money unless the resulting data are made freely available.
In scientific research, the rate of discovery is accelerated by better access to data.
Source: How to Make the Dream Come True: The Astronomers' Data Manifesto (Norris, 2007)
Slide 5
How to make open data useful
Principles:
Make it easy to find
Make it available to everyone
Separate it from the applications that use it
Interlink it with related datasets in a meaningful way
Make it machine processable
Slide 6
The web of data
The web of data = a naming model + a data model on the web
It's a web of interlinked data that machines can read (whereas the web is a web of interlinked documents for people to read)
Also known as the Semantic Web because of its formal semantics for reasoning and its relationship to meaning
Slide 7
The web of data
It is an initiative of the World Wide Web Consortium (W3C), and is a collaborative effort of many parties
It derives from W3C director Sir Tim Berners-Lee's vision of the Web as a universal medium for data, information, and knowledge exchange.
Like the web, anyone can publish to it: anyone can say anything about anything.
Slide 8
The web of data
However, they need to say it in RDF, not HTML.
Slide 9
The web of data
And anything they want to talk about has to be a URI.
Slide 10
URI = Uniform Resource Identifier
The naming model for the web of data
A URI is a unique name that identifies a resource
A resource is anything to which we can attach identity
A resource can be an information object, like a document or a webpage, but it can also be a real world object, like a person. It can be anything at all.
A URL is a kind of URI that names the resource and also indicates a means of acting upon or obtaining it via its primary access mechanism, e.g. http, ftp
For example:
URL: http://www.w3.org/People/Berners-Lee/
URL: http://www.w3.org/TR/rdf-concepts/
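As a rough illustration of how a URL both names a resource and says how to reach it, the sketch below splits a URI into its parts using Python's standard library. The URI is the one used later in this talk for Tim Berners-Lee's card; the variable names are illustrative only.

```python
from urllib.parse import urlsplit, urldefrag

# URI taken from the "Consuming linked open data" slide of this talk.
uri = "http://www.w3.org/People/Berners-Lee/card#i"

parts = urlsplit(uri)
access_mechanism = parts.scheme   # how to reach the resource
host = parts.netloc               # where to ask for it
path = parts.path                 # which document on that host
fragment = parts.fragment         # the thing named within that document

# The fragment is not sent to the server, so the document that gets
# fetched is the URI with the fragment removed.
document_url, _ = urldefrag(uri)

print(access_mechanism, host, path, fragment)
print(document_url)
```

The fragment lets one URI name a real world thing (a person) while the document URL names the information object that describes it.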
Slide 11
RDF = Resource Description Framework
A framework for describing and linking resources on the web
Allows URIs to be connected into a directed graph
Based on the idea of triples: Subject, Predicate, Object
Slide 12
RDF = Resource Description Framework
A framework for describing and linking resources on the web
Allows URIs to be connected into a directed graph
Based on the idea of triples, e.g.
Subject: intersect.org.au/intersect-team/AnneCregan
Predicate: doac:organization
Object: intersect.org.au
Slide 13
RDF = Resource Description Framework
Putting triples together creates a graph
[Diagram: intersect.org.au/intersect-team/AnneCregan is linked by doac:organization to intersect.org.au and to ands.org.au]
Slide 14
RDF = Resource Description Framework
Putting triples together creates a graph
Nodes of the graph are URIs and literals
[Diagram: intersect.org.au/intersect-team/AnneCregan is linked by doac:organization to intersect.org.au and to ands.org.au, and by foaf:firstName to the literal "Anne"]
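A minimal sketch of the graph on this slide, assuming plain Python tuples are enough to stand in for triples. The identifiers are copied from the slide as written (prefixed names such as doac:organization are not expanded); the Literal class and the objects helper are hypothetical names introduced only for illustration, not part of any RDF library.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    """A plain data value, as opposed to a URI that names a resource."""
    value: str


ANNE = "intersect.org.au/intersect-team/AnneCregan"

# Each triple is (subject, predicate, object); the set of triples is the graph.
graph = {
    (ANNE, "doac:organization", "intersect.org.au"),
    (ANNE, "doac:organization", "ands.org.au"),
    (ANNE, "foaf:firstName", Literal("Anne")),
}


def objects(graph, subject, predicate):
    """Return every object linked from subject by predicate."""
    return {o for s, p, o in graph if s == subject and p == predicate}


organizations = objects(graph, ANNE, "doac:organization")
first_names = objects(graph, ANNE, "foaf:firstName")
print(organizations, first_names)
```

Because the graph is just a set of triples, adding a statement never requires changing a schema: it is one more tuple in the set.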
Slide 15
RDF = Resource Description Framework
Has a schema to describe relationships between things, called RDF Schema
[Diagram: intersect.org.au/intersect-team/AnneCregan is linked by doac:organization to intersect.org.au and to ands.org.au, and by foaf:firstName to the literal "Anne"]
Slide 16
RDF = Resource Description Framework
Is a World Wide Web Consortium (W3C) Recommendation
Is part of the Semantic Web stack
[Diagram: intersect.org.au/intersect-team/AnneCregan is linked by doac:organization to intersect.org.au and to ands.org.au, and by foaf:firstName to the literal "Anne"]
Slide 17
Semantic Web Technology Stack
The Semantic Web standards build on each other
URI is the naming mechanism
RDF, RDF Schema and OWL are the languages for describing resources and relationships between them
SPARQL is a query language for querying RDF graphs
Slide 18
RDF Graphs
Putting triples together creates a directed graph
Slide 19
RDF Graphs
Putting triples together creates a directed graph
Slide 20
RDF Graphs
Graphs can be interconnected by referring to URIs in other graphs
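The interconnection idea can be sketched with two small sets of triples, assuming graphs are modelled as Python sets. The first triple comes from the earlier slides; the second graph, including ex:locatedIn and ex:Sydney, is entirely hypothetical and exists only to show a link across graphs.

```python
ANNE = "intersect.org.au/intersect-team/AnneCregan"

# Graph published by one party (triple taken from the earlier slides).
people_graph = {(ANNE, "doac:organization", "intersect.org.au")}

# Hypothetical graph published by someone else, reusing the same URI.
org_graph = {("intersect.org.au", "ex:locatedIn", "ex:Sydney")}

# Merging RDF graphs is a set union; the shared URI joins them.
merged = people_graph | org_graph


def reachable(graph, start):
    """Return every node that can be reached by following triples from start."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for s, _, o in graph:
            if s == node and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


print(reachable(people_graph, ANNE))
print(reachable(merged, ANNE))
```

No mapping step is needed to join the two graphs: using the same URI for the same thing is what connects them.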
Slide 21
RDF Graphs
Slide 22
Linking Open Data Project
Community project of the W3C Semantic Web Education and Outreach (SWEO) group
Started in 2007
Has grown rapidly as members of the community add open datasets
Has created the largest existing RDF graph: over 18 billion triples!
Slide 23
Linking Open Data Project October 2007
Slide 24
Linking Open Data Project September 2008
Slide 25
Linking Open Data Project July 2009
Slide 26
Linking Open Data Project July 2009
Slide 27
Linking Open Data Project April 2010
Slide 28
Linking Open Data Project
As at May 2009 had created a linked open data cloud of 4.7 billion RDF triples; in April 2010 Linked Open Numbers added another 14 billion triples
Datasets include:
DBpedia: linked data version of Wikipedia
US Census 2000: US Census data set
Gene Ontology: annotations from Gene Ontology db
DrugBank: info about FDA-approved drugs
UniProt: life sciences data set
Lots of bio/life sciences data sets: BIO2RDF cloud
More info at http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/DataSets
Slide 29
Publishing to the Linked Open Data Cloud
Principles:
1. Use URIs to name things
2. Use HTTP URIs so you can look up those things on the web
3. When someone looks up a URI, provide useful information (dereference-able)
4. Include RDF statements that link to other URIs so that they can discover related things
These principles are from Tim Berners-Lee's 2006 note: http://www.w3.org/DesignIssues/LinkedData.html
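Principles 2 and 3 can be sketched from the consumer's side: check that a name is an HTTP URI, then prepare a lookup that asks for RDF rather than HTML. This is only a sketch under assumptions: is_http_uri and build_rdf_request are hypothetical helper names, asking for application/rdf+xml is one possible choice of RDF format, and the request is built but deliberately not sent.

```python
from urllib.parse import urldefrag, urlsplit
from urllib.request import Request


def is_http_uri(uri):
    """Principle 2: the name can be looked up on the web."""
    return urlsplit(uri).scheme in ("http", "https")


def build_rdf_request(uri):
    """Principle 3: prepare a lookup that asks for RDF instead of HTML."""
    if not is_http_uri(uri):
        raise ValueError("not an HTTP URI: " + uri)
    # The fragment identifies a thing inside the document; only the
    # document URL is requested from the server.
    document_url, _ = urldefrag(uri)
    return Request(document_url, headers={"Accept": "application/rdf+xml"})


request = build_rdf_request("http://www.w3.org/People/Berners-Lee/card#i")
print(request.full_url, request.get_header("Accept"))
# Passing the request to urllib.request.urlopen would perform the lookup.
```

A publisher following the principles would answer such a request with RDF statements about the resource, including links to other URIs (principle 4).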
Slide 30
Consuming linked open data
Browsing linked data is easy
You need an RDF browser like Tabulator, Disco, Zitgist, Marbles or OpenLink
Let's go for a ride on Disco: http://www4.wiwiss.fu-berlin.de/rdf_browser/
Start here: http://www.w3.org/People/Berners-Lee/card#i
We can travel through the linked open data cloud between URIs linked using RDF
RDF browsers include Marbles: http://www5.wiwiss.fu-berlin.de/marbles
Slide 31
Consuming linked open data
eResearch example: Enabling drug discovery
Data sets published to the data cloud:
Linked CT: Linked Clinical Trials (60,000 trials in 158 countries)
DrugBank: FDA-approved drugs (5,000 small molecule and biotech drugs)
Diseasome: Disorders and Disease genes (4,300 disorders, disease genes and associations)
DailyMed: Chemical structures of marketed drugs (124,000 triples and 29,600 links)
SWAN: Alzheimer's Hypothesis Browser Knowledgebase
Slide 32
Consuming linked open data
Using an RDF browser:
See all drugs in trials for Alzheimer's disease in Linked CT, including a Phase III trial for Varenicline.
Follow a link to data from DailyMed showing that Varenicline is already on the market for nicotine addiction. The typical dose is 1 mg twice daily and the Linked CT trial used no higher than that, so there are no new safety issues.
Link to DrugBank to find that Varenicline is an alpha-4 beta-2 neuronal nicotinic acetylcholine receptor agonist.
Diseasome indicates that the corresponding genes are only important in nicotine addiction, not Alzheimer's.
But the SWAN Knowledgebase shows there are hypotheses relating Alzheimer's to nicotinic receptors through amyloid beta.
Slide 33
Consuming linked open data
Using the linked open data cloud with an RDF browser, we are able to:
Browse data relating to companies, clinical trials, drugs, diseases and genetic variation
See when extra data is available
Gain access to data without needing to map identifiers and synonyms: interlinking has already been done
Gain additional insights about interesting questions to ask
Jentzsch et al., Enabling Tailored Therapeutics with Linked Data, events.linkeddata.org/ldow2009/papers/ldow2009_paper9.pdf
Slide 34
Consuming linked open data
Querying using SPARQL queries
A SPARQL endpoint enables users (human or other) to query a knowledge base via the SPARQL language.
Results are typically returned in one or more machine-processable formats.
Examples: http://wiki.dbpedia.org/OnlineAccess
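A sketch of what talking to a SPARQL endpoint can look like, assuming the endpoint accepts the query as a `query` parameter on an HTTP GET, as the SPARQL protocol describes. The endpoint address and the DBpedia resource in the query are assumptions chosen for illustration and are not taken from the slides; the URL is built but not fetched.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed endpoint address; check the DBpedia access page for the current one.
ENDPOINT = "http://dbpedia.org/sparql"

# Illustrative query: list up to ten properties of one (assumed) resource.
QUERY = """
SELECT ?property ?value
WHERE { <http://dbpedia.org/resource/Linked_data> ?property ?value }
LIMIT 10
""".strip()


def build_query_url(endpoint, query):
    """Encode a SPARQL query as a GET request URL for the endpoint."""
    return endpoint + "?" + urlencode({"query": query})


url = build_query_url(ENDPOINT, QUERY)
print(url)

# Decoding the URL again recovers the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
```

Fetching this URL with an Accept header such as application/sparql-results+json would ask for the results in a machine-processable format.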
Slide 35
Types of Queries
Selection and extraction queries: retrieve parts of the data based on its content, structure, or position
Reduction queries: specify which part of the data not to include in the answer
Restructuring queries: restructure data into other possible formats/serialisations
Aggregation queries: aggregate several data items into one new data item
Combination and inference queries: combine information that is not explicitly connected
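Some of these query types can be imitated over a small in-memory list of triples. This is a plain Python sketch rather than SPARQL, using the example triples from the RDF slides; the variable names are illustrative only.

```python
from collections import defaultdict

ANNE = "intersect.org.au/intersect-team/AnneCregan"

triples = [
    (ANNE, "doac:organization", "intersect.org.au"),
    (ANNE, "doac:organization", "ands.org.au"),
    (ANNE, "foaf:firstName", "Anne"),
]

# Selection: keep only the triples that match a pattern.
selection = [t for t in triples if t[1] == "doac:organization"]

# Reduction: state what to leave out rather than what to keep.
reduction = [t for t in triples if t[1] != "foaf:firstName"]

# Restructuring: the same data regrouped by predicate.
by_predicate = defaultdict(list)
for _, predicate, obj in triples:
    by_predicate[predicate].append(obj)

# Aggregation: several data items combined into one new item.
organization_count = len(by_predicate["doac:organization"])

print(selection, reduction, dict(by_predicate), organization_count)
```

In this tiny data set the selection and reduction happen to return the same triples, because only one other predicate exists.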
Slide 36
Summary
Open data
The web of data
RDF triples
RDF graphs
The Linked Open Data project
Publishing to the web of data
Consuming the web of data
Slide 37
Thank you
More details are at:
http://linkeddata.org/
http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
http://www.w3.org/2001/sw/
Questions and comments may be emailed to [email protected]