Lotico London Semweb Meetup - March 2013
Techniques used in RDF Data Publishing at Nature Publishing Group
Tony Hammond, Data Architect, NPG
March 5, 2013
Nature Publishing Group
● NPG a division of Macmillan (a privately owned company)
● Publishes ~120 titles in all
  ● 34 Nature branded titles
  ● 53 academic and society journals
  ● 16 magazines (incl. Scientific American)
● ~1000 employees, 17 offices (5 continents)
● ~30 society partners
● Databases, conferences/events, multimedia
Semantic Publishing at NPG
• Prior Work
  • RSS 1.0 webfeeds
  • HTML metadata
  • PDF metadata (XMP)
  • Urchin – RSS aggregator
  • OAI-PMH, OpenSearch (SRU), OpenURL
• Linked Data Apps
  • Public Data: test viability of data publishing
  • Hub: application of technology internally
Public Data
NPG by Numbers
NPG Ontology
Cloud Hosting
• TSO OpenUp® SaaS platform
• Offers 5store as a triplestore
• Scale-out architecture (C/C++)
• Supports up to a trillion triples
• 150,000 tps load speed
• SPARQL 1.0, with 1.1 features (aggregates, etc.)
data.nature.com
data.nature.com/query
Hub
Hub: Problem
Hub: Solution
Hub: Method
XMP
Building the Graph
Local Hosting
• Apache TDB
• Single-node architecture (Java)
• Supports up to ~1.5b triples (tested)
• SPARQL 1.1
Data Publishing
Hub Finder
Hub Finder: Results
Techniques
Naming Architecture
Naming Policy
Object           Example        Usage
Graph            npgg:gadgets   gadgets:33 ex:title "Title" npgg:gadgets .
Class            npg:Gadget     gadgets:33 a npg:Gadget npgg:gadgets .
Object Property  npg:hasGadget  _:12 npg:hasGadget gadgets:33 npgg:_ .
Data Property    ex:title       gadgets:33 ex:title "Title" npgg:gadgets .
Instance         gadgets:33     gadgets:33 ex:title "Title" npgg:gadgets .

npg:  http://ns.nature.com/terms/
npgg: http://ns.nature.com/graphs/
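The naming policy can be illustrated by expanding the prefixed names into full IRIs and writing each statement as a quad (subject, predicate, object, graph). This is a minimal sketch, not NPG's tooling: only the npg: and npgg: namespaces come from the slide, while the gadgets: and ex: namespaces, and the helper names expand and quad, are hypothetical placeholders.

```python
# Namespaces: npg: and npgg: are from the slide; gadgets: and ex: are
# hypothetical placeholders chosen only for this illustration.
PREFIXES = {
    "npg": "http://ns.nature.com/terms/",
    "npgg": "http://ns.nature.com/graphs/",
    "gadgets": "http://example.org/gadgets/",  # assumption
    "ex": "http://example.org/terms/",         # assumption
}


def expand(curie: str) -> str:
    """Expand a prefixed name such as 'npg:Gadget' into a full IRI."""
    prefix, _, local = curie.partition(":")
    return PREFIXES[prefix] + local


def quad(subject: str, predicate: str, obj: str, graph: str) -> str:
    """Render one statement as an N-Quads style line.

    Terms that start with a double quote are treated as literals and
    passed through unchanged; everything else is expanded to an IRI.
    """
    def term(t: str) -> str:
        return t if t.startswith('"') else f"<{expand(t)}>"

    return f"{term(subject)} {term(predicate)} {term(obj)} {term(graph)} ."


if __name__ == "__main__":
    # The "Data Property" row of the table, placed in the gadgets graph.
    print(quad("gadgets:33", "ex:title", '"Title"', "npgg:gadgets"))
```

The fourth term is what ties an instance to its named graph, which is why every usage example in the table ends with a graph name.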
Publishing
Monitoring
ETL Process
Datastore: Imports
Datastore: Exports
Contracts
npgg:affiliations a npg:Graph, void:Dataset ;
    dcterms:description "Graph of npg:Affiliation objects" ;
    dcterms:issued "2013-02-15"^^xsd:date ;
    dcterms:modified "2013-02-15"^^xsd:date ;
    dcterms:publisher [
        a foaf:Organization ;
        foaf:mbox <mailto:[email protected]> ;
        foaf:name "Nature Publishing Group"
    ] ;
    dcterms:source "extractor-xml" ;
    dcterms:title "npgg:affiliations" ;
    rdfs:label "npgg:affiliations" ;
    void:classPartition [ void:class npg:Affiliation ; void:entities "973208"^^xsd:int ] ;
    void:propertyPartition
        [ void:property vcard:url ; void:triples "326"^^xsd:int ],
        [ void:property vcard:street-address ; void:triples "82638"^^xsd:int ],
        [ void:property vcard:region ; void:triples "183483"^^xsd:int ],
        [ void:property vcard:organisation-name ; void:triples "694290"^^xsd:int ],
        [ void:property vcard:locality ; void:triples "412042"^^xsd:int ],
        [ void:property vcard:email ; void:triples "21650"^^xsd:int ],
        [ void:property vcard:country-name ; void:triples 0 ],
        [ void:property rdfs:label ; void:triples "973208"^^xsd:int ],
        [ void:property rdf:type ; void:triples "973208"^^xsd:int ] ;
    void:triples "3340845"^^xsd:int ;
    void:vocabulary npg:, rdf:, rdfs:, void: .
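One way to read such a contract is as a set of counts that can be cross-checked. The sketch below copies the figures from the VoID description above into plain Python and tests two consistency rules. The function name check_contract is hypothetical, and the second rule (exactly one rdf:type triple per entity) is an assumption for illustration, not something the slide states.

```python
# Per-property triple counts copied from the npgg:affiliations contract.
PROPERTY_TRIPLES = {
    "vcard:url": 326,
    "vcard:street-address": 82638,
    "vcard:region": 183483,
    "vcard:organisation-name": 694290,
    "vcard:locality": 412042,
    "vcard:email": 21650,
    "vcard:country-name": 0,
    "rdfs:label": 973208,
    "rdf:type": 973208,
}
TOTAL_TRIPLES = 3340845  # void:triples on the dataset
ENTITIES = 973208        # void:entities on the class partition


def check_contract(property_triples, total, entities):
    """Return a list of human-readable problems (empty if consistent)."""
    problems = []
    partition_sum = sum(property_triples.values())
    if partition_sum != total:
        problems.append(
            f"property partitions sum to {partition_sum}, expected {total}"
        )
    # Assumption: every entity carries exactly one rdf:type triple.
    if property_triples.get("rdf:type") != entities:
        problems.append("rdf:type triple count differs from entity count")
    return problems


if __name__ == "__main__":
    print(check_contract(PROPERTY_TRIPLES, TOTAL_TRIPLES, ENTITIES))
```

With the numbers on the slide both rules hold: the nine property partitions add up to the dataset total of 3,340,845 triples.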
Linked Data API
• ./api/articles [.json, .rdf, .xml]
• ./api/articles?hasProduct.pcode=ng
• ./api/contributors?familyName=Smith
• ./api/products.json?pcode=ng&_page=2
• ./api/products?_view=none&_properties=pcode
• ./api/search?title=black+hole
• ./api/tree/subjects/children.xml?_sort=title
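The URL patterns above combine a resource path, an optional format suffix, and query parameters. A minimal sketch of how a client might assemble them is shown here. The base URL and the api_url helper are hypothetical, since the slide gives only relative ./api paths.

```python
from urllib.parse import urlencode

# Hypothetical base; the slide shows only relative "./api/..." paths.
BASE = "http://example.org/api"


def api_url(resource, params=None, fmt=None):
    """Build a request URL from a resource path, filters, and a format.

    resource: path under the API root, e.g. "articles"
    params:   dict of query parameters, e.g. {"familyName": "Smith"}
    fmt:      optional format suffix such as "json", "rdf", or "xml"
    """
    path = f"{BASE}/{resource}"
    if fmt:
        path += f".{fmt}"
    query = urlencode(params or {})
    return f"{path}?{query}" if query else path


if __name__ == "__main__":
    print(api_url("articles", {"hasProduct.pcode": "ng"}))
    print(api_url("products", {"pcode": "ng", "_page": 2}, fmt="json"))
    print(api_url("search", {"title": "black hole"}))
```

Parameters prefixed with an underscore (_page, _view, _properties, _sort) control the response rather than filter it, so a client can keep them in the same dictionary as the filters.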
Closing
Positions Available
goo.gl/bYIt8
www.linkedin.com/jobs?jobId=4890057&viewJob
Information
data.nature.com
developers.nature.com/docs
datahub.io/group/npg
prefix.cc/npg