Best Practices
• Publishing vocabularies
• Data model customization
• Real-world things
• JSON and RDF
• Multi-valued and optional properties
• Provenance and inverse properties
• Ontologies and constraints
Publishing vocabularies
• We should use established vocabularies if they exist
  – W3C, Dublin Core, OSLC, …
• Any new terms we define should be described in vocabulary documents rooted at http://jazz.net/ns
  – propose generally useful terms to OSLC
• When you look up an RDF term, you should get its vocabulary document
  – HTML for web browsers
  – RDF for programs, e.g. query builders
  – e.g. http://jazz.net/ns/qm/rqm#Category
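The following is a minimal sketch of the server-side decision, written in Python with only the standard library. It assumes hash URIs for terms, as in the rqm#Category example, and a vocabulary host that can serve HTML, Turtle and RDF/XML. The function names are hypothetical. The Accept-header parsing is deliberately simplified: it ignores text/* wildcards and does not send a Vary header.

```python
from urllib.parse import urldefrag

HTML = "text/html"
TURTLE = "text/turtle"
RDF_XML = "application/rdf+xml"
SUPPORTED = (HTML, TURTLE, RDF_XML)


def vocabulary_document(term_uri: str) -> str:
    """Return the document that defines a hash term, e.g. ...rqm#Category -> ...rqm."""
    document, _fragment = urldefrag(term_uri)
    return document


def negotiate(accept: str) -> str:
    """Choose a media type from a simplified Accept header (hypothetical helper).

    Honours q-values, treats */* as a request for HTML, and defaults to HTML
    when nothing supported is requested.
    """
    best, best_q = HTML, -1.0
    for item in accept.split(","):
        fields = [f.strip() for f in item.split(";")]
        media_type, q = fields[0], 1.0
        for param in fields[1:]:
            if param.startswith("q="):
                q = float(param[2:])
        if media_type == "*/*":
            media_type = HTML
        if media_type in SUPPORTED and q > best_q:
            best, best_q = media_type, q
    return best
```

For example, a browser that sends "text/html,application/xhtml+xml,*/*;q=0.8" receives HTML. A query builder that sends "text/turtle" receives Turtle from the same document URL.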
How to publish a vocabulary
• We have a new public wiki!
  – https://jazz.net/wiki/bin/view/LinkedData
• Read the guidelines
• Create a wiki page and attach the HTML, Turtle, and RDF/XML files
• Request a review from Nelson
  – Allow dev time to address issues
• Arthur will redirect jazz.net/ns to the wiki
Abuses
• You published your vocabulary but skimped on the content
  – e.g. minimal or cryptic comments
• You published your vocabulary but didn't keep it up to date
  – e.g. Focal Point 227292
• You created some new terms but didn't publish your vocabulary
  – e.g. JLIP Tracked Resource Set 306919
Data model customization
• Many of our tools allow customization
  – e.g. RTC work items
• We need to expose the custom data elements as RDF
• Tools should allow users to map custom data elements to externally defined RDF terms
  – industry standards
  – corporate standards
• When no mapping is specified, tools should generate local RDF terms and vocabularies
  – vocabularies are needed by query authors
  – tools must host the vocabularies they generate
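Here is a sketch of the mapping rule. It assumes a hypothetical tool that stores the administrator's mapping as a dictionary from attribute display names to external term URIs. It also assumes the tool hosts its generated vocabulary at a made-up namespace (LOCAL_VOCAB). Local names are derived from display names rather than internal IDs, so generated terms stay readable by humans.

```python
import re

# Hypothetical namespace where this tool hosts the vocabulary it generates.
LOCAL_VOCAB = "https://tool.example.com/ns/project1#"


def to_local_name(display_name: str) -> str:
    """Derive a readable camelCase local name, e.g. 'Customer Impact' -> 'customerImpact'."""
    words = re.findall(r"[A-Za-z0-9]+", display_name)
    if not words:
        raise ValueError(f"cannot derive a local name from {display_name!r}")
    first, *rest = words
    return first.lower() + "".join(w[0].upper() + w[1:].lower() for w in rest)


def predicate_for(attribute: str, mapping: dict[str, str]) -> str:
    """Prefer the administrator's mapping to an external term; otherwise mint a local term."""
    if attribute in mapping:
        return mapping[attribute]
    return LOCAL_VOCAB + to_local_name(attribute)
```

With mapping = {"Customer Impact": "https://corp.example.com/ns/eng#impact"} (a hypothetical corporate-standard term), "Customer Impact" maps to the external term. An unmapped attribute such as "Root Cause" becomes the local term LOCAL_VOCAB + "rootCause".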
Abuses
• Your tool generates cryptic URIs for local RDF terms
  – Obfuscates meaning
  – Forces humans to consult the vocabulary document
• Your tool does not generate a vocabulary document for local RDF terms
  – e.g. RTC 304143
  – see following case study
• When the mapping to RDF is changed, your tool does not create TRS change events for exactly the affected resources
Case study: RTC Work Items
• Some attributes are built-in
• Some are defined by OSLC CM 2.0
• Some are user-defined
• Consider Priority
RDF triple for priority
• Subject (good) <https://jazzop05.rtp.raleigh.ibm.com:9943/jazz/resource/itemName/com.ibm.team.workitem.WorkItem/224727>
• Predicate (bad) <http://open-services.net/ns/cm-x#priority>
• Object (ugly) <https://jazzop05.rtp.raleigh.ibm.com:9943/jazz/oslc/enumerations/_QYx2UBIzEd6bpunPP4ZLOA/priority/priority.literal.l3>
Problems
• The priority predicate comes from a non-existent vocabulary (bad)
  – http://open-services.net/ns/cm-x#
  – RDF vocabularies should be dereferenceable
  – OSLC should publish it, tagged as archaic
• The object is a dereferenceable URI (good), but not a vocabulary term (ugly)
  – Need rdfs:label, rdfs:comment for query authors
• Result: no easy way to write queries based on priority
Best Practice for external vocabularies
• RTC project template should refer to external vocabularies for standard terms
  – OSLC CM V3 defines priority and 4 values
• Teach and enable clients to create corporate standard vocabularies for reuse of common terms (UA)
  – Needed for cross-project queries
• Provide export/import UI to manage vocabularies
  – e.g. Focal Point uses a simple spreadsheet format
Best Practice for local vocabularies
• RTC (and all other tools) should generate a local RDF vocabulary for all user-defined terms
  – Include rdfs:label, rdfs:comment for query authors (and other consumers)
• LQE admin should load user-defined vocabularies into LQE to make them available to queries
  – provide programmatic integration, e.g. a special-purpose vocabulary TRS
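The sketch below shows one way a tool could generate such a vocabulary. It assumes each user-defined attribute becomes an rdf:Property with rdfs:label, rdfs:comment and rdfs:isDefinedBy. The LocalTerm type, the namespace arguments and the function names are hypothetical. The Turtle is built as plain text so the example has no dependencies.

```python
from dataclasses import dataclass


@dataclass
class LocalTerm:
    local_name: str  # e.g. "customerImpact"
    label: str       # human-readable name shown to query authors
    comment: str     # what the attribute means


def turtle_literal(text: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def vocabulary_turtle(namespace: str, vocabulary_uri: str, terms: list[LocalTerm]) -> str:
    """Render user-defined terms as a Turtle vocabulary document."""
    lines = [
        "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        f"@prefix local: <{namespace}> .",
        "",
    ]
    for term in terms:
        lines.append(f"local:{term.local_name} a rdf:Property ;")
        lines.append(f"    rdfs:label {turtle_literal(term.label)} ;")
        lines.append(f"    rdfs:comment {turtle_literal(term.comment)} ;")
        lines.append(f"    rdfs:isDefinedBy <{vocabulary_uri}> .")
        lines.append("")
    return "\n".join(lines)
```

Enumeration values, such as custom priority literals, could be emitted the same way as individuals with their own labels. That is what query authors need to filter on them by name.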
Best Practice for all vocabularies
• When an administrator changes the RDF representation of a set of resources, corresponding change events MUST be added to the TRS change log
  – Add/remove custom attributes and values
  – Modify mapping to RDF URIs
• Allow the administrator to make multiple representation changes and then manually trigger the generation of change events
  – Batch multiple representation changes together to minimize re-indexing time and server load
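The batching behaviour might look like the following sketch. It assumes the tool can compute, for each representation change, the set of resources that carry the affected attribute. RepresentationChangeBatch is a hypothetical name. trs:Modification, trs:changed and trs:order are the TRS terms the events would be serialized with.

```python
from dataclasses import dataclass, field

# Change-event type from the OSLC Tracked Resource Set vocabulary.
TRS_MODIFICATION = "http://open-services.net/ns/core/trs#Modification"


@dataclass
class ChangeEvent:
    order: int    # trs:order, increasing with each event
    changed: str  # trs:changed, the affected resource URI
    kind: str = TRS_MODIFICATION


@dataclass
class RepresentationChangeBatch:
    """Hypothetical admin-side batch: collect changes, then emit events on demand."""
    next_order: int
    affected: set[str] = field(default_factory=set)

    def record(self, resources_using_attribute: set[str]) -> None:
        # An attribute was added or removed, or its RDF mapping was edited;
        # only the resources that actually carry that attribute are affected.
        self.affected |= resources_using_attribute

    def trigger(self) -> list[ChangeEvent]:
        # One Modification event per affected resource, however many
        # representation changes touched it, so consumers re-index it once.
        events = []
        for uri in sorted(self.affected):
            events.append(ChangeEvent(self.next_order, uri))
            self.next_order += 1
        self.affected.clear()
        return events
```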
La Trahison des Images
“The famous pipe. How people reproached me for it! And yet, could you stuff my pipe? No, it's just a representation, is it not? So if I had written on my picture 'This is a pipe', I'd have been lying!”
- René Magritte
Real-world things
• Linked Data differentiates between two kinds of thing
  – Information, e.g. a document on the web
  – Real-world, e.g. a person
• Both kinds should be identified with HTTP URIs
• Looking up a real-world URI should result in an information resource that contains information about the real-world thing
  – URI references (hash URIs)
  – HTTP redirect: 303 See Other (303 URIs)
• Refer to Cool URIs for the Semantic Web
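Both patterns can be sketched in a few lines. The hash-URI case needs no server logic, because HTTP clients strip the fragment before making the request. The 303 case assumes a hypothetical URL layout: real-world things live under /people/ and their describing documents under /doc/people/. The point is that a request for a real-world thing is never answered with 200 and content.

```python
from urllib.parse import urldefrag

# Hypothetical layout: real-world things under /people/, and the
# information resources that describe them under /doc/people/.
REAL_WORLD_PREFIX = "/people/"
DOCUMENT_PREFIX = "/doc/people/"


def describing_document(uri: str) -> str:
    """Hash URI: the describing document is the URI without its fragment."""
    return urldefrag(uri)[0]


def respond(path: str) -> tuple[int, dict[str, str]]:
    """Server side of the 303 pattern: redirect real-world URIs to a document."""
    if path.startswith(REAL_WORLD_PREFIX):
        name = path[len(REAL_WORLD_PREFIX):]
        return 303, {"Location": DOCUMENT_PREFIX + name}
    if path.startswith(DOCUMENT_PREFIX):
        return 200, {"Content-Type": "text/turtle"}
    return 404, {}
```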
Example foaf:Person
• Suppose you create a document, http://people.org/johnsmith, about John Smith on 2013-09-17
• The following is nonsense because John Smith was not created on 2013-09-17:
  <http://people.org/johnsmith> a foaf:Person .
  <http://people.org/johnsmith> dcterms:created "2013-09-17"^^xsd:date .
• The following makes sense:
  <http://people.org/johnsmith#me> a foaf:Person .
  <http://people.org/johnsmith> dcterms:created "2013-09-17"^^xsd:date .
Abuses
• Failure to differentiate between a person and an account owned by a person
  – Leads to nonsense triples
  – Focal Point Defect 234212
  – JTS Defect 307861
  – See following JTS users case study
• NOTE: email address is the preferred way to identify people across tools
JTS Users
• OSLC Core specifies that the object of dcterms:creator, dcterms:contributor, oslc:modifiedBy should be a resource of class foaf:Agent or foaf:Person (real-world)
• RTC implements OSLC CM and has triples like:
  <https://jazz.net/jazz02/resource/...WorkItem/72226>
      dcterms:creator <https://jazz.net/jts04/users/ryman> ;
      dcterms:contributor <https://jazz.net/jts04/users/retchles> .
Best Practice
• The property j.1:archived applies to the user account (information resource), not the person (real-world)
• Solution 1: use hash URIs for people: <https://jazz.net/jts04/users/ryman#me>
• Solution 2: use 303 URIs for accounts (preferred by Philippe): <https://jazz.net/jts04/accounts/ryman>
303 URI Solution
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix jfs: <http://jazz.net/xmlns/prod/jazz/jfs/1.0/> .

<https://jazz.net/jts04/accounts/ryman> a foaf:OnlineAccount ;
    jfs:archived false .

<https://jazz.net/jts04/users/ryman> a foaf:Person ;
    foaf:account <https://jazz.net/jts04/accounts/ryman> ;
    foaf:img <https://jazz.net/jts04/users/photo/ryman> ;
    foaf:mbox <mailto:[email protected]> ;
    foaf:name "Arthur Ryman" ;
    foaf:nick "ryman" .
JSON
• Familiar to OO and web developers
• Popularity fueled by the cloud
• e.g. Amazon uses JSON as the payload in AWS REST APIs as an alternative to SOAP and XML
  – Simpler and faster for web clients to handle
• Use is spreading across the stack
  – MongoDB, CouchDB/Cloudant
  – node.js
JSON and RDF
• Some developers are saying: "JSON is simpler and more popular than RDF. Let's use JSON instead of RDF."
  – This is a false dichotomy
• JSON is just as problematic as XML for data integration
  – JSON and XML are message formats
• Linked Data is our integration strategy
  – RDF expresses semantics
• Use JSON-LD, now a W3C standard
  – OSLC and Rational should publish standard contexts
• See following LQE Security Context case study
Initial JSON design
• Simple, but no explicit semantics
• Use of UUIDs instead of HTTP URIs

[
  {
    "security_context_id": "urn:uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6",
    "name": "Resources for Alpha project"
  },
  {
    "security_context_id": "urn:uuid:g92e5gbf-8efd-22e1-b876-11b1d02f7cg7",
    "name": "Resources for Beta project"
  }
]
Equivalent JSON-LD design
{
  "@context": {
    "@base": "https://example.com/sc",
    "dcterms": "http://purl.org/dc/terms/"
  },
  "@graph": [
    { "@id": "#1", "dcterms:title": "Resources for Alpha project" },
    { "@id": "#2", "dcterms:title": "Resources for Beta project" }
  ]
}
Final JSON-LD design with type info
{
  "@graph": [
    {
      "@id": "https://example.com/sc",
      "@type": "http://open-services.net/ns/core/sc#SecurityContextList"
    },
    {
      "@id": "https://example.com/sc#1",
      "@type": "http://open-services.net/ns/core/sc#SecurityContext",
      "http://purl.org/dc/terms/title": "Resources for Alpha project"
    },
    {
      "@id": "https://example.com/sc#2",
      "@type": "http://open-services.net/ns/core/sc#SecurityContext",
      "http://purl.org/dc/terms/title": "Resources for Beta project"
    }
  ]
}
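A toy expansion in Python can illustrate how the compact "Equivalent JSON-LD design" maps to full IRIs. Prefixed keys such as dcterms:title are expanded using the context, and relative @id values are resolved against @base. This is only a sketch for the flat @graph shape used here, not a conforming JSON-LD processor. For example, it does not expand @type values. A real application should use a JSON-LD library.

```python
from urllib.parse import urljoin


def expand_iri(value: str, context: dict, relative_to_base: bool) -> str:
    """Expand 'prefix:local' via the context; resolve relative IRIs against @base."""
    prefix, sep, local = value.partition(":")
    # A suffix starting with '//' means the value is already an absolute IRI.
    if sep and prefix in context and not local.startswith("//"):
        return context[prefix] + local
    if relative_to_base and "@base" in context:
        return urljoin(context["@base"], value)
    return value


def expand(document: dict) -> list[dict]:
    """Toy expansion of a flat @graph; keys and @id values only."""
    context = document.get("@context", {})
    nodes = []
    for node in document["@graph"]:
        expanded = {}
        for key, value in node.items():
            if key == "@id":
                expanded[key] = expand_iri(value, context, relative_to_base=True)
            else:
                expanded[expand_iri(key, context, relative_to_base=False)] = value
        nodes.append(expanded)
    return nodes
```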
Multi-valued and optional properties
• RDF documents contain sets of triples
• Model multi-valued properties by a set of triples that share a common subject and predicate
• Model the absence of an optional property by an empty set of triples
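In code, the rule amounts to treating a graph as a set of triples. The sketch below uses plain Python tuples with string objects. This is a simplification, since real RDF distinguishes IRIs from typed literals. The work item URI is hypothetical. The predicates are dcterms:subject and rtc_cm:estimate, as used in the RTC tag case study.

```python
Triple = tuple[str, str, str]

SUBJECT = "https://example.com/wi/271867"  # hypothetical work item
SUBJECT_PRED = "http://purl.org/dc/terms/subject"
ESTIMATE_PRED = "http://jazz.net/xmlns/prod/jazz/rtc/cm/1.0/estimate"

graph: set[Triple] = {
    # One triple per tag: same subject, same predicate, different objects.
    (SUBJECT, SUBJECT_PRED, "oslc"),
    (SUBJECT, SUBJECT_PRED, "reporting-gap"),
    # No estimate triple at all: the optional value is simply absent.
}


def values(graph: set[Triple], subject: str, predicate: str) -> set[str]:
    """All objects of (subject, predicate, ?o); an empty set means 'no value'."""
    return {o for s, p, o in graph if s == subject and p == predicate}
```

A query for a tag is an exact match on one triple, so it can use an index instead of substring matching.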
Abuses
• Model multiple values of a property by concatenating the values into a single object
  – Defeats database indexing
  – Slows queries since substring matching must be used
• Model the absence of an optional value using the presence of an empty string
  – Adds many unnecessary triples
  – Slows queries (longer scans)
  – Sometimes an empty string is a meaningful value
  – Sometimes an empty string is lexically invalid
• See following RTC tag case study, Defect 271867
RDF representation
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix rtc_cm: <http://jazz.net/xmlns/prod/jazz/rtc/cm/1.0/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@base <https://jazz.net/jazz/resource/itemName/com.ibm.team.workitem.WorkItem/> .

<271867> dcterms:subject "datagap, oslc, next_release_candidate, data_gap, reporting-gap"^^xsd:string ;
    …
    rtc_cm:estimate ""^^xsd:long .
Syntax validated OK. There were warnings: Typed literal has an invalid lexical value: Input string was not in the correct format: s.Length==0.: ""^^<http://www.w3.org/2001/XMLSchema#long>.
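A serializer, or a one-off migration of already stored data, could produce the recommended shape as sketched below. It emits one dcterms:subject triple per tag and no rtc_cm:estimate triple when the estimate is empty. The helper name is hypothetical. Objects are plain strings rather than typed literals to keep the example self-contained.

```python
Triple = tuple[str, str, str]

DCTERMS_SUBJECT = "http://purl.org/dc/terms/subject"
RTC_ESTIMATE = "http://jazz.net/xmlns/prod/jazz/rtc/cm/1.0/estimate"


def work_item_triples(subject: str, tags: str, estimate: str) -> list[Triple]:
    """One triple per tag; omit the estimate entirely when it is empty (hypothetical helper)."""
    triples = [
        (subject, DCTERMS_SUBJECT, tag.strip())
        for tag in tags.split(",")
        if tag.strip()
    ]
    if estimate.strip():
        # int() rejects values that would be lexically invalid as xsd:long.
        triples.append((subject, RTC_ESTIMATE, str(int(estimate))))
    return triples
```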
Provenance: Where did the triple come from?
• A statement is represented by a triple
• Triples from multiple documents may be merged and queried
  – Default graph is a triple store
• When storing RDF documents, the document URL is often used as the name of a graph (e.g. in LQE)
  – triple + graph name = quad
  – triple stores are really quad stores
• Provenance of triples is important in several use cases
  – Updating a document
  – Access control
  – VVC (which version)
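A toy quad store shows why the graph name matters for these use cases. In this sketch, where the class and method names are hypothetical, replacing a document's graph updates exactly that document's triples. Provenance and access control are both answered per document URL.

```python
Triple = tuple[str, str, str]


class QuadStore:
    """Toy quad store: every triple is filed under the URL of the document it came from."""

    def __init__(self) -> None:
        self.graphs: dict[str, set[Triple]] = {}

    def put_document(self, url: str, triples: set[Triple]) -> None:
        # Updating a document replaces exactly the triples that came from it.
        self.graphs[url] = set(triples)

    def provenance(self, triple: Triple) -> set[str]:
        # Which documents assert this triple?
        return {url for url, graph in self.graphs.items() if triple in graph}

    def merged(self, readable: set[str]) -> set[Triple]:
        # Merge only the graphs this caller may read (access control by document).
        return set().union(*(self.graphs.get(url, set()) for url in readable))
```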
Provenance and authority
• The authority (trust) of a triple depends on the author of the document that contains the triple
• Triples should be placed in the document that the author is authorized to modify
  – When creating a link from A to B, put the link in the document that the author is editing, not necessarily A or B or both
  – Document C may contain links from A to B
Inverse properties
• Directed relations between resources (links) may be stated in two equivalent ways, e.g.
  – Testcase1 validates Requirement2 .
  – Requirement2 isValidatedBy Testcase1 .
• There is no benefit to having mutual inverse pairs of properties
• The existence of mutual inverse pairs of properties makes query authoring more complex, and query execution more expensive
• A triple should be put in the document that the author of the triple is editing (provenance)
  – There is no special significance attached to being the subject of a triple
• See OSLC guidance on preferred direction of properties
  – Direction should be from downstream to upstream
  – e.g. test case validates requirement
Abuses
• OSLC domain specs define many pairs of mutual inverse predicates
• Recommendation
  – Deprecate one member of each pair
  – Replace the deprecated property in all RDF representations and queries
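Replacing a deprecated inverse is a mechanical rewrite: each (s, inverse, o) triple becomes (o, preferred, s). The sketch assumes hypothetical property URIs for validates and isValidatedBy. A link stated both ways collapses into a single triple, so queries need only one pattern.

```python
Triple = tuple[str, str, str]

# Hypothetical property URIs; only the downstream-to-upstream direction is kept.
VALIDATES = "http://example.com/ns#validates"
IS_VALIDATED_BY = "http://example.com/ns#isValidatedBy"  # deprecated inverse
DEPRECATED_INVERSES = {IS_VALIDATED_BY: VALIDATES}


def normalize(triples: set[Triple]) -> set[Triple]:
    """Rewrite each deprecated-inverse triple (s, inv, o) as (o, preferred, s)."""
    result = set()
    for s, p, o in triples:
        if p in DEPRECATED_INVERSES:
            result.add((o, DEPRECATED_INVERSES[p], s))
        else:
            result.add((s, p, o))
    return result
```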
Vocabularies and Ontologies
• A vocabulary defines the meaning of terms
  – Use RDFS: rdfs:label, rdfs:comment, rdfs:isDefinedBy, …
• An ontology defines inference rules
  – Given a set of triples, infer more triples
  – Use RDFS: rdfs:domain, rdfs:range, rdfs:subClassOf, …
  – Use OWL for more complex inference rules
Ontologies and Constraints
• Ontologies are not designed to define integrity constraints
  – See Linked Data Interfaces for examples
• An RDFS or OWL reasoner will add triples to create a model for the ontology
• A reasoner will report an inconsistency if it cannot create a model
  – However, this mechanism cannot in practice be used to check for typical integrity constraints
Best Practice: Ontologies
• Your triples may end up in a reasoner one day, so only add inference rules when they produce the intended results
• If you define generic properties, such as “uses”, then you probably SHOULD NOT define rdfs:domain and rdfs:range
• If you define type-specific properties, such as “usesTestCase” then rdfs:domain and rdfs:range MAY make sense
• e.g. if you intend to infer that the object of oslc_qm:usesTestCase is an oslc_qm:TestCase, then include the following triple in an ontology:
oslc_qm:usesTestCase rdfs:range oslc_qm:TestCase .
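To see what such a triple actually does, here is the RDFS range rule (rdfs3 in the RDF Semantics specification) applied once over a set of triples. It infers a type rather than checking one. If the object were really a requirement, a reasoner would conclude it is also an oslc_qm:TestCase instead of reporting an error. The example resource URIs are hypothetical.

```python
Triple = tuple[str, str, str]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_RANGE = "http://www.w3.org/2000/01/rdf-schema#range"
OSLC_QM = "http://open-services.net/ns/qm#"


def apply_range_rule(triples: set[Triple]) -> set[Triple]:
    """RDFS rule rdfs3: from (p rdfs:range C) and (s p o), infer (o rdf:type C)."""
    ranges = {(s, o) for s, p, o in triples if p == RDFS_RANGE}
    inferred = {
        (obj, RDF_TYPE, cls)
        for prop, cls in ranges
        for _s, p, obj in triples
        if p == prop
    }
    return triples | inferred
```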
Best Practice: Constraints
• W3C is starting an activity on RDF validation
  – See W3C workshop
• We have submitted the OSLC Resource Shape specification to W3C
  – See Resource Shape 2.0
• Use Resource Shape 2.0 to describe integrity constraints on RDF documents
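A constraint check differs from the reasoner behaviour described earlier: it reports violations and infers nothing. The sketch below is modelled loosely on the oslc:occurs cardinalities (Exactly-one, Zero-or-one, One-or-many, Zero-or-many) used by OSLC resource shapes. It does not read the Resource Shape 2.0 vocabulary, and PropertyConstraint and check are hypothetical names.

```python
from dataclasses import dataclass

Triple = tuple[str, str, str]

# (minimum, maximum) counts named after the oslc:occurs values; None = unbounded.
OCCURS = {
    "Exactly-one": (1, 1),
    "Zero-or-one": (0, 1),
    "One-or-many": (1, None),
    "Zero-or-many": (0, None),
}


@dataclass
class PropertyConstraint:
    predicate: str
    occurs: str  # one of the OCCURS keys


def check(subject: str, triples: set[Triple], constraints: list[PropertyConstraint]) -> list[str]:
    """Return human-readable violations; nothing is added to the data."""
    violations = []
    for c in constraints:
        count = sum(1 for s, p, _o in triples if s == subject and p == c.predicate)
        low, high = OCCURS[c.occurs]
        if count < low or (high is not None and count > high):
            violations.append(f"{c.predicate}: expected {c.occurs}, found {count}")
    return violations
```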
Other topics
• Blank nodes
  – Mean "there exists" or "some"
  – Use fragment IDs for internal resources
• Containers
  – Avoid Seq, Bag, List
  – Use Linked Data Platform containers
• Consuming external vocabularies
  – Tools should gracefully degrade when external resources are unreachable
  – Be a well-behaved HTTP client with respect to caching, etc.
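Here is a sketch of a well-behaved consumer that uses only urllib. It revalidates a cached copy with If-None-Match and falls back to that copy when the server is unreachable. The in-memory dict cache, the CacheEntry type and the injectable opener parameter are assumptions that let the example run without a network. A production client would also honour Cache-Control and persist its cache.

```python
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheEntry:
    etag: Optional[str]
    body: bytes


def fetch_vocabulary(url: str, cache: dict, opener=urllib.request.urlopen) -> bytes:
    """GET an external vocabulary with revalidation, falling back to a cached copy."""
    entry = cache.get(url)
    request = urllib.request.Request(url, headers={"Accept": "text/turtle"})
    if entry is not None and entry.etag:
        # Conditional GET: the server can answer 304 Not Modified.
        request.add_header("If-None-Match", entry.etag)
    try:
        with opener(request, timeout=10) as response:
            body = response.read()
            cache[url] = CacheEntry(response.headers.get("ETag"), body)
            return body
    except urllib.error.HTTPError:  # urlopen raises this for 304 as well as 4xx/5xx
        if entry is not None:
            return entry.body  # not modified, or an HTTP error: reuse the cached copy
        raise
    except urllib.error.URLError:  # host unreachable: degrade gracefully
        if entry is not None:
            return entry.body
        raise
```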