Semantic Web Activities @ W3C

Semantic Web Activity @ W3C

Semantic Web Activities @ W3CIvan Herman, W3CSemantic Technology & Business Conference6th June, 2012San Francisco, CA, USA

(#)(#)A system manipulating and analyzing knowledgee.g., via big ontologies, vocabulariesGoogles Knowledge Graph?Improve search by adding structure to embedded dataA means to integrate many different pieces of dataIntegrate data-oriented applicationsAnd a mixture of all these

For some people, Semantic (Web) is(#)For the second bullet: in some cases graph based databases simply work better3And that is all right!(#)We have to acknowledge that the field has grown and has become multi-facetedAll different views have their success storiesThere are also no clear and water-proof boundaries between the different views(#)So what is happening at W3C?(#)Some technologies are, essentially, done:Ontology for Media ResourcesMedia Fragments URI SPARQL 1.1 (SPARQL Protocol and RDF Query Language)RDB2RDF (Relational Databases to RDF)RDFa 1.1 (RDF in attributes)The (almost) past(#)Some areas are subject of workUpdate of RDFProvenanceLinked Data PlatformThe present(#)We are discussing new works, new areas, e.g.,Access Control issuesConstraint checking on Semantic Web dataThe future(#)Various communities have different emphasis on which part of the Semantic Web they want to useW3C has contacts with some of thosehealth care and life sciences (a separate IG is up and running)libraries, publishingfinancialsthe oil, gas, and chemicals communitygovernments but there are many more!

Link to specialized communities(#)The communities often contribute technologies that can be used in generalFor example:New vocabularies may come to the fore: SKOS or FRBR (from libraries), annotations (originally form the HCLS work), Person vocabularies (from the eGov work)Health Care had a major influence on the Provenance worketc.

These community links are not one-way streets!(#)

Audio, Video, and SemanticsPhoto credit Robert Freund(#)Audio and Video are now first class entities on the Web(#)But video and audio on the Web is not only what you see and hear it is also what you can search, discover, distribute, and manage!(#)The usual Semantic Web problem: what vocabularies to use?The problem is not that there arent any but that there are too many!EXIF, MPEG7, XMP, MRSS, none of these cover all aspectsThe Ontology for Media Resources documentdefines a core vocabularydefines a set of mappings to other formatsOntologies for Media Resources(#)Questions:what is the standard URI for, say, a temporal fragment of a video?what should be the behavior of the user agents for these URIs?These are covered by the Media Fragments URI document, e.g.http://www.example.com/video.ogv#t=10,20http://www.example.com/video.ogv#track=audiohttp://www.example.com/video.ogv#xywh=160,120,320,240Media Fragment URIs(#)Ontologies for Media Resources: published as a Recommendation in February 2012Media Fragments URI: should be published as a Recommendation very soonMedia Work Status(#)Query RDF: SPARQL 1.1Photo credit reedster, Flickr(#)Nested queries (i.e., SELECT within a WHERE clause)Negation (MINUS, and a NOT EXIST filter)Aggregate function on search results (SUM, MIN,)Property path expression (?x foaf:knows+ ?y)SPARQL UPDATE facilities (INSERT, DELETE, CREATE)Combination with entailment regimesReturn format definition in JSON and in CSVSPARQL 1.1: adding missing features to SPARQL(#)SPARQL 1.1 as a unifying pointSPARQL Processor

HTML

Unstructured Text

XML/XHTML

RelationalDatabaseSQLRDF

DatabaseSPARQL Endpoint

Triple storeSPARQL EndpointRDF GraphApplicationRDFa, MicrodataGRDDL, RDFaNLP TechniquesSPARQL ConstructSPARQL ConstructSPARQL UpdateSPARQL UpdateInferencingInferencing(#)20Technology has been finalizedGoes to Proposed Recommendation soonShould be published as a standard by this fallSPARQL 1.1 Status(#)Access to Relational DatabasesPhoto credit mayhem, Flickr(#)Relational database vendors realize the importance of the Semantic Web marketMany systems have a hybrid view: traditional, relational storage, usually coupled with SQLRDF storage, usually coupled with SPARQLexamples: Oracle 3g, IBMs DB2, OpenLink Virtuoso,Many RDB systems can handle RDF(#)Export does not necessarily mean physical conversionfor very large databases a duplication would not be an optionsystems may provide SPARQLSQL bridges to make queries on the flyResult of export is a logical view of the RDB contentWhat is export?(#)A canonical RDF view of RDB tablesOnly needs the information in the RDB SchemaSimple export: Direct Mapping(#)Table references are URI objectsFundamental approachISBNAuthorTitlePublisherYear0006511409Xid_xyzThe Glass Palaceid_qpr20000007179871id_xyzThe Hungry Tideid_qpr2004IDNameHomepageid_xyzGhosh, Amitavhttp://www.amitavghosh.comEach row is a subjectEach column name provides a predicateCells are Literal objects(#)

Direct MappingTablesRDB SchemaDirect Graph(#)Pros:Direct Mapping is simple, does not require any other conceptsknow the Schema know the RDF graph structureknow the RDF graph structure good idea of the Schema(!)Cons:the resulting graph is not what the application really wantsPros and cons of Direct Mapping(#)

Direct MappingTablesRDB SchemaGraph Processing(Rules, SPARQL, )Direct GraphFinal, Application Graph(#)Separate vocabulary to control the details of the mapping, e.g.:finer control over the choice of the subjectcreation of URI references from cellspredicates may be chosen from a vocabularydatatypes may be assignedetc.Gets to the final RDF graph with one processing stepBeyond Direct Mapping: R2RML(#)

R2RMLMappingTablesRDB SchemaFinal, Application GraphR2RML Instance(#)Fundamentals are similar:each row is turned into a series of triples with a common subjectDirect mapping is a default R2RML mappingWhich of the two approaches is used depend on local tools, personal experiences and background,e.g., user can begin with a default R2RML, and gradually refine itRelationships to the Direct Mapping(#)Technology has been finalizedImplementations revealed some minor issues to fold into the specificationShould be finished this summerR2RML and Direct Mapping Status(#)Implementations of R2RML, Wednesday, 8:45amSouripriya Das (Oracle), Jans Aasman (Franz Inc.), Juan Sequeda (Capsenta), and Tony Vachino (Spry, Inc.)At the conference

(#)Structured data in HTML: RDFa & microdataPhoto credit shetladd, Flickr(#)Not necessarily large amount of data per page, but lots of themHave become very valuable to search enginesGoogle, Bing, Yahoo!, or Yandex (i.e., schema.org) all committed to use such dataTwo syntaxes have emerged at W3C:microdata with HTML5RDFa with HTML5, XHTML, and with XML languages in generalHTML pages are a huge source of structured data(#)

(#)

(#)

(#)Yielding

schema:alumniOf ; foaf:schoolHomePage ; schema:worksFor ; dc:title "Etvs Lornd University of Budapest" . dc:title "World Wide Web Consortium (W3C)(#)40

(#)

(#)

(#)Yielding[ rdf:type schema:Review ; schema:name "Oscars 2012: The Artist, review" ; schema:description "The Artist, an utterly beguiling" ; schema:ratingValue "5" ; ]

(#)44

(#)Both have similar philosophies:the structured data is expressed via attributes only (no specialized elements)both define some special attributese.g., itemscope for microdata, resource for RDFaboth reuse some HTML core attributes (e.g., href)both reuse the textual content of the HTML source, if neededRDF data can be extracted from bothRDFa and microdata: similarities(#)Microdata has been optimized for simpler use cases:one vocabulary at a timetree shaped datano datatypesRDFa provides a full serialization of RDF in XML or HTMLthe price is an extra complexity compared to microdataRDFa 1.1 Lite is a simplified authoring profile of RDFa, very similar to microdataRDFa and microdata: differences(#)Structured data in HTML is mainstream! 25% of webpages containing RDFa data [] over 7% of web pages containing microdata.Mail from Peter Mika, Yahoo!Based on a crawl evaluation by P. Mika and T. Potter LDOW2012 Workshop, April 2012, Lyon, France web pages that contain structured data has increased from 6% in 2010 to 12% in 2012.Hannes Mhleisen and Christian BizerWeb Data CommonsExtracting Structured Data from Two Large Web Corpora,LDOW2012 Workshop, April 2012, Lyon, France (#)For RDFa 1.1Technology has been finalizedIs in Proposed RecommendationShould be published as a Recommendation any day nowFor microdataTechnology has been finalizedThere is microdataRDF mapping in a separate NoteIs part of HTML5, hence its formal advancement depends on other technologiesRDFa 1.1 and microdata status(#)Schema.org panel, Wednesday, 9:45amDan Brickley (schema.org), Ramanathan Guha (Google), Steve Macbeth (Microsoft), Peter Mika (Yahoo!), Jeff Preston (Disney Interactive), Evan Sandhaus (NYT), Alexander Shubin (Yandex); moderator: Ivan Herman (W3C)At the conference

(#)Cleaning up RDFNexus Simulation Credit Erich Bremmer(#)Many issues have come up since 2004:deployment issuesnew functionalities are neededunderlying technology may have moved on (e.g., datatypes)The goal of the RDF Working Group is to refresh RDFNOT a complete reshaping of the standard!RDF cleanup (a.k.a. RDF1.1)(#)Standardize Turtle as a serialization formatClean up some aspects of datatyping, e.g.:plain vs. typed literalsintroduction of an rdf:HTML datatypebetter definition of rdf:XMLLiteralProper definition for named graphsincluding concepts, semantics, syntax, obviously important for linked data accessbut generates quite some discussions on the detailsStandardize a JSON format for linked data (JSON-LD)Some new features/plans(#)Cleanup the documents, make them more readablemaybe a completely new primerprobably a new structure for the Semantics document

Editorial improvements(#)Turtle is almost finalizedLiteral cleanup is doneJSON-LD is in a good shapeLots of discussion currently on named graphsA new version of the RDF 1.1 concepts has just been published this morning!http://www.w3.org/TR/rdf11-concepts/Status(#)Updates to the Core RDF Standards, Wednesday, 3:30pmDavid Wood (3 Round Stones, Inc., also co-chair of the W3C RDF WG)JSON-LD: JSON for Linked Data, Thursday, 9:45amGregg Kellogg (Kellogg Associates)At the conference

(#)Provenance(#)We should be able to express all sorts of meta information on the datawho played what role in creating the data (author, reviewer, etc.)view of the full revision chain of the datain case of data integration: which part comes from which original data and under what processwhat vocabularies/ontologies/rules were used to generate some portions of the dataetc.The goal is simple(#)Requires a complete model describing the various constituents (actors, revisions, etc.)The model should be usable with RDFHas to find a balance betweensimple provenance: easily usable and editablecomplex provenance: allows for a detailed reporting of origins, versions, etc.That is the role of the Provenance Working Group (started in 2011)the solution is more complicated(#)ex:facetedViewex:integratedDataex:photosDBex:metadatasimile:exhibitex:integratemailto:[email protected] foaf:namefoaf:mboxmailto:[email protected]:namefoaf:mboxusedwasGeneratedByusedusedwasAttributedTowasAssociatedWithwasAssociatedWithwasGeneratedBy

Screendump from Zepheira(#)StatusDrafts have been publishedabstract data model, OWL versionprotocol (where to find provenance data)primerGoal is to finalize the technical design in the fall of 2012(#)Benefits and Applications of W3Cs Provenance Standards in Enterprise Semantic Web Applications, Thursday, 10:45amReza Bfar (Oracle)At the conference

(#)Linked Data Platforms(#)

Courtesy of Richard Cyganiak and Anja Jentzsch(#)295 datasets, cca. 31.6B triples64

(#)Integrated datasets: dbpedia, open street maps, INSEE geographical information, official french list of historical monuments; not all made available in RDF, but integrated in RDF with the available conversion tools; results are merged in an RDF database (over 4.5 million triples) and a company tools is created for the user interface. Done in a few days using available tools65The datasets are essentially read-onlythey are curated out of band: regularly extracted from other databases, changed manually by data owners, etcThe dominating paradigm is to extract data via SPARQL queriesApplications use (very) large datasets via (RDF based) integration

Some characteristics of Linked Data and its Applications(#)However Linked Data Has Potentials for Simpler Applications(#)Application Lifecycle Managementintegration of development teams around the globemanagement of bug report, user requirementsversioningDistributed access to, and management of Library Catalogue dataIntegrated view of corporate and private address book dataExample: Data-intensive application integration (#)There has been approaches in the pastPoint-to-point via APICentralized repositoryCentral Hub/BusInspired by Arnaud Le Hors, IBM, LDOW2012 Presentation(#)They force to re-invent the wheel on many frontsdistribution of data around the Internetaccess control issuesdefinition and implementation of new API-s, Protocols, data formatsetc.None of these are really satisfactory (#)Distributed at its coreScalable in terms of users, of data, of hardware and software architectureOpen to anyoneHas a wide variety of available toolsWidely known and deployedWhy not make use of an architecture that isSounds Familiar? (#)Application #1Application #2Linked Data on Server #1Linked Data on Server #2Linked Data Provides a Better Paradigm: Use the Existing Web Architecture!HTTP GET, HEADHTTP PUT, DELETEURI+HTTP(#)Provide a simple, HTTP based infrastructure to publish, read, write, or modify linked dataThe infrastructure should be easy to implement and installmore complex applications may require more sophisticated tools like SPARQL, Provenance, OWL,provides an entry point for Linked Data applications!Linked Data Platform WG(#)Main Work items:define a RESTful way to access/update RDF data via HTTPwhat does HTTP GET/PUT/DELETE/POST/ mean for Linked Data?define a profile of minimal requirements for applications:what RDF datatypes are usedwhat serialization syntax(es) must be supportedhow to access reasonable chunks of information (paging)how to manage collections of RDF datawhat vocabulary items to use for metadataetc.Linked Data Platform WG(#)Has just started a few days ago!Status(#)Linked Enterprise Data Patterns, Tuesday, 11:30amDavid Wood (3 Round Stones, Inc.), Arnaud Le Hors (IBM), Ashok Malhotra (Oracle)LDP Working Group BOF: Tuesday, 12:30pm, Franciscan CAt the conference

(#)What else may be on the horizon?(#)Knowledge vs. data ratio is different: very shallow, simple vocabularies for huge sets of datathe role of reasoning is different (vocabularies, OWL DL, etc., may not be feasible)Not enough links among datasetslots of work on creating further linksScale: billions of triples, increasing every daysetting a SPARQL endpoint everywhere may not be realisticHighly distributeddata may not be in one single database, even within the same organization

Some challenges raised by Linked Data(#)Check on the quality of the published dataReconsider rule languages for (e.g., for Linked Data applications)Relationship to JSONConstraint checking of DataAPI-s for client-side Web Application Developers

Possible future works in the activity(#)Issues around internationalization of Semantic Web technologiesRelationship between Semantic Web technologies and Big Data, Cloud Storage and Computing,Specific standard vocabularies (e.g., data annotation, governmental vocabularies)some of these may be defined at W3C, some elsewherePossible future works in the activity(#)Lot remain to be doneLots of issues to be solvedBut W3C needs experts! consider joining W3C, as well as the work done there!

(#)Enjoy The Conference!(#)These slides are also available on the Web:

http://www.w3.org/2012/Talks/0605-SemTech-IH/

Thank you for your attention(#)83

Documents

Semantic Web Activities @ W3C