Open Conceptual Data Models: Making the Conceptual Layer Real via HTTP based Linked Data (aka. Linked Data)
© 2008 OpenLink Software, All rights reserved

DESCRIPTION

A presentation that addresses the pros and cons of different approaches to making conceptual models concrete. It covers HTTP based Linked Data and the RDF data model as a new mechanism for conceptual model oriented data access and integration.

Situation Analysis

Linked Data Vision: The transition of HTTP based Webs (Intranet, Extranet, or Internet) from Webs of Linked Documents to Webs of interlinked Structured Data Items (aka: entities, data objects, resources)

Concurrent trend in the IT industry: A recognition of the benefits of conceptual data models over logical data models

The Big Question: To what extent does Linked Data support conceptual level data models?

Open Conceptual Data Models

Topics:

Conceptual & Logical Data Models

Conceptual Models for the Semantic Web

Realizing Conceptual Models through Ontologies & Linked Data

Virtuoso’s RDF based Linked Data Views

ADO.NET Data Services & the Entity Data Model

Data Model Layers

Physical: How data is physically represented on disk

Logical (aka logical schema): Expresses the problem domain in terms of data management technology (tables / columns), e.g. a relational schema

Conceptual (aka conceptual schema): A purely semantic description of the problem space. Describes things (entities), their characteristics (attributes) & associations between things (relationships)

Logical Data Model

Most prominent of the three data model types

Main focus of database driven applications, due to the pervasiveness of relational database driven applications within the enterprise and across the Web

Weaknesses:

Impedance mismatch

Loss of semantics during the development process

Heterogeneous databases & interoperability

Logical Data Model Weaknesses

Impedance Mismatch: SQL expresses queries in terms of tables / views, so it targets the logical schema

Normalization fragments the data model:

Entities & their attributes may be split across several tables

Navigation between objects requires relational joins over two or more tables

Table rows must be reconstituted into higher level conceptual entities

A conceptual level data model is desirable to: remove the impedance mismatch; isolate applications from changes to the logical data model; provide a framework for productive human level interaction
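To make the impedance mismatch concrete, here is a minimal Python sketch using the standard-library sqlite3 module. The two tables are a hypothetical, cut-down version of the Northwind Customers and Orders tables (the ALFKI values come from the Northwind mapping in this presentation). The "customer with its orders" entity does not exist anywhere in the logical schema; it has to be rebuilt by a join plus application code.

```python
import sqlite3

# Hypothetical, cut-down logical schema: the "customer" entity is split
# across two normalized tables linked only by a foreign key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID TEXT PRIMARY KEY, CompanyName TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY,
                         CustomerID TEXT REFERENCES Customers(CustomerID));
    INSERT INTO Customers VALUES ('ALFKI', 'Alfreds Futterkiste');
    INSERT INTO Orders VALUES (10643, 'ALFKI'), (10692, 'ALFKI');
""")

def load_customer(customer_id):
    """Reconstitute a conceptual 'customer' entity from joined table rows."""
    rows = conn.execute(
        """SELECT c.CustomerID, c.CompanyName, o.OrderID
           FROM Customers AS c LEFT JOIN Orders AS o
             ON o.CustomerID = c.CustomerID
           WHERE c.CustomerID = ?
           ORDER BY o.OrderID""",
        (customer_id,),
    ).fetchall()
    if not rows:
        return None
    # Every joined row repeats the customer columns; fold them into one entity.
    cid, name, _ = rows[0]
    orders = [order_id for _, _, order_id in rows if order_id is not None]
    return {"id": cid, "companyName": name, "orders": orders}

customer = load_customer("ALFKI")
print(customer)
```

All of this mapping code is exactly what a conceptual model is meant to make unnecessary: the application should ask for "customer ALFKI and its orders", not for rows to stitch together.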

Logical Data Model Weaknesses

Loss of Semantic Fidelity During Development

Process: Develop the conceptual model (E-R modelling), transform it to a logical model for implementation, then the DBMS generates the physical model

Problems: Each move to a lower level model degrades the semantic fidelity of the higher level model

Conceptual model semantics are fragmented across schema / business rules / application code

Applications & users must understand the logical data model, which must be hardcoded or inferred (imperfectly) from system tables

Logical Data Model Weaknesses

Heterogeneous Databases & Interoperability

Logical data model: Describes the problem domain in terms of tables/columns; requires costly table joins to navigate the model

Application: Exposed to the specifics of a particular vendor’s RDBMS. In a heterogeneous database environment, it must handle different SQL dialects and different schemas

No explicit data model. No explicit semantics. Interoperability/integration = a perpetual problem for IT departments

Conceptual Models for Linked Data Webs

The explosion of User Generated Data held in the Data Silos of Web 2.0 applications is driving recognition of the need to move from logical to conceptual models, exemplified by:

Microsoft’s Entity Data Model / Entity Framework

W3C’s Semantic Web Project, which includes powerful technologies for this paradigm shift, such as:

Resource Description Framework (RDF Data Model and Data Representation Formats)

Web Ontology Language (OWL)

SPARQL (Query Language, RESTful Interface, and Query Result Serialization Formats)

Benefits of Conceptual Models

More faithfully represents the human view of the domain of interest

Conceptual model & semantics are explicit & available globally, not implicit & fragmented across business logic / UI etc.

Better, explicit semantics facilitate the move from “search” to “esoteric precision find”

Much easier heterogeneous data integration: User Generated Data is inherently heterogeneous & disparately located

Application Areas – Present & Future

Social Media, eCommerce, Distributed Collaborative Apps: Require shareable, standards-based, cross-platform conceptual views of data

Data portability: Needed as users maintain multiple points of presence & identity across blogs, social network accounts etc.

Open business models: Require exchange & integration of large amounts of data

Scientific research (sharing of knowledge & findings): Requires transparent access to distributed heterogeneous data, and database integration using a global schema

Autonomous intelligent agents: Free humans from large-volume data processing

Semantic Web Project Technologies

These technologies offer:

Ontologies: For representing common semantics spanning databases, applications, enterprises and on-line communities. They deliver a shared conceptual model and provide common schemas (Dublin Core, FOAF, SIOC, GoodRelations etc.)

Common Semantics (Ontologies) & Common Data Representation (RDF): Enable cross data source querying using SPARQL. Data across several databases (or data spaces) can be meshed, expanded, and explored. Querying through proprietary APIs is unnecessary, and so is brute force data merging via code

Open Data Formats, Platform Independence, Common Models: Facilitate data portability, accessibility, and integration.

Realizing Conceptual Models

Ontologies: Provide the building blocks for conceptual models. They define the concepts and their relationships in a domain of interest (or world view)

Describing Classes & Properties: Ontology Languages

RDFS: Introduces the notions of concepts (classes) & instances

OWL: Adds more vocabulary for describing relations between classes, cardinality, richer typing of properties, etc.
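A small illustration of what "classes & instances" buys: once a class hierarchy is stated, membership in superclasses follows mechanically. The Python sketch below forward-chains just two RDFS entailment rules (rdfs11: rdfs:subClassOf is transitive; rdfs9: an instance of a subclass is an instance of its superclass) over a tiny triple set. The ex: names are hypothetical; foaf:Organization appears in the Northwind mapping in this presentation. This is a naive fixed-point loop for illustration, not how a production reasoner is built.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

def rdfs_closure(triples):
    """Naive forward chaining of RDFS rules rdfs11 and rdfs9 to a fixed point."""
    inferred = set(triples)
    while True:
        new = set()
        sub = [(s, o) for s, p, o in inferred if p == SUBCLASS]
        for a, b in sub:
            for c, d in sub:
                if b == c:
                    new.add((a, SUBCLASS, d))  # rdfs11: subClassOf is transitive
        for s, p, o in inferred:
            if p == RDF_TYPE:
                for a, b in sub:
                    if o == a:
                        new.add((s, RDF_TYPE, b))  # rdfs9: instances inherit superclasses
        if new <= inferred:  # nothing new was derived: fixed point reached
            return inferred
        inferred |= new

data = {
    ("ex:Customer", SUBCLASS, "foaf:Organization"),  # hypothetical class
    ("foaf:Organization", SUBCLASS, "foaf:Agent"),
    ("ex:ALFKI", RDF_TYPE, "ex:Customer"),           # hypothetical instance
}
closed = rdfs_closure(data)
types = sorted(o for s, p, o in closed if s == "ex:ALFKI" and p == RDF_TYPE)
print(types)  # ['ex:Customer', 'foaf:Agent', 'foaf:Organization']
```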

Goodness of Fit

RDF was designed from the ground up as a metadata data model

RDF / RDFS / OWL work directly at the level of conceptual models

Conceptual model terminology matches RDF/OWL terminology: concepts, entities, attributes, relationships.

A natural fit! RDF lends itself naturally to describing conceptual models.

Semantic Expressivity Comparison

Data Definition Language (DDL)-based Relational Model: The relationship between two entities isn’t explicit. A foreign key relating two rows in separate tables doesn’t express the nature of the relationship, so semantics must often be inferred from table definitions

RDF-based Conceptual Model: The relationship between two entities is stated explicitly by the predicate in a subject-predicate-object triple. The semantic expressivity of RDF/RDFS/OWL is much better than that of DDL, giving richer semantic content than the equivalent DDL-based logical/relational model
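The contrast can be shown in a few lines. In the Python sketch below, the rows carry only a bare Orders.CustomerID foreign key; converting them to triples forces the relationship to be named, in both directions. The has_order / has_customer predicate names mirror the Northwind mapping in this presentation, while the ex: subject and object names are hypothetical placeholders.

```python
# Logical view: Orders.CustomerID is a bare foreign key. The data itself
# never says what the link means; that lives in DDL, docs or app code.
orders_rows = [
    {"OrderID": 10643, "CustomerID": "ALFKI"},
    {"OrderID": 10692, "CustomerID": "ALFKI"},
]

def fk_to_triples(rows):
    """Restate each foreign key as explicit, named relationships.

    Subjects/objects use hypothetical ex: names; the predicate names mirror
    the northwind:has_order / has_customer links in this presentation.
    """
    triples = []
    for row in rows:
        order = f"ex:order{row['OrderID']}"
        customer = f"ex:{row['CustomerID']}"
        triples.append((order, "northwind:has_customer", customer))
        triples.append((customer, "northwind:has_order", order))
    return triples

triples = fk_to_triples(orders_rows)
for s, p, o in triples:
    print(s, p, o)
```

The predicate now travels with every fact, so a consumer that has never seen the DDL can still tell what each link means.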

RDF Conceptual Model – Artist / Records / Tracks

Global Granular Information Sharing

Traditional Logical/Relational Data Model: The schema described by DDL is internal to the DBMS. Primary keys identifying an individual table row (i.e. entity instance) are not globally unique and not easily usable outside the host DBMS. This gives rise to ‘data silos’

RDF’s use of Generic HTTP-based URIs: Externalises the data and schema, making both globally accessible & scalable. Provides globally unique IDs for entities/relations/classes. A vehicle for granular, global information sharing down to the equivalent of the record level.

Linked Data – What is It?

A method for exposing, sharing & connecting data on HTTP based Data Networks.

A term coined by Tim Berners-Lee that describes a RESTful mechanism for HTTP based Data Access & Manipulation by Reference

A record level HTTP based Open Data Access & Connectivity mechanism

A richer hyperlinking mechanism that takes us from Hypertext Links (Document to Document) to Hyperdata Links (Data Item to Data Item).
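"Access by Reference" comes down to an ordinary HTTP GET on a data item's URI, with content negotiation deciding whether the client receives HTML or an RDF serialization. Below is a minimal Python sketch using the standard-library urllib.request module. The URI follows the Northwind pattern used in this presentation; the Accept header preference order is an illustrative assumption, not something the presentation prescribes. The request is only prepared here, not sent.

```python
from urllib.request import Request

# Preference order is an assumption: Turtle first, then RDF/XML, HTML last.
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9, text/html;q=0.5"

def dereference_request(uri):
    """Prepare a GET that asks the server for an RDF description of `uri`."""
    return Request(uri, headers={"Accept": RDF_ACCEPT}, method="GET")

req = dereference_request("http://demo.openlinksw.com/Northwind/Customer/ALFKI")
print(req.get_method(), req.host, req.selector)
print(req.get_header("Accept"))
# Actually sending it would be: urllib.request.urlopen(req).read()
```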

Linked Data – Why Is It Important?

It exposes the compound nature of Data Containers (e.g., Documents) such that Data Containers are uniquely identified & referenceable, and Data Items within Data Containers are uniquely identified & referenceable

It provides a conceptual model oriented Open Data Access & Connectivity mechanism

It delivers a powerful mechanism for meshing disparate and heterogeneous data sources.
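The container/item distinction is visible in the URIs themselves. In the Northwind example of this presentation a customer data item is named Customer/ALFKI#this: the part before the fragment names the data container (the document an HTTP GET returns), and the full URI names the item described inside it. The Python sketch below uses urllib.parse.urldefrag to split them; the second item URI (#address) is hypothetical, added only to show two items sharing one container.

```python
from urllib.parse import urldefrag

# Data item URIs: the first follows the Northwind pattern in this deck,
# the second (#address) is a hypothetical item in the same document.
items = [
    "http://demo.openlinksw.com/Northwind/Customer/ALFKI#this",
    "http://demo.openlinksw.com/Northwind/Customer/ALFKI#address",
]

def container_of(item_uri):
    """Return the data container (document) URI that holds a data item."""
    # The fragment is never sent over HTTP, so a GET on the item URI
    # actually fetches this container document.
    return urldefrag(item_uri).url

containers = {container_of(uri) for uri in items}
print(containers)  # {'http://demo.openlinksw.com/Northwind/Customer/ALFKI'}
print([urldefrag(uri).fragment for uri in items])  # ['this', 'address']
```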

Linked Data Model

Changes the focus from linked documents to linked entities

The document as a data container becomes less relevant

Hyperdata Links Between Data Items

Linked Data Benefits – Data Exploration

Natural Navigation Through Typed Links: RDF entities (instance data, classes, and properties) are identified by dereferenceable HTTP URIs. Navigating from one data item to another is easy via:

A single link click from any HTTP user agent, which commences data item relationship navigation

Linked Data Browsers such as OpenLink Data Explorer

Relational/Logical Model: Cumbersome. Requires SQL joins plus, typically, Object-Relational mapping, e.g. in C#: track = lennonAlbum.Tracks["Imagine"]
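"Follow your nose" navigation can be sketched as a breadth-first crawl: dereference a URI, collect its triples, and queue every object that is itself a URI. In the Python sketch below, a hypothetical in-memory WEB dictionary stands in for HTTP dereferencing, and all ex: URIs and predicates are invented for illustration (loosely echoing the Artist / Records / Tracks example). Unlike the C# line above, the client needs no mapping class for albums or tracks.

```python
from collections import deque

# Hypothetical stand-in for the Web: each URI "dereferences" to the triples
# describing it. Quoted strings are literals, everything else is a URI.
WEB = {
    "ex:artist/JohnLennon": [("ex:artist/JohnLennon", "ex:made", "ex:record/Imagine")],
    "ex:record/Imagine": [
        ("ex:record/Imagine", "ex:track", "ex:track/Imagine"),
        ("ex:record/Imagine", "ex:track", "ex:track/Jealous_Guy"),
    ],
    "ex:track/Imagine": [("ex:track/Imagine", "ex:title", '"Imagine"')],
    "ex:track/Jealous_Guy": [("ex:track/Jealous_Guy", "ex:title", '"Jealous Guy"')],
}

def dereference(uri):
    """Stand-in for an HTTP GET returning the triples about `uri`."""
    return WEB.get(uri, [])

def follow_links(start, max_hops=3):
    """Breadth-first crawl over typed links, up to max_hops away from start."""
    seen, graph = {start}, []
    queue = deque([(start, 0)])
    while queue:
        uri, hops = queue.popleft()
        for triple in dereference(uri):
            graph.append(triple)
            obj = triple[2]
            # Only URIs (not quoted literals) are links worth following.
            if not obj.startswith('"') and obj not in seen and hops < max_hops:
                seen.add(obj)
                queue.append((obj, hops + 1))
    return graph

graph = follow_links("ex:artist/JohnLennon")
titles = sorted(o for s, p, o in graph if p == "ex:title")
print(titles)  # ['"Imagine"', '"Jealous Guy"']
```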

Linked Data Benefits - Aggregatable Data

It is often desirable to have an integrated view of all the data available about an item or topic

Database Realm: Integration is problematic, as it is difficult to combine logical schemas

Semantic Web: Data aggregation is easy because every resource has a unique URI. Individual items can be linked, conceptual models can be linked, and cross-domain links enrich domain knowledge

Different facets of the same data item may be described by different URIs minted by different authors. These can be linked, e.g. via owl:sameAs or rdf:type predicates, which may expose facts not directly represented in any one source
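Aggregation through owl:sameAs can be sketched as "merge the triples, then collapse aliases onto one canonical URI". The Python sketch below uses a small union-find for the collapsing step. Both sources and all a:/b: names are hypothetical; the values come from the Northwind ALFKI record in this presentation. Only subjects are canonicalised here, which is a simplification.

```python
SAME_AS = "owl:sameAs"

source_a = [  # hypothetical source A
    ("a:ALFKI", "a:companyName", '"Alfreds Futterkiste"'),
    ("a:ALFKI", "a:city", '"Berlin"'),
]
source_b = [  # hypothetical source B: different author, different URI
    ("b:alfreds", "b:country", '"Germany"'),
    ("b:alfreds", SAME_AS, "a:ALFKI"),
]

def aggregate(*sources):
    """Merge triple sets and collapse owl:sameAs aliases (subjects only)."""
    triples = [t for src in sources for t in src]
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == SAME_AS:
            rs, ro = find(s), find(o)
            if rs != ro:
                parent[max(rs, ro)] = min(rs, ro)  # deterministic representative

    merged = {}
    for s, p, o in triples:
        if p != SAME_AS:
            merged.setdefault(find(s), set()).add((p, o))
    return merged

view = aggregate(source_a, source_b)
print(view)
```

Neither source alone states both the city and the country of the same organisation; the merged view does, which is the "facts not directly represented in any one source" effect.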

Linked Data – Data Aggregation

Linked Data Benefits - Self Describing Data

Resource Description Framework (RDF): A technology for creating self-describing Web resources. A Data Item’s type definition ‘accompanies’ it via rdf:type relations

RDF based data can be queried using SPARQL without knowing anything beforehand about the data definition (schema comes last in this realm). This provides the basis for powerful deductive data exploration tools

Logical / Relational Schema: Users / applications need a detailed understanding of the schema to use and navigate the data. An application’s knowledge of the schema is typically hardcoded, and ad-hoc end-user data exploration is potentially error prone

Linked Data Benefits - SPARQL

If a user agent has no built-in knowledge of a particular Data Item, it can dereference its Generic HTTP URI to obtain such information

The Power of SPARQL:

Discover what sorts of things a data source contains:
select distinct ?URI ?ObjectType where { ?URI a ?ObjectType }

Determine all the properties (and their values) of a data item’s class:
select * where { <http://my.org/resourceTypes/Department> ?property ?hasValue }

Determine all the properties and values of a data item instance:
DESCRIBE <http://my.org/resource/Accounts>

No prior knowledge of the RDF data source is needed
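The queries above work without a schema because a SPARQL triple pattern simply binds variables against whatever triples exist. The Python sketch below is a toy single-pattern matcher, not a SPARQL engine, run over hypothetical data that reuses the my.org URIs from the example queries. The DESCRIBE-like step returns the resource's outgoing triples, which is what many stores return; the SPARQL specification leaves the exact DESCRIBE output to the implementation.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"  # what SPARQL's "a" abbreviates

# Hypothetical data; the my.org URIs echo the example queries above.
TRIPLES = [
    ("http://my.org/resource/Accounts", RDF_TYPE, "http://my.org/resourceTypes/Department"),
    ("http://my.org/resource/Accounts", "http://my.org/prop/name", "Accounts"),
    ("http://my.org/resource/Sales", RDF_TYPE, "http://my.org/resourceTypes/Department"),
]

def match(pattern, data):
    """Yield variable bindings for a single triple pattern ('?x' = variable)."""
    for triple in data:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.setdefault(term, value) != value:
                    break  # one variable cannot take two different values
            elif term != value:
                break  # constant term does not match
        else:
            yield binding

# select distinct ?URI ?ObjectType where { ?URI a ?ObjectType }
kinds = sorted({(b["?URI"], b["?ObjectType"])
                for b in match(("?URI", RDF_TYPE, "?ObjectType"), TRIPLES)})

# DESCRIBE-like: every outgoing property/value of one resource
accounts = [(b["?p"], b["?o"])
            for b in match(("http://my.org/resource/Accounts", "?p", "?o"), TRIPLES)]
print(kinds)
print(accounts)
```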

Virtuoso - Linked Data Generation Options

The conceptual layer insulates Linked Data consumers from RDFization infrastructure & data source heterogeneity

Virtuoso RDF based Linked Data Views

Expose relational model data as RDF graph model data

Provide the means to move from a logical model to a conceptual model view

Available for querying through SPARQL or SPASQL (SPARQL embedded in SQL)

No physical regeneration of relational data

RDF Views = Virtuoso RDF Meta-Schema + Meta-Schema Language (MSL)

MSL = A domain specific, declarative language for mapping a logical SQL data model to a conceptual RDF data model

Northwind Demo Database: RDF View Definition Extract

[Figure: columns of the Demo.demo.Customers table (Customer ID, Company Name, Contact Name, Contact Title, Address, City, Postal Code, Country, Phone, Fax) alongside the Northwind RDF View Definition]

prefix northwind: <http://www.openlinksw.com/schemas/northwind#>
create iri class northwind:Customer "http://^{URIQADefaultHost}^/Northwind/Customer/%U#this" (in customer_id varchar not null) .
alter quad storage virtrdf:DefaultQuadStorage
  from Demo.demo.Customers as customers
  from Demo.demo.Orders as orders
  …
{
  create virtrdf:NorthwindDemo as graph iri ("http://^{URIQADefaultHost}^/Northwind")
  {
    northwind:Customer(customers.CustomerID)
      a foaf:Organization as virtrdf:Customer-CustomerID ;
      northwind:companyName customers.CompanyName as … ;
      northwind:fax customers.Fax as virtrdf:Customer-fax .
    northwind:Customer(orders.CustomerID)
      northwind:has_order northwind:Order(orders.OrderID) as virtrdf:Order-has_order .
  }
}
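RDF Views do not physically regenerate the relational data; the sketch below instead materialises the triples for a single row, purely to show what the iri class and property mappings above produce. Assumptions, labelled in the code as well: HOST stands in for ^{URIQADefaultHost}^ (demo.openlinksw.com is the host used in the mapping example that follows), %U is taken to mean "percent-encode the key", and only the companyName and fax properties are mapped because the rest of the mapping is elided in the extract.

```python
from urllib.parse import quote

# Assumptions: HOST stands in for ^{URIQADefaultHost}^, and %U is taken to
# mean "percent-encode the key" when it is spliced into the IRI template.
HOST = "demo.openlinksw.com"
NORTHWIND = "http://www.openlinksw.com/schemas/northwind#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_ORGANIZATION = "http://xmlns.com/foaf/0.1/Organization"

# Subset of the column-to-property mapping (the rest is elided in the extract).
COLUMN_TO_PROPERTY = {"CompanyName": NORTHWIND + "companyName", "Fax": NORTHWIND + "fax"}

def customer_iri(customer_id, host=HOST):
    """Mimic northwind:Customer(customers.CustomerID) for one key value."""
    return f"http://{host}/Northwind/Customer/{quote(customer_id, safe='')}#this"

def ntriples_literal(text):
    """Quote a string as an N-Triples literal, escaping backslashes, quotes and newlines."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'

def customer_row_to_ntriples(row):
    """Emit N-Triples for one Customers row under the mapping above."""
    subject = f"<{customer_iri(row['CustomerID'])}>"
    lines = [f"{subject} <{RDF_TYPE}> <{FOAF_ORGANIZATION}> ."]
    for column, prop in COLUMN_TO_PROPERTY.items():
        if row.get(column) is not None:  # NULL columns produce no triple
            lines.append(f"{subject} <{prop}> {ntriples_literal(row[column])} .")
    return lines

row = {"CustomerID": "ALFKI", "CompanyName": "Alfreds Futterkiste", "Fax": "030-0076545"}
for line in customer_row_to_ntriples(row):
    print(line)
```

Percent-encoding the key matters because primary key values are not guaranteed to be URI-safe; a key such as "A B" becomes .../Customer/A%20B#this.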

Northwind Demo Database: Customer Table to RDF Data Item Mapping

[Figure: one row of the Customers table and its rows in the Orders table, mapped to an RDF data item]

Customers table row: Customer ID ALFKI; Company Name Alfreds Futterkiste; Contact Name Maria Anders; Contact Title Sales Representative; Address Obere Str. 57; City Berlin; Postal Code 12209; Country Germany; Phone 030-0074321; Fax 030-0076545

Orders table rows (Order ID, CustomerID, …): 10643, ALFKI, …; 10692, ALFKI, …

Resulting RDF data item, relative to prefix <http://demo.openlinksw.com/Northwind/>:

Customer/ALFKI#this: companyName "Alfreds Futterkiste"; contactName "Maria Anders"; contactTitle "Sales Representative"; address "Obere Str. 57"; city "Berlin"; PostalCode "12209"; country "Germany"; phone "030-0074321"; fax "030-0076545"

Customer/ALFKI#this has_order Order/10643#this and Order/10692#this

Order/10643#this and Order/10692#this each has_customer Customer/ALFKI#this

LinqToRdf + Virtuoso

LinqToRdf to MusicBrainz - Conceptual Model Veneer

ADO.NET Data Services & Entity Data Model

A framework (codename “Astoria”) for exposing a ‘pure data’ service over HTTP

No support for RDF: gains none of RDF’s inherent benefits

Lack of platform independence & standards compliance: Supports REST-style interfaces and Atom, JSON and XML payloads. But the server side is Windows only, and consuming Astoria services at a higher level requires a Windows .NET client or a Silverlight-supported browser

ADO.NET Data Services & Entity Data Model

Server-side only conceptual model, with powerful URL addressing to query/navigate/sort/filter etc.:

Customers collection: http://myserver/data.svc/Customers

Customer ALFKI: http://myserver/data.svc/Customers('ALFKI')

Customer ALFKI's orders: http://myserver/data.svc/Customers('ALFKI')/Orders

But the client must know the conceptual schema, e.g. to construct the above URIs

Lack of Dereferenceable Entity IDs: The ability to discover entities and dereference their descriptions (attributes/relations) is confined to the facilities offered by .NET

cf. SPARQL’s ability to handle unknown data sources
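The "client must know the conceptual schema" point is easiest to see by writing the URL construction down. The Python sketch below rebuilds the three example URLs; every argument it needs (the entity set name, the key format, the navigation property name) is schema knowledge the client must bring with it, whereas a Linked Data client can start from a single dereferenceable URI. The function name is hypothetical, and doubling embedded single quotes in string keys follows the later OData convention, assumed here rather than taken from the presentation.

```python
from urllib.parse import quote

SERVICE_ROOT = "http://myserver/data.svc"  # from the example URLs above

def astoria_url(entity_set, key=None, navigation=None):
    """Build an entity-set / key / navigation URL in the style shown above.

    Every argument is schema knowledge the client must already have: the
    entity set name, the key value format, and the navigation property name.
    """
    url = f"{SERVICE_ROOT}/{entity_set}"
    if key is not None:
        # String keys appear as quoted literals, e.g. Customers('ALFKI');
        # doubling embedded quotes follows the OData convention (assumed here).
        literal = "'" + key.replace("'", "''") + "'"
        url += "(" + quote(literal, safe="'") + ")"
    if navigation is not None:
        url += "/" + navigation
    return url

print(astoria_url("Customers", "ALFKI", "Orders"))
```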

ADO.NET Data Services & Entity Data Model

No Support for Non-SQL Data Sources: Astoria is aimed exclusively at making relational data Web accessible

cf. Linked Data Realm: Recognizes that vast amounts of data reside in unstructured and semi-structured data sources

Support for embedding RDF into existing (X)HTML: RDFa, GRDDL, eRDF

Emerging tools for converting non-RDF data to RDF model data

Emerging tools for exposing relational data as RDF graph model data

Astoria lacks the scalability & scope of Semantic Web technologies