Semantic Software Architecture
Using Semantic Technology to Build the Enterprise Information Web
Michael [email protected]
Emergent Analytics
Extensible enterprise information management paradigm
Add semantics to all aspects of the enterprise's information systems
All information becomes easily accessible using SPARQL
Add new information easily
Understand how everything is related and what it is
Provides the capability to analyze information enterprise wide
IT
Information Technology
The technology that enables the management of all types of information
Create it – works great
Store it – works great
Change it – works great
Find it – not so good
Analyze it – very complex, very difficult
Use it – works great if you are inside the application that creates it, otherwise BIG problem
Commonly called SILOS
We all want FEDERATION
The term “Semantic Web” will not appear in this presentation
Semantic Technology
IT
New Information Management Paradigm
Semantic Technology is a layer of description that sits within the current IT infrastructure
We build the descriptions using OWL and RDF
We access the descriptions at run-time using SPARQL
OWL and RDF are unique because they are both a description language and an information model
Enables a radical transformation of IT capabilities
Completely distributed information management
FEDERATION
Information Federation
Enterprises are made up of many domains within domains
  Sales, Operations, R&D, Executive management, Manufacturing, Logistics, HR, Finance, Intelligence, …
Each domain fields its own applications and creates its own information to execute its mission
It is normally not possible to federate and integrate applications within domains, across domains or with partners
Enterprises will not take the next step in analytic capability until they first solve the INFORMATION federation problem
What are RDF and OWL for?
They are only used for one thing....
To DESCRIBE things
ANYTHING
Machines can UNDERSTAND the descriptions
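As a minimal sketch of what "describing things" can look like to a machine, the Python below models RDF-style descriptions as subject-predicate-object triples and matches simple patterns against them, loosely in the spirit of a SPARQL basic graph pattern. The `ex:` names, the sample data, and the `match` helper are hypothetical illustrations and are not taken from the presentation or from any RDF library.

```python
# Hypothetical example data: each description is a (subject, predicate, object) triple.
TRIPLES = {
    ("ex:alice", "rdf:type", "ex:Employee"),
    ("ex:alice", "ex:worksIn", "ex:Finance"),
    ("ex:bob", "rdf:type", "ex:Employee"),
    ("ex:bob", "ex:worksIn", "ex:Logistics"),
    ("ex:Employee", "rdfs:subClassOf", "ex:Person"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples matching the pattern, sorted; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# "What do we know about ex:alice?"
about_alice = match(TRIPLES, s="ex:alice")
# "Which things are described as Employees?"
employees = [s for s, _, _ in match(TRIPLES, p="rdf:type", o="ex:Employee")]
```

Because every fact has the same three-part shape, the same `match` call works for any kind of thing that has been described, which is the property the slide emphasizes.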
Federation Requires Description
Information discovery, reuse, and integration all depend on description
If we do not know what something is we cannot possibly know how to integrate it with other things or even how it should be used
If we describe everything well enough, we are in a position to have a knowledge-based web
  Integrate and interoperate
  Analyze any combination of information
RDF & OWL enable information federation
  Both machines and people can understand the descriptions
Defense Advanced Research Projects Agency
Relational Database Technology, TCP/IP, OWL/RDF
DARPA creates the DARPA Agent Markup Language (DAML) program in 2000 to facilitate information federation - DAML.org
W3C takes the work funded by DARPA and others to create the Resource Description Framework (RDF) and Web Ontology Language (OWL) specifications
These standards comprise an excellent information management technology architecture
There are no other standards that can be used to accomplish the goal of information federation
World Wide Web Architecture
Mature
Active Research and Standards Activity
Commercial Cutting Edge
Semantic Software Architecture
All components support RDF, OWL and/or SPARQL as well as other web technologies
OWL modeling tools
RDF stores
Spyders
Federators
SPARQL endpoints
Visualization tools
Analytic tools
SPARQL endpoint registry
Spyder
Software component that transforms relational data formats to RDF using the mapping ontology
Adds the semantics of any domain ontology to any database
Provides SPARQL endpoint for relational databases
Generates information about sources to optimize performance
Exposes full power of SQL
Allows mappings themselves to be analyzed
Minimizes or eliminates the need for triple stores
Easier to use than ETL
REUSABLE
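The idea of translating relational rows to RDF on demand can be sketched as follows. This is an illustration under stated assumptions, not the Spyder implementation: a plain Python dict stands in for the mapping ontology, the `hr:` names and the `employee` table are invented, and `rows_to_triples` is a hypothetical helper. It uses only the standard-library `sqlite3` module.

```python
import sqlite3

# Hypothetical mapping: table -> class, column -> property. In the presentation
# this role is played by the mapping ontology; a dict stands in for it here.
MAPPING = {
    "table": "employee",
    "key": "id",
    "subject_prefix": "hr:employee/",
    "class": "hr:Employee",
    "columns": {"name": "hr:name", "dept": "hr:department"},
}


def rows_to_triples(conn, mapping):
    """Translate rows to triples at query time, without copying them into a triple store."""
    cols = [mapping["key"], *mapping["columns"]]
    sql = f"SELECT {', '.join(cols)} FROM {mapping['table']}"
    triples = []
    for row in conn.execute(sql):
        subject = f"{mapping['subject_prefix']}{row[0]}"
        triples.append((subject, "rdf:type", mapping["class"]))
        # Each mapped column becomes one property of the row's subject.
        for col, value in zip(cols[1:], row[1:]):
            triples.append((subject, mapping["columns"][col], value))
    return triples


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.execute("INSERT INTO employee VALUES (1, 'Alice', 'Finance')")
triples = rows_to_triples(conn, MAPPING)
```

Since the mapping is itself data, it can be inspected and reused against another database with the same shape, which is the point behind "allows mappings themselves to be analyzed".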
Federator
Enables users to query multiple RDF graphs exposed by Spyders as if they were a single graph
Uses the source metadata provided by Spyders to optimize performance
Works against the native information sources
Does not require RDF to be moved into a triple store before it is queried
Delegates the maximum amount of processing as far down as possible
Better solution than traditional ETL-based processes
Uses the domain ontology and mapping ontology
Supports complex analytics
Integrated with rules engine
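A rough sketch of the federation idea follows: several sources are queried as if they were one graph, advertised source metadata is used to skip sources that cannot answer, and matching is delegated to each source. The `Source` and `Federator` classes below are hypothetical stand-ins written for illustration; they do not reflect the actual Federator's planner, cache, or rules engine.

```python
class Source:
    """Stand-in for a Spyder endpoint: holds triples and advertises its predicates."""

    def __init__(self, name, triples):
        self.name = name
        self.triples = list(triples)
        # Source metadata the federator can use to skip irrelevant sources.
        self.predicates = {p for _, p, _ in self.triples}

    def match(self, s=None, p=None, o=None):
        """Return triples matching the pattern; None acts as a wildcard."""
        return [
            t for t in self.triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        ]


class Federator:
    """Presents many sources as one graph for simple triple patterns."""

    def __init__(self, sources):
        self.sources = sources
        self.contacted = []  # names of sources actually queried, for inspection

    def match(self, s=None, p=None, o=None):
        results = []
        for src in self.sources:
            # Use advertised metadata to avoid querying sources that cannot answer.
            if p is not None and p not in src.predicates:
                continue
            self.contacted.append(src.name)
            # Delegate the pattern match to the source itself.
            results.extend(src.match(s, p, o))
        return results


hr = Source("hr", [("ex:alice", "hr:department", "Finance")])
fin = Source("finance", [("ex:alice", "fin:costCenter", "CC-42")])
fed = Federator([hr, fin])
everything_about_alice = fed.match(s="ex:alice")
cost_centers = fed.match(p="fin:costCenter")
```

The first query touches both sources; the second touches only `finance`, because the `hr` source does not advertise the `fin:costCenter` predicate.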
[Architecture diagrams: Dashboards send SPARQL to Federators; Federators send SPARQL to Spyders; Spyders send SQL to Data Sources. Each Spyder comprises an Optimizer, Indexes, Planner, Re-Writer, and SPARQL Endpoint; each Federator adds a Cache and Rules. Supporting components: Mapping Ontology, Metadata Ontology, Domain Ontology, Ontology Repository, SPARQL Endpoint Registry.]
Ontology Architecture
An ontology architecture is the system of ontologies required to accomplish a goal
Very much like a software architecture
The goal for an Enterprise Information Web (EIW) is federation of information sources across business units to enable enterprise reporting and analysis
The ontology architecture of an EIW is designed to solve the information federation problem while enabling sophisticated analytics
EIW Ontology Architecture
[Diagram: Human Resources Domain Ontology, Process Ontology, Standards Ontology, Analytics Ontology, Discussion Ontology, and Community Ontology, together with two Relational Mapping Ontologies and two Source Ontologies over two RDBMSs. Labels: Top-down, Bottom-up.]
EIW Ontology Architecture for Federation
[Diagram: Reporting/Analytics issues SPARQL to The Federator, which works with the Human Resources Domain Ontology, two Relational Mapping Ontologies, and two Source Ontologies over two RDBMSs.]
Domain Ontology
The Domain Ontology is a conceptual description of a business domain
The “domain” is defined by the business processes, rules, information sources, and any required analytics
Instances in this ontology are the same instances which are currently stored in information sources (databases)
Exposes all information of the domain to any user or application using the business terminology of the domain
In some cases, these business terms are defined by standards
Relational Mapping Ontology
Describes how concepts in the domain ontology relate to data in databases
Enables the translation of data from a relational format to RDF format, using terminology defined in the Domain Ontology
We have created a document that defines the Relational Mapping Ontology
This document should be released to the public this year
The D2RQ language was not sufficient for our mission
http://www.knoodl.com/ui/groups/Mapping_Ontology_Community
Relational Schema Ontology
Represents metadata about a relational database schema as instance data
  All columns are instances and have properties relating them to their tables
Enables analysis of the way a database is mapped to the Domain Ontology (via the Relational Mapping Ontology)
  How many columns are mapped to properties in the Domain Ontology?
  How many are mapped to classes?
  How is Person represented in the customer management system?
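The questions above become simple set operations once schema metadata and mappings are both held as instance data. The sketch below assumes invented `rs:`, `map:`, `db:`, and `dom:` names; they are placeholders for illustration and are not the vocabulary of the actual Relational Schema or Relational Mapping Ontologies.

```python
# Hypothetical schema metadata: every column is an instance related to its table.
SCHEMA = [
    ("db:customer.id", "rs:columnOf", "db:customer"),
    ("db:customer.full_name", "rs:columnOf", "db:customer"),
    ("db:customer.legacy_flag", "rs:columnOf", "db:customer"),
]
# Hypothetical mappings from columns to Domain Ontology classes or properties.
MAPPINGS = [
    ("db:customer.id", "map:mapsToClass", "dom:Person"),
    ("db:customer.full_name", "map:mapsToProperty", "dom:name"),
]

columns = {s for s, p, _ in SCHEMA if p == "rs:columnOf"}
# How many columns are mapped to properties? How many to classes?
to_property = {s for s, p, _ in MAPPINGS if p == "map:mapsToProperty"}
to_class = {s for s, p, _ in MAPPINGS if p == "map:mapsToClass"}
# Which columns are not mapped at all?
unmapped = sorted(columns - to_property - to_class)
# How is Person represented in this (hypothetical) customer system?
person_columns = sorted(s for s, _, o in MAPPINGS if o == "dom:Person")
```

In this toy data, one column maps to a property, one maps to a class, and `db:customer.legacy_flag` is left unmapped.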
Analytics Ontology
Enables us to describe questions, queries, reports, forms
  We represent questions as instances and relate them to the queries that provide their answers
Queries are related to Domain Ontology concepts
Domain Ontology concepts are mapped to data sources
Enables "gap analysis" of analytic requirements
  Are the concepts used in the query to answer this question mapped to the necessary data sources?
Long-term, can be used to model-drive a reporting tool
  Create instances of "Reports" and the tool builds them
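The gap analysis described above follows a chain: question, then query, then domain concepts, then data sources. A minimal sketch of that chain is shown below, assuming invented instance data; the question, query, and concept names and the `gaps` helper are hypothetical and only illustrate the reasoning.

```python
# Hypothetical instance data: questions, the queries that answer them,
# the domain concepts each query uses, and the sources each concept is mapped to.
ANSWERED_BY = {
    "q:headcount_by_dept": "query:headcount",
    "q:attrition_rate": "query:attrition",
}
USES = {
    "query:headcount": {"hr:Employee", "hr:department"},
    "query:attrition": {"hr:Employee", "hr:terminationDate"},
}
MAPPED = {
    "hr:Employee": ["db:hr.employee"],
    "hr:department": ["db:hr.employee.dept"],
}


def gaps(question):
    """Return the concepts a question depends on that no data source is mapped to."""
    query = ANSWERED_BY[question]
    return sorted(c for c in USES[query] if not MAPPED.get(c))
```

For example, `gaps("q:attrition_rate")` reports `hr:terminationDate` as unmapped, so that question cannot yet be answered from the available sources, while `gaps("q:headcount_by_dept")` reports no gaps.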
Process Ontology
Enables description of business processes
  RDF/OWL version of BPMN
Enables analysis of the information flows of business process steps in terms of the HR Domain Ontology
Long-term will enable execution of processes described as instances of the ontology
Short-term enables us to link processes with other artifacts in the domain
  Domain Ontology concepts
  Standards documentation
  Discussions - anything
How Hard is this?
Many people believe that it is too hard, that there are not enough trained people, and that it takes too long to build the descriptions
So millions of dollars and many years have been spent trying to develop an automated way of doing the modeling
  Automated machine learning has not been invented
  The machines must be bootstrapped with descriptions
The first bullet is a fallacy
  It is not very hard
  There are plenty of people that can do this work
  It does not take very long to build the models
Federation Solution
Enterprise Information Web
Any information from any system can be shared with any other system on the enterprise networks or the World Wide Web
Steps
  Describe all of the terms and artifacts in each domain using RDF, OWL
    We currently do this description work, but in Excel, Word, PowerPoint, and Visio rather than in machine-readable standards
    The formal description of a domain is called a domain ontology
  Describe how all of the information managed in each domain is related to the domain vocabulary
  Use these descriptions to say how domains are related
  Query the Domain vocabularies for any information
The result is an Enterprise Information Web that meets the goals of information sharing and analysis
[Diagram: Enterprise Information Web. Shown: relational DBs (Finance, HR, Logistics), web services (sensors, weather, location), web sites, applications, a domain descriptions knowledgebase, an RDF web service, a Federator web service, and a user.]
1. Information Systems
2. Expose as RDF web services or SPARQL endpoints
3. EIW contains self described data
4. ESB is a big federated knowledgebase of any information
5. Any authorized user or system can query the ESB for any information
Leverage Existing Investment
We leverage existing infrastructure
  Same networks, same security, same applications, same organizations
A lot of this description work is being done now; it simply requires some redirection
Must use standards like any other federation
The result of this relatively minor change and expense is an astounding advance in information management capability
EIW Demo
Community Content
Security
Discussions
OWL editing
ASK queries
View Designer/Views
Visual Ontology Web Language
Visualization
There is no W3C-adopted standard for visual representation of OWL or RDF models
OWL and RDF will not become widely used standards without good visualization of models
We do not believe any existing modeling standard will do; OWL is too different
We need OWL design patterns to fundamentally change information management capability at DOD and elsewhere
The capability will be in beta test in December on knoodl.com