Revelytix SICoP Presentation DRM 3.0 with WordNet Senses in a Semantic Wiki Michael Lang February 6, 2007

Page 1

Revelytix SICoP Presentation

DRM 3.0 with WordNet Senses in a Semantic Wiki

Michael Lang

February 6, 2007

Page 2

Agenda

► Semantic Matching using WordNet
► Bootstrapping COI-based vocabularies
  – With WordNet
► DRM 3.0 in a semantic wiki
  – With WordNet integration
  – DRM implementation tool for the agencies
► Demonstration

Page 3

DRM Mission

► Facilitate information sharing
► How can I know that a data element or service I have discovered is the one I really want?
  – Description
  – Context
► How can I describe and provide sufficient context for anyone to know they have found what they want?
► Knowledge model
  – Excel, ISO 11179, DRM 2.0 will not be sufficient
  – OWL

Page 4

Semantic Matching

Page 5

Semantic Matching

► MatchIT
  – Extracts terms from databases
  – Creates a MatchIT vocabulary based on the collection of terms
  – Uses WordNet to match terms in disparate systems
  – Uses WordNet to match terms to domain vocabularies (NIEM)
  – Attaches WordNet “senses” to the vocabulary terms
► MatchIT vocabularies can be exported as OWL models
  – With the WordNet senses and synsets
► MatchIT can use other knowledge bases to facilitate matching
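The idea of matching terms from disparate systems through shared WordNet senses can be sketched roughly as follows. This is only an illustration, not MatchIT's actual algorithm: the sense inventory `TOY_SYNSETS`, its synset IDs, and the helper names `senses_of` and `match_terms` are all invented here, and a real implementation would query WordNet itself.

```python
# Toy sense inventory standing in for WordNet. The synset IDs and their
# member words are invented for illustration only.
TOY_SYNSETS = {
    "amount.n.01": {"amount", "sum", "total"},
    "article.n.01": {"article", "item", "product"},
    "person.n.01": {"person", "individual"},
}


def senses_of(word):
    """Return the IDs of every toy synset that contains the word."""
    w = word.lower()
    return {sid for sid, members in TOY_SYNSETS.items() if w in members}


def match_terms(source_terms, target_terms):
    """Pair terms from two vocabularies that share at least one sense."""
    matches = []
    for s in source_terms:
        for t in target_terms:
            shared = senses_of(s) & senses_of(t)
            if shared:
                # Keep the shared senses so they can be attached to the terms.
                matches.append((s, t, sorted(shared)))
    return matches


if __name__ == "__main__":
    for row in match_terms(["Sum", "Item"], ["Amount", "Person"]):
        print(row)
```

Keeping the shared sense IDs in each match mirrors the slide's point that senses are attached to vocabulary terms rather than discarded after matching.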

Page 6

Bootstrapping COI Vocabularies

► MatchIT vocabularies are imported
  – Either as OWL classes for vocabulary development
  – Or as OWL individuals for DRM development
► These vocabularies can be enriched using Knoodl.com for community-based development
  – Data dictionary
  – Vocabulary
  – Knowledge base
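The difference between importing a term as an OWL class and importing it as an OWL individual can be made concrete with a small Turtle generator. This is a sketch under assumptions: the namespace `http://example.org/vocab#`, the parent class name `Entity`, and the function names are hypothetical and do not come from MatchIT or Knoodl.

```python
OWL_PREFIX = "@prefix owl: <http://www.w3.org/2002/07/owl#> ."


def term_to_turtle(term, as_class, base="http://example.org/vocab#",
                   parent="Entity"):
    """Render one vocabulary term as a single Turtle statement.

    as_class=True  -> the term is declared as an owl:Class
                      (the vocabulary-development import).
    as_class=False -> the term is declared as an individual typed by
                      `parent`, a hypothetical class name
                      (the DRM-development import).
    """
    subject = f"<{base}{term}>"
    if as_class:
        return f"{subject} a owl:Class ."
    return f"{subject} a <{base}{parent}> ."


def vocabulary_to_turtle(terms, as_class):
    """Render a whole vocabulary, one statement per term."""
    lines = [OWL_PREFIX]
    lines.extend(term_to_turtle(t, as_class) for t in terms)
    return "\n".join(lines)


if __name__ == "__main__":
    print(vocabulary_to_turtle(["Person", "Organization"], as_class=True))
    print(vocabulary_to_turtle(["Person", "Organization"], as_class=False))
```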

Page 7

MatchIT Vocabulary Manager

[Architecture diagram. Labels recoverable from the slide:
Enterprise Information Sources: Custom, Any Source, XML, File System, JDBC, RDMS
Semantic Ontology Platform: Domain Ontology, Lexicons, Onomasticons, Fact Repositories, Models & Files [versioned], Search Index, Web Reporting
Matching: Find Matches, Schema-level Match, Instance-level Match
Access: Metadata Access, Data/Content Access, Ontological Semantics Access
Representations (Import / Export): Conceptual / Logical / Physical Data Models, Relational, XML, Domain [UML/ER], Ontologies [OWL/RDF]
Status labels: Data Harmonization Complete, OWL / RDF Model Complete
Tools: Build, Knoodl.com, Third-Party Modeling Tool]

Page 8

Collaborative OWL editor

Page 9

Information Management

► Knoodl is a new kind of modeling tool for modeling the structure, semantics and knowledge of any domain
  – The modeling process is necessarily collaborative
  – The process is necessarily extensible and additive
  – Community of Interest (COI) based tool
  – OWL based

Page 10

Knoodl.com is …

► An internet application where people can collaborate with others in their communities of interest to
  – Create, edit, share and find
  – Vocabularies / ontologies
► OWL Repository
  – Free, but licensing controlled by COIs
► Social Computing Paradigm
  – Users contribute content and benefit from the content
  – Vocabularies capture much of the institutional knowledge of an enterprise or community
  – Gain value over time
  – Used by people and machines

Page 11

Knoodl.com

► Knoodl is a collaborative framework
► Interoperability depends on three groups of stakeholders contributing to the description and context of the services
  ► Businesspeople
  ► Technical people
  ► Data people

Knoodl provides the features for the business people to participate

Page 12

FEADRM Person Harmonization Workgroup

Data Architecture Subcommittee Meeting
January 11, 2007

Page 13

Gathering Information

We asked those on the workgroup to share their models of PERSON with us.

We received documents from the Department of the Interior (DOI), the Veterans’ Administration (VA), the Federal Aviation Administration (FAA), and the Environmental Protection Agency (EPA).

You can view them on CORE.gov at https://collab.core.gov/CommunityBrowser.aspx?id=10833

Page 14

Analyzing the Data

We compared the entities and attributes from all the documentation. We created an Excel Workbook.

– The first sheet contains all the entities and attributes from each model.

– The second sheet contains a mapping of the entities from the other agencies to those of the Social Security Administration (SSA).

– The third sheet contains the entities, attributes, and their definitions from the SSA FEADRM Model.

The Excel document is named ‘Person Entities and Attributes from Various Feds’ and you can view it on CORE.gov at https://collab.core.gov/CommunityBrowser.aspx?id=11682

Page 15

Observations

A data model should have a point of view; we should have a common one at the Federal level.

Everyone should be modeling business data rather than creating logical database models.

PERSON is probably the area in which resides most of the non-administrative sharable data. This is what we at SSA call “common shared.”

The definition of business concepts represented by entities at the “top” of the data model should not be in terms so rigorously tied to the business of any one agency.

Data that are “regulated” require formal agreement to be sharable.

PERSON cannot be addressed in a vacuum. The concepts of organization, party, and role should be addressed at the same time.

Page 16

DRM 3.0

Page 17

Communities of Interest (COI) Vision

Each COI will implement the 3 pillar framework strategy.

[Framework diagram. Labels recoverable from the slide: Business & Data Goals drive; Information Sharing/Exchange (Services); Governance; Data Strategy; Data Architecture (Structure)]

Page 18

The FEA Data Reference Model 2.0

Source: Expanding E-Government, Improved Service Delivery for the American People Using Information Technology, December 2005, pages 2-3. http://www.whitehouse.gov/omb/budintegration/expanding_egov_2005.pdf

NIEM 1.0 NIEM Roadmap Pilot

Page 19

DRM 2.0 Implementation Metamodel

► Definitions:
  – Metamodel: Precise definitions of constructs and rules needed for abstraction, generalization, and semantic models.
  – Model: Relationships between the data and its metadata - W3C.
  – Metadata: Data about the data for: Discovery, Integration, and Execution.
  – Data: Structured e.g. Table, Semi-Structured e.g. Email, and Unstructured e.g. Paragraph.

Source: Professor Andreas Tolk, 2005.

Page 20

The Revelytix Solution

► OWL MetaModel:
  – owl:Class
  – owl:Property
► DRM Model:
  – Topic (owl class)
  – Entity (owl class)
  – Relationship (owl object property)

• Use existing MetaModel languages to model the FEA DRM – OWL
  – Model the DRM in a collaborative environment – Knoodl
  – Extend the DRM to model the type of information that will be created – JDBC metadata, WordNet synset and word data

Page 21

DRM Implementation: Data Description Area

► Model JDBC Metadata to Data Description Area

[Diagram. Labels recoverable from the slide:
DRM v2.0 Vocabulary: Entity <owl:Class>, Attribute <owl:Class>, Relationship <owl:ObjectProperty>
MatchIT Data Dictionary Vocabulary: Table <owl:Class>, View <owl:Class>, Column <owl:Class>, ForeignKey <owl:ObjectProperty>
Links between the two vocabularies: <owl:subClassOf> (twice), <owl:subPropertyOf>]
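One plausible reading of this diagram is that the JDBC-derived MatchIT terms specialize the DRM terms. The sketch below encodes that reading and prints it as Turtle. Several things here are assumptions rather than facts from the slide: the exact pairing of child and parent terms (the extracted slide text does not say which box each arrow connects), the `drm:` and `mit:` namespaces, and the function names. The standard RDF Schema properties `rdfs:subClassOf` and `rdfs:subPropertyOf` are used in place of the `owl:`-prefixed labels shown on the slide.

```python
PREFIXES = "\n".join([
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
    "@prefix drm: <http://example.org/drm#> .",      # hypothetical namespace
    "@prefix mit: <http://example.org/matchit#> .",  # hypothetical namespace
])

# Assumed pairings: MatchIT data dictionary term -> DRM v2.0 term.
SUBCLASS_OF = {"Table": "Entity", "View": "Entity", "Column": "Attribute"}
SUBPROPERTY_OF = {"ForeignKey": "Relationship"}


def drm_concept_for(matchit_term):
    """Return the DRM term that a MatchIT term specializes, or None."""
    if matchit_term in SUBCLASS_OF:
        return SUBCLASS_OF[matchit_term]
    return SUBPROPERTY_OF.get(matchit_term)


def mapping_as_turtle():
    """Serialize the assumed mapping as Turtle statements."""
    lines = [PREFIXES]
    for child, parent in SUBCLASS_OF.items():
        lines.append(f"mit:{child} rdfs:subClassOf drm:{parent} .")
    for child, parent in SUBPROPERTY_OF.items():
        lines.append(f"mit:{child} rdfs:subPropertyOf drm:{parent} .")
    return "\n".join(lines)


if __name__ == "__main__":
    print(mapping_as_turtle())
```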

Page 22

DRM Implementation: Data Context Area

► Model WordNet data to Data Context Area

[Diagram. Labels recoverable from the slide:
Vocabularies: DRM v2.0 Vocabulary, MatchIT Data Dictionary Vocabulary
Classes: Topic <owl:Class>, Taxonomy <owl:Class>, Synset <owl:Class>, Wordnet <owl:Class>
Object properties: Relationship <owl:ObjectProperty>, Hyponym <owl:ObjectProperty>, Hypernym <owl:ObjectProperty>
Links: <owl:subClassOf> (twice), <owl:subPropertyOf>]

Page 23

SICoP Knowledge Reference Model

The point of this graph is that Increasing Metadata (from glossaries to ontologies) is highly correlated with Increasing Search Capability (from discovery to reasoning).

Page 24

Demonstration

Page 25

Contextualize (Interpret)

Automated term tokenization

Automated semantic linking using the default knowledge-base contained within MatchIT

[Diagram. Labels recoverable from the slide: the term ArticleAmount is split into the tokens Amount and Article; related concepts Sum, Assets, and Creation; relationship types Synonym and Type-of]
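The tokenization step shown in the diagram, where ArticleAmount becomes Article and Amount, can be approximated with a regular expression. This is a minimal sketch that assumes column-style names using camel case, underscores, or digits; the function name `tokenize_term` is hypothetical and the rules MatchIT actually applies are not described in the slides.

```python
import re

# Three alternatives, tried in order at each position:
#   1. a run of capitals not followed by a lowercase letter (acronyms: XML, ID)
#   2. an optional capital followed by lowercase letters (Article, amount)
#   3. a run of digits
_TOKEN = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def tokenize_term(term):
    """Split a database-style identifier into its word tokens."""
    return _TOKEN.findall(term)


if __name__ == "__main__":
    for name in ["ArticleAmount", "PERSON_ID", "XMLFile", "addr2"]:
        print(name, "->", tokenize_term(name))
```

Separators such as underscores are never matched by any alternative, so they are dropped rather than returned as tokens.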

Page 26

Semantic Matching (Mediate)

► Relationships pre-established within the knowledge-base…

Identify the Target and the Source(s) and run the match.

[Diagram. Labels recoverable from the slide: ArticleAmount and ProductShares, automatically linked by a specific % distance]
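The slides do not define how the "% distance" is computed. One simple way to picture it is as a score derived from the number of relationship hops between two concepts in the knowledge base. The sketch below makes that assumption explicit: the links in `TOY_LINKS`, the `max_hops` cutoff, the linear scoring formula, and all function names are invented for illustration.

```python
from collections import deque

# Hypothetical knowledge base: undirected links between concepts.
TOY_LINKS = [
    ("Amount", "Sum"), ("Sum", "Assets"), ("Assets", "Shares"),
    ("Article", "Creation"), ("Creation", "Product"),
]


def build_graph(links):
    """Turn a list of undirected links into an adjacency map."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def hops(graph, start, goal):
    """Breadth-first search: fewest links from start to goal, or None."""
    if start == goal:
        return 0
    if start not in graph or goal not in graph:
        return None
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph[node]:
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def score(graph, a, b, max_hops=4):
    """Assumed scoring: 100 for identical concepts, falling linearly per hop."""
    d = hops(graph, a, b)
    if d is None or d > max_hops:
        return 0
    return round(100 * (max_hops + 1 - d) / (max_hops + 1))


def term_score(graph, source_tokens, target_tokens):
    """Average, over source tokens, of the best score against any target token."""
    best = [max(score(graph, s, t) for t in target_tokens) for s in source_tokens]
    return sum(best) / len(best)


if __name__ == "__main__":
    g = build_graph(TOY_LINKS)
    print(term_score(g, ["Article", "Amount"], ["Product", "Shares"]))
```

With these toy links, Article reaches Product in two hops and Amount reaches Shares in three, so the two compound terms end up linked with a middling score rather than matched exactly.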

Page 27

Semantic Matching (Mediate)

Not all direct matches are the most relevant…

In many cases the most valuable matches are the distant matches.

By adding a domain knowledge-base these relationships become more obvious.

AbstractionEvidence
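To illustrate how a domain knowledge base can expose a relationship that the default one misses, the sketch below checks whether two concepts are connected before and after domain-specific links are merged in. Everything in it is hypothetical: the concept names in `BASE_LINKS` and `DOMAIN_LINKS` are made up, and the slides do not say how MatchIT combines knowledge bases.

```python
# Invented links for illustration; neither list comes from a real knowledge base.
BASE_LINKS = [("Evidence", "Proof"), ("Abstraction", "Concept")]
DOMAIN_LINKS = [("Proof", "Exhibit"), ("Exhibit", "Concept")]


def reachable(links, start, goal):
    """Depth-first search over undirected links; True if a path exists."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, ()))
    return False


if __name__ == "__main__":
    # Default knowledge base only: no path between the two concepts.
    print(reachable(BASE_LINKS, "Evidence", "Abstraction"))
    # Default plus domain knowledge base: a distant path now exists.
    print(reachable(BASE_LINKS + DOMAIN_LINKS, "Evidence", "Abstraction"))
```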