A Standard & Prototype Starting Point for An Open Ontology Repository: The Extended Metadata Registry Project John L. McCarthy XMDR Project Lawrence Berkeley National Laboratory Open Ontology Repository (OOR) Panel on Rationale, Expectations & Requirements March 27, 2008


A Standard & Prototype Starting Point for An Open Ontology Repository:

The Extended Metadata Registry Project

John L. McCarthy
XMDR Project

Lawrence Berkeley National Laboratory

Open Ontology Repository (OOR) Panel on Rationale, Expectations & Requirements

March 27, 2008


page 2 of 15 XMDR Open Ontology Talk-v5.ppt

Shared Goals & Challenges

• Open Ontology Repository Goals
  – collection of useful ontologies
  – help facilitate harmonization & synergy
  – standard representation/characterization?

• Extended Metadata Registry (XMDR) Project Goals
  – extend ISO-IEC 11179 ed. 2 Metadata Registry Standard
    • for increasingly large & complex databases & software systems
    • particularly for large organizations like EPA, NCI, DOD, …
  – incorporate & manage evolution of concept information
    • codesets of valid values, terminologies, thesauri, ontologies
    • using a shared metamodel for both metadata & concepts


XMDR Project Overview & Background

• Set of collaborative initiatives with shared goals & funding
  – EPA, NCI, DOD, LBNL, USGS, Ecoterm, UNEP, … (major 11179 users)
    • XMDR project at LBNL began in 2003
    • principals have been meeting in Berkeley since 2004
  – ISO-IEC JTC1/SC32/WG2 & ANSI L8 working on 11179 ed. 3
    • Joint Technical Committee 1, Subcommittee 32, Working Group 2
    • metadata registry standards work began in 1980’s re data dictionaries & codesets

• Open source reference implementation & testbed system
  – test implementations of proposed extensions to 11179 metamodel
    • add more formal semantic metadata on concepts & relationships to data
  – assemble semantic metadata from diverse sources & structures
    • terminologies, ontologies, etc. for environment, geography, health, …
  – explore emerging semantic technologies (e.g., RDF, OWL, CL, …)
  – demonstrate new capabilities
    • e.g., ontology lifecycle management & harmonization


Challenge: Gain Common Understanding of meaning between Data Creators and Data Users

[Figure: users, information systems, and data creators (EEA, USGS, DoD, EPA, others . . .) exchanging text and numeric data. Each source labels its data with its own terms, in English (environ, agriculture, climate, human health, industry, tourism, soil, water, air) or in Spanish (ambiente, agricultura, tiempo, salud humano, industria, turismo, tierra, agua, aero). Caption: Common interpretation of what data represents]


Inference requires combination of Data, Metadata & Concept Systems

Data:

ID   Date       Temp   Hg
A    06-09-13   4.4    4
B    06-09-13   9.3    2
X    06-09-13   6.7    78

Metadata:

Name   Datatype   Definition                       Units
ID     text       Monitoring Station Identifier    not applicable
Date   date       Date                             yy-mm-dd
Temp   number     Temperature (to 0.1 degree C)    degrees Celsius
Hg     number     Mercury contamination            micrograms per liter

Concept System (multi-lingual):

Contamination
  Biological
  Chemical
    lead
    cadmium
    mercury
  Radioactive

Inference Search Query: “find water bodies downstream from Fletcher Creek where chemical contamination was over 10 micrograms per liter between December 2001 and March 2003”
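A minimal Python sketch of the idea on this slide, not the XMDR implementation. The data rows and the concept hierarchy are copied from the slide; the `concept` link from a column to a concept, the function names, and the in-memory layout are assumptions made for illustration. It shows why the query's "chemical contamination" can only reach the `Hg` column by going through both the metadata and the concept system. The "downstream from Fletcher Creek" and date-range parts of the query are left out because the slide gives no data for them.

```python
# Data rows as shown on the slide.
DATA = [
    {"ID": "A", "Date": "06-09-13", "Temp": 4.4, "Hg": 4},
    {"ID": "B", "Date": "06-09-13", "Temp": 9.3, "Hg": 2},
    {"ID": "X", "Date": "06-09-13", "Temp": 6.7, "Hg": 78},
]

# Metadata per column. The "concept" key is an assumed link into the
# concept system; the slide does not show how that link is stored.
METADATA = {
    "Temp": {"units": "degrees Celsius", "concept": None},
    "Hg": {"units": "micrograms per liter", "concept": "mercury"},
}

# Concept system from the slide, stored as child -> parent.
PARENT = {
    "Biological": "Contamination",
    "Chemical": "Contamination",
    "Radioactive": "Contamination",
    "lead": "Chemical",
    "cadmium": "Chemical",
    "mercury": "Chemical",
}

def is_a(concept, ancestor):
    """Return True if `concept` equals `ancestor` or lies below it."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = PARENT.get(concept)
    return False

def columns_for(ancestor, units):
    """Columns whose concept falls under `ancestor` and whose units match."""
    return [name for name, meta in METADATA.items()
            if meta["units"] == units and is_a(meta["concept"], ancestor)]

def stations_over(ancestor, threshold, units):
    """IDs of rows where any matching column exceeds `threshold`."""
    cols = columns_for(ancestor, units)
    return [row["ID"] for row in DATA
            if any(row[c] > threshold for c in cols)]

print(stations_over("Chemical", 10, "micrograms per liter"))  # ['X']
```

Without the concept hierarchy the word "chemical" matches nothing in the data or metadata; without the units in the metadata the threshold of 10 could not be compared safely against the `Hg` values.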


XMDR Goals (continued)

• Improve representation of relationships between data (e.g., data elements & value domains) and concept structures (e.g., ontologies, taxonomies, thesauri, terminologies, …)

• Register & manage complex semantic metadata (i.e., concepts) in more formal, systematic ways (e.g., description logic) to facilitate machine processing of semantics in order to
  – link together data elements & terms across multiple systems
  – discover relationships among data elements, terms & concepts
  – create and manage names, definitions, terms, etc.
  – support software inference, aggregation, and agent services

• Add more rigorous & formal specification for
  – concepts and concept systems (including ontologies)
  – relationships between metamodel components
  – formal axioms for conceptual & structural relationships

• Use concepts to unify different types of metadata
  – evolution requires increasing granularity & details
  – combine strengths of data dictionaries/registries and ontologies


Example concept system content currently loaded in XMDR Prototype

via Lexgrid (from Mayo Clinic & Harold Solbrig)
• GEMET 2001.0 Multilingual Environmental Thesaurus
• National Biological Information Infrastructure biodiversity
• NCI Thesaurus_06.02d health concepts system
• ISO4217_1981 currency codes
• ISO3166_V-10 country codes (only 2 letter codes)
• Mouse_1.32 anatomy
• Defense Technology Information Center 1.0 Thesaurus
• Portions of EPA controlled vocabulary
• SIC and NAICS industrial classification codes

via special purpose scripts
• Omega ontology


Additional candidate metadata content to test 11179 metamodel expressivity

Current 11179 Data Element Registries
• caDSR (full NCI Cancer Data Standards Registry)
• EDR (EPA Environmental Data Registry)

Candidate Additions to Concept Systems and Ontologies
• NASA SWEET (Semantic Web Earth & Environmental Terminologies)
• IETF RFC 3066 Language Codes
• USGS Geographic Names Information System
• Getty Thesaurus of Geographic Names
• I.T.I.S. - Integrated Taxonomic Information System
• Foundational Model of Anatomy
• EPA Chemical Substance Registry
• GO (Gene Ontology), … Agrovoc, … and possibly others
• OMV Ontology Metadata Vocabulary (European NeON consortium & Stanford NCBO)


Omega Ontology illustrates challenges of loading large, complex new content

Omega is a “terminological ontology”
• reorganization & synthesis of WordNet & Mikrokosmos
• adds higher level ontology to organize multiple ontologies

Initial mapping and loading of Omega needs to be refined
• Multiple ontology languages present an additional challenge
• Entity relationships conform to Concept_System figure
• Entity -> Attribute conforms to Classification_Scheme figure
• Omega Attributes mapped to ISO/IEC 11179 ed3 Facets (ignoring Omega datatype field)
• Required a week to process and load Omega Ontology
  • 4 million files, so ~250,000/24 hrs


XMDR Prototype Modular Architecture: with current open source software selections

[Figure: XMDR Prototype Architecture, REST style. Components shown:]
• Users: web browsers, client software
• Human User Interface (XML pages & JavaScript)
• Application Program Interface (REST)
• Authentication Service
• Validation (XML Schema)
• Mapping Engine
• Search & Inference Queries (Jena, SPARQL)
• Reasoner (Pellet)
• Text Search (Lucene)
• Full Text Index
• Asserted Logic Index
• Inferred Logic Index
• Registry Store (Subversion), holding standard XMDR files
• Content Loading & Transformation (Lexgrid & custom)
• Metadata Sources: concept systems, data elements
• XMDR metamodel (OWL & XML Schema)
• Metamodel specs (UML & Editing) (Poseidon, Protege)
• XMDR data model & exchange format: XML, RDF, OWL
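The split between the asserted and inferred logic indexes can be illustrated with a small Python sketch. It is only an assumption about the general idea (store the stated triples, derive additional ones, keep the two sets apart); the prototype uses Jena and Pellet, whose reasoning goes well beyond the transitive closure shown here. The predicate name `subClassOf`, the function name, and the example triples are hypothetical.

```python
from collections import defaultdict

# Asserted index: triples stated directly by a source (hypothetical examples).
ASSERTED = {
    ("mercury", "subClassOf", "Chemical"),
    ("lead", "subClassOf", "Chemical"),
    ("Chemical", "subClassOf", "Contamination"),
}

def infer_subclass_closure(asserted):
    """Derive subClassOf triples implied by transitivity but not asserted."""
    parents = defaultdict(set)
    for subject, predicate, obj in asserted:
        if predicate == "subClassOf":
            parents[subject].add(obj)

    inferred = set()
    for start in list(parents):
        seen, stack = set(), list(parents[start])
        while stack:  # depth-first walk up the hierarchy
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(parents.get(node, ()))
        for ancestor in seen:
            triple = (start, "subClassOf", ancestor)
            if triple not in asserted:
                inferred.add(triple)
    return inferred

# Inferred index: kept separate so it can be rebuilt from the asserted one.
INFERRED = infer_subclass_closure(ASSERTED)
print(sorted(INFERRED))
```

Keeping the two sets apart means a query can state whether it wants only what a source said or also what follows from it, and the inferred set can be discarded and recomputed whenever asserted content changes.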


DRAFT 11179 – ed. 3 metamodel Consolidated Class Hierarchy

see xmdr.org wiki for more diagrams and details


http://xmdr.org/

XMDR Prototype Web Site has downloadable code & content


Technical Challenges and Issues for XMDR Implementation Testbed

• Complexity
  – representation of different types of relationships
  – non-binary relationships -- e.g., instrumentality (A used to do B to C)
  – extensibility for unknown future complexities (e.g., Omega)?
  – incorporate IKL variant of CLIF dialect of ISO Common Logic?

• Scalability & performance
  – currently includes tens of thousands of objects & millions of RDF triples
  – maybe indexing and/or distributed registries will help?

• External metadata sources, ontologies, terminologies
  – cannot simply be copied because they are proprietary & evolving

• Mapping (to data elements as well as between, e.g., concept systems)
  – wide variety of challenges (e.g., probabilistic & changing mappings)

• Manage evolving metamodel, concept systems & mappings
  – additions & changes in both content & structure over time, versioning

• Harmonize with ODM, MMF, CL, OMV, Web Services
  – need open source, standards-based approach (vs. proprietary)
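One common way to handle a non-binary relationship such as "A used to do B to C" in a triple-based store is to introduce a node for the relationship itself and attach each participant to it with a binary link. The Python sketch below illustrates that general pattern. It is not taken from the XMDR metamodel, and the role names (`instrument`, `action`, `target`), the `type` predicate, and the `_:rel` identifiers are made up for the example.

```python
from itertools import count

_ids = count(1)  # generates distinct identifiers for relationship nodes

def reify(relation, **roles):
    """Express an n-ary relationship as binary triples around a new node."""
    node = f"_:rel{next(_ids)}"
    triples = [(node, "type", relation)]
    triples += [(node, role, filler) for role, filler in roles.items()]
    return node, triples

# "A used to do B to C" becomes one relationship node with three role links.
node, triples = reify("Instrumentality", instrument="A", action="B", target="C")
for triple in triples:
    print(triple)
```

The cost of this pattern is one extra node and one extra triple per relationship, which bears on the scalability point above when a registry already holds millions of triples.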


Conclusion: Why should OOR & XMDR projects consider closer collaboration?

• Potential benefits for OOR Project…
  – modular, extensible, open source code base
  – initial set of ontologies & other concept systems
  – major collaborators (EPA, NCI, DOD, EEA, …)
  – real-world ontology applications
  – ISO/IEC standards-based approach
  – proven 11179 administrative metadata & procedures for managing stewardship & evolution of individual items
  – extensive & extensible OOR metamodel

• Potential benefits for the XMDR Project
  – ontology experts, experience and ideas (e.g., Natasha re OMV)
  – more ontologies to exercise expressivity & tools
  – help in refining ontology representation & mapping


Thanks & Acknowledgements

• Bruce Bargmeyer, principal investigator
• Frank Olken, initial concepts & metamodel extensions
• Kevin Keck, initial & current designer & implementor
• Karlo Berkett, implementation, user interface, data loading
• Harold Solbrig, Lexgrid, model development, etc!
• Fred Gey, concept mapping, etc.

• L8 and SC 32/WG 2 Standards Committees

• Major XMDR Project Sponsors and Collaborators
  – National Science Foundation (Grant #0637122)
  – U.S. Environmental Protection Agency
  – Department of Defense
  – National Cancer Institute
  – U.S. Geological Survey
  – And others!