49
1 eXtended Metadata Registry (XMDR) Interagency/International Cooperation on Ecoinformatics Washington DC May 23, 2005 Bruce Bargmeyer, Lawrence Berkley National Laboratory University of California Tel: +1 510-495-2905 [email protected]

1 eXtended Metadata Registry (XMDR) Interagency/International Cooperation on Ecoinformatics Washington DC May 23, 2005 Bruce Bargmeyer, Lawrence Berkley

Embed Size (px)

Citation preview

Page 1: 1 eXtended Metadata Registry (XMDR) Interagency/International Cooperation on Ecoinformatics Washington DC May 23, 2005 Bruce Bargmeyer, Lawrence Berkley

1

eXtended Metadata Registry (XMDR)

Interagency/International Cooperation on EcoinformaticsWashington DC

May 23, 2005

Bruce Bargmeyer, Lawrence Berkley National LaboratoryUniversity of CaliforniaTel: +1 [email protected]

Page 2: 1 eXtended Metadata Registry (XMDR) Interagency/International Cooperation on Ecoinformatics Washington DC May 23, 2005 Bruce Bargmeyer, Lawrence Berkley

2

XMDR Project Background

Collaborative, interagency effort EPA, USGS, NCI, Mayo Clinic, DOD, LBNL …& others

Draws on and contributes to interagency/International Cooperation on Ecoinformatics

Involves Ecoterm, international, national, state, local government agencies, other organizations

Recognizes great potential of semantic computing, management of metadata Improving collection, maintenance, dissemination, processing

of very diverse data structures Collaboration arises from needs for traditional data administration,

for sharing data across multiple organizations, for managing complex semantics associated with data, and for emerging semantics computing capbilities

Many Players, Many Interests…Shared Context

Page 3: 1 eXtended Metadata Registry (XMDR) Interagency/International Cooperation on Ecoinformatics Washington DC May 23, 2005 Bruce Bargmeyer, Lawrence Berkley

3

11179 Metadata RegistriesExtensions

Register (and manage) any semantics that are useful for managing data. E.g., this may include registering not only permissible

values (concepts), definitions, but may extend to registration of the full concept systems in which the permissible values are found.

E.g., may want to register keywords, thesauri, taxonomies, ontologies, axiomitized ontologies….

Support traditional data management and data administration

Lay Foundation for semantic computing: Semantics Service Oriented Architecture, Semantic Grids, Semantics based workflows, Semantic Web ….

Page 4: 1 eXtended Metadata Registry (XMDR) Interagency/International Cooperation on Ecoinformatics Washington DC May 23, 2005 Bruce Bargmeyer, Lawrence Berkley

4

XMDR Draws Together

Metadata Registry

Terminology Thesaurus Themes

DataStandards

Ontology GEMET

StructuredMetadata

UsersUsers

MetadataRegistries

Terminology

CONCEPT

Referent

Refers To Symbolizes

Stands For

“Rose”,“ClipArt”

Page 5: 1 eXtended Metadata Registry (XMDR) Interagency/International Cooperation on Ecoinformatics Washington DC May 23, 2005 Bruce Bargmeyer, Lawrence Berkley

5

What is Metadata?What is Terminology (a concept system)?

Metadata Registry

Terminology Thesaurus Themes

DataStandards

Ontology GEMET

StructuredMetadata

UsersUsers

MetadataRegistries

Terminology

CONCEPT

Referent

Refers To Symbolizes

Stands For

“Rose”,“ClipArt”

Page 6: 1 eXtended Metadata Registry (XMDR) Interagency/International Cooperation on Ecoinformatics Washington DC May 23, 2005 Bruce Bargmeyer, Lawrence Berkley

6

Metadata RegistriesAfghanistan

Belgium

China

Denmark

Egypt

France

Germany

…………

Data ElementsData ElementsAFG

BEL

CHN

DNK

EGY

FRA

DEU

…………

ISO 3166English Name

ISO 31663-Numeric Code

004

056

156

208

818

250

276

…………

ISO 31663-Alpha Code

Afghanistan

Belgium

China

Denmark

Egypt

France

Germany

…………

Name:Context:Definition:Unique ID: 4572Value Domain:Maintenance Org.:Steward:Classification:Registration Authority:Others

Name:Context:Definition:Unique ID: 3820Value Domain:Maintenance Org.:Steward:Classification:Registration Authority:Others

Name:Context: Definition:Unique ID: 1047Value Domain:Maintenance Org.:Steward:Classification:Registration Authority:Others

Name: Country IdentifiersContext:Definition:Unique ID: 5769Conceptual Domain:Maintenance Org.:Steward:Classification:Registration Authority:Others

Data Element Data Element ConceptConcept

Page 7: 1 eXtended Metadata Registry (XMDR) Interagency/International Cooperation on Ecoinformatics Washington DC May 23, 2005 Bruce Bargmeyer, Lawrence Berkley

7

Data Metadata

Fuji Variety name

4129Product look-up

(PLU) code

Productof

Canada

Country of

originPLU codes consist of 4 to 5 numbers• 4 numbers = conventional produce• 5 numbers, starting with 9 = organic produce• 5 numbers, staring with 8 = genetically engineered

produce PLU codes are established by the International Federation for Produce Coding,A coalition of fruit and vegetable associations coordinated by the Produce Marketing Association.

What is Metadata/Terminology?

Page 8: 1 eXtended Metadata Registry (XMDR) Interagency/International Cooperation on Ecoinformatics Washington DC May 23, 2005 Bruce Bargmeyer, Lawrence Berkley

8

Data Metadata

Fuji Variety name

4129Product look-up

(PLU) code

Productof

Canada

Country of

origin

What is Metadata/Terminology?

New PLUs are assigned by the PEIB.  More information about the PEIB may be found at their website:Produce Electronic Information Board.

Page 9: 1 eXtended Metadata Registry (XMDR) Interagency/International Cooperation on Ecoinformatics Washington DC May 23, 2005 Bruce Bargmeyer, Lawrence Berkley

9

Data MetadataFuji Variety name

4129Product look-up

(PLU) code

Productof

Canada

Country of

origin

What is Metadata/Terminology?

4128Apple, Cripps Pink, Small, 100 size and smaller, less than 205g, [Notes: As of 1 Jun 2001, Pink Lady® is a registered trademark of certain Cripps Pink apples, Last revised: 1 Jun 2002]

4129 Apple, Fuji, Small, 100 size and smaller, less than 205g

4130Apple, Cripps Pink, Large, 88 size and larger, 205g and above, [Notes: As of 1 Jun 2001, Pink Lady® is a registered trademark of certain Cripps Pink apples, Last revised: 1 Jun 2002]

4131 Apple, Fuji, Large, 88 size and larger, 205g and above

Page 10: 1 eXtended Metadata Registry (XMDR) Interagency/International Cooperation on Ecoinformatics Washington DC May 23, 2005 Bruce Bargmeyer, Lawrence Berkley

10

What is Metadata/Terminology?

Fruit: the developed ovary of a seed plant with its contents and accessory parts, as the pea pod, nut, tomato, or pineapple.

Fruit

Apple Orange

Data MetadataFuji Variety name

4129Product look-up

(PLU) code

Productof

Canada

Country of

origin

Page 11: 1 eXtended Metadata Registry (XMDR) Interagency/International Cooperation on Ecoinformatics Washington DC May 23, 2005 Bruce Bargmeyer, Lawrence Berkley

11

What is Metadata/terminology?

fruit

Fruit (frÁt), n., pl. fruits, (esp. collectively) fruit, v. –n.1. any product of plant growth useful to humans or animals.2. the developed ovary of a seed plant with its contents and accessory parts, as the pea pod, nut, tomato, or pineapple.3. the edible part of a plant developed from a flower, with any accessory tissues, as the peach, mulberry, or banana.4. the spores and accessory organs of ferns, mosses, fungi, algae, or lichen.5. anything produced or accruing; product, result, or effect; return or profit: the fruits of one's labors.6. Slang (disparaging and offensive). a male homosexual.7. –v.i., v.t. to bear or cause to bear fruit: a tree that fruits in late summer; careful pruning that sometimes fruits a tree.

Data MetadataFuji Variety name

4129Product look-up

(PLU) code

Productof

Canada

Country of

origin

Page 12: 1 eXtended Metadata Registry (XMDR) Interagency/International Cooperation on Ecoinformatics Washington DC May 23, 2005 Bruce Bargmeyer, Lawrence Berkley

12

What is Metadata/Terminology?

fruit

Fruit

Apple Orange

Fly

HorseFly

FruitFlies

Fruit flies lay eggs in fruit

Data MetadataFuji Variety name

4129Product look-up

(PLU) code

Productof

Canada

Country of

origin

Page 13: 1 eXtended Metadata Registry (XMDR) Interagency/International Cooperation on Ecoinformatics Washington DC May 23, 2005 Bruce Bargmeyer, Lawrence Berkley

13

C.K. Ogden/I.A. Richards, The Meaning of MeaningA Study in the Influence of Language upon Thought and The Science of SymbolismLondon 1923, 10th edition 1969

CONCEPT

Referent

Refers To Symbolizes

Stands For

“Rose”,“ClipArt”

What is Terminology?

Page 14: 1 eXtended Metadata Registry (XMDR) Interagency/International Cooperation on Ecoinformatics Washington DC May 23, 2005 Bruce Bargmeyer, Lawrence Berkley

14

ReferentReferent

ConceptConcept

TermTermtroutSalmo truttabrown trouttruite

Definition: Any of several game fishes of the genus Salmo, related to the salmon...

Registering TerminologyRegistering Terminology

Refers To Symbolizes

Stands For

Page 15: 1 eXtended Metadata Registry (XMDR) Interagency/International Cooperation on Ecoinformatics Washington DC May 23, 2005 Bruce Bargmeyer, Lawrence Berkley

15

Registering Terminology

Terms ContextConcept

troutSalmo truttatruite

common namescientific nameFrench name

any of several game fishes of the genus Salmo, related to the salmon...

UID=6349

Page 16: 1 eXtended Metadata Registry (XMDR) Interagency/International Cooperation on Ecoinformatics Washington DC May 23, 2005 Bruce Bargmeyer, Lawrence Berkley

16

Terms ContextConcept

Brown troutSalmo truttatruite

common namescientific nameFrench name

UIN=6349

DataElements

NameName: trout species

Definition: The names of species of trout.Values:brook trout Salvelinus fontinalisbrown trout Salmo truttacutthroat trout Oncorhynchus clarkii

Concepts into Data

Page 17: 1 eXtended Metadata Registry (XMDR) Interagency/International Cooperation on Ecoinformatics Washington DC May 23, 2005 Bruce Bargmeyer, Lawrence Berkley

17

Systems:STORETEnvirofacts . . .

DBMSQuery

Terms ContextConcept

Brown troutSalmo truttatruite

common namescientific nameFrench name

UIN=6349

XML SchemasEDI Messages

DataInterchange

W3C RDFVocabularies

Ontology

DataElements

Page 18: 1 eXtended Metadata Registry (XMDR) Interagency/International Cooperation on Ecoinformatics Washington DC May 23, 2005 Bruce Bargmeyer, Lawrence Berkley

18

Continuing ChallengeSynonyms, Homonyms, Provenance

Synonyms: so many ways to name, identify, and state the same thing (one concept--many terms)

Homonyms: different meanings for the same terms and identifiers (one term—many concepts)

Provenance: How to record the who, where, when, why, and how that is relevant to data

Page 19: 1 eXtended Metadata Registry (XMDR) Interagency/International Cooperation on Ecoinformatics Washington DC May 23, 2005 Bruce Bargmeyer, Lawrence Berkley

19

Two Points of View

I wanna be free: Programs, system developers, scientists, … that want to get

something done quickly, without the drag of documentation and uniformity.

Let me do it quickly, my way, and let others accept it. Coherence within some large Universe of Discourse

Data users who want to get a coherent view across the boundaries of individual programs, organizations, scientific studies. E.g., media specific programs.

Harmonize and standardize data and terminology. Document data/terminology in structured ways. Then easier to find, access, analyze, understand and use data.

Market driven approaches to data management may provide a means to draw these closer together. E.g., anyone can register anything, and a community of interest gives it some declared level of acceptance.

Page 20: 1 eXtended Metadata Registry (XMDR) Interagency/International Cooperation on Ecoinformatics Washington DC May 23, 2005 Bruce Bargmeyer, Lawrence Berkley

20

Data Management Evolution

Trying to manage semantics: What does data mean? Can data be compared? What is the provenance of data? [Freedom vs. Coherence]

3rd Generation languages – naming conventions, system documentation

Data Base Management Systems – Data dictionary for schema, valid values, etc. Metadata Registries for data sharing organization-wide or across environmental domain of discourse

XML – Metadata Registries and XML registries for managing XML tags, data, and XML artifacts.

Semantic computing – Metadata Registries for managing the “vocabulary” and concept systesm, e.g., ontologies.

Page 21: 1 eXtended Metadata Registry (XMDR) Interagency/International Cooperation on Ecoinformatics Washington DC May 23, 2005 Bruce Bargmeyer, Lawrence Berkley

21

Movement TowardSemantics Management

Going beyond traditional Data Standards and Data Administration

In addition to anchoring data with definitions, we want to process data and concepts based on context and relationships, possibly using inferences and rules.

In addition to natural language, we want to capture semantics with more formal description techniques FOL, DL, Common Logic, OWL

Going beyond information system interoperability and data interchange to processing based on inferences and probabilistic correspondence between concepts found in natural

language (in the wild) and both data in databases and concepts found in concept systems.

Page 22: 1 eXtended Metadata Registry (XMDR) Interagency/International Cooperation on Ecoinformatics Washington DC May 23, 2005 Bruce Bargmeyer, Lawrence Berkley

22

Purposes of XMDR Project

1. Propose revisions to 11179 Parts 2 & 3 (3rd Ed.) – to serve as the design for the next generation of metadata registries.

2. Demonstrate Reference Implementation – to validate the proposed revisions

Extend semantics management capabilities Enable registration of correspondences between multiple

concept systems and between concept systems and data Explore uses of terminologies and ontologies Systematize representation of concepts and relationships Enable registration of metadata for knowledge bases Adapt & test emerging semantic technologies Provide an environment for developing and interrelating

ontologies

Page 23: 1 eXtended Metadata Registry (XMDR) Interagency/International Cooperation on Ecoinformatics Washington DC May 23, 2005 Bruce Bargmeyer, Lawrence Berkley

23

What is an ontology?

The subject of ontology is the study of the categories of things that exist or may exist in some domain. The product of such a study, called an ontology, is a catalog of the types of things that are assumed to exist in a domain of interest D from the perspective of a person who uses a language L for the purpose of talking about D. The types in the ontology represent the predicates, word senses, or concept and relation types of the language L when used to discuss topics in the domain D.

Building, Sharing, and Merging Ontologies-John F. Sowa

Page 24: 1 eXtended Metadata Registry (XMDR) Interagency/International Cooperation on Ecoinformatics Washington DC May 23, 2005 Bruce Bargmeyer, Lawrence Berkley

24

Terminolocgical & Formal(Axiomatized) Ontologies

The difference between a terminological ontology and a formal ontology is one of degree: as more axioms are added to a terminological ontology, it may evolve into a formal or axiomatized ontology.

Cyc has the most detailed axioms and definitions; it is an example of an axiomatized or formal ontology. EDR and WordNet are usually considered terminological ontologies.

Building, Sharing, and Merging OntologiesJohn F. Sowa

Page 25: 1 eXtended Metadata Registry (XMDR) Interagency/International Cooperation on Ecoinformatics Washington DC May 23, 2005 Bruce Bargmeyer, Lawrence Berkley

25

An Axiom for an Axiomatized Ontology

Definition: The resource_cost_point predicate, cpr, specifies the cost_value, c, (monetary units) of a resource, r, required by an activity, a, upto a certain time point, t.

If a resource of the terminal use or consume states, s, for an activity, a, are enabled at time point, t, there must exist a cost_value, c, at time point, t, for the activity, a,that uses or consumes the resource, r. The time interval, ti = [ts, te], during which a resource is used or consumed byan activity is specified in the use or consume specifications as use_spec(r, a, ts, te, q) or consume_spec(r, a, ts, te, q) where activity, a, uses or consumes quantity, q, of resource, r, during the time interval [ts, te]. Hence,

Axiom: a, s, r, q, ts, te, (use_spec(r, a, ts, te, q) enabled(s, a, t)) ∀ ∧ ∨(consume_spec(r, a, ts, te, q) enabled(s, a, t))≡ c, cpr(a,c,t,r)∧ ∃

Cost Ontology for TOronto Virtual Enterprise (TOVE)

Page 26: 1 eXtended Metadata Registry (XMDR) Interagency/International Cooperation on Ecoinformatics Washington DC May 23, 2005 Bruce Bargmeyer, Lawrence Berkley

26

Samples of Eco & Bio Graph Data

Nutrient cycles in microbial ecologies These are bipartite graphs, with two sets of nodes, microbes and reactants (nutrients), and directed edges indicating input and output relationships. Such nutrient cycle graphs are used to model the flow of nutrients in microbial ecologies, e.g., subsurface microbial ecologies for bioremediation.

Chemical structure graphs: Here atoms are nodes, and chemical bonds are represented by undirected edges. Multi-electron bonds are often represented by multiple edges between nodes (atoms), hence these are multigraphs. Common queries include subgraph isomorphism. Chemical structure graphs are commonly used in chemoinformatics systems, such as Chem Abstracts, MDL Systems, etc.

Sequence data and multiple sequence alignments . DNA/RNA/Protein sequences can be modeled as linear graphs

Topological adjacency relationships also arise in anatomy. These relationships differ from partonomies in that adjacency relationships are undirected and not generally transitive.

Page 27: 1 eXtended Metadata Registry (XMDR) Interagency/International Cooperation on Ecoinformatics Washington DC May 23, 2005 Bruce Bargmeyer, Lawrence Berkley

27

Eco & Bio Graph Data (Continued)

Taxonomies of proteins, chemical compounds, and organisms, ... These taxonomies (classification systems) are usually represented as directed acyclic graphs (partial orders or lattices). They are used when querying the pathways databases. Common queries are subsumption testing between two terms/concepts, i.e., is one concept a subset or instance of another. Note that some phylogenetic tree computations generate unrooted, i.e., undirected. trees.

Metabolic pathways: chemical reactions used for energy production, synthesis of proteins, carbohydrates, etc. Note that these graphs are usually cyclic.

Signaling pathways: chemical reactions for information transmission and processing. Often these reactions involve small numbers of molecules. Graph structure is similar to metabolic pathways.

Partonomies are used in biological settings most often to represent common topological relationships of gross anatomy in multi-cellular organisms. They are also useful in sub-cellular anatomy, and possibly in describing protein complexes. They are comprised of part-of relationships (in contrast to is-a relationships of taxonomies). Part-of relationships are represented by directed edges and are transitive. Partonomies are directed acyclic graphs.

Data Provenance relationships are used to record the source and derivation of data. Here, some nodes are used to represent either individual "facts" or "datasets" and other nodes represent "data sources" (either labs or individuals). Edges between "datasets" and "data sources" indicate "contributed by". Other edges (between datasets (or facts)) indicate derived from (e.g., via inference or computation). Data provenance graphs are usually directed acyclic graphs.

Page 28: 1 eXtended Metadata Registry (XMDR) Interagency/International Cooperation on Ecoinformatics Washington DC May 23, 2005 Bruce Bargmeyer, Lawrence Berkley

28

A graph theoretic characterization

Readily comprehensible characterization of metadata structures

Graph structure has implications for: Integrity Constraint Enforcement Data structures Query languages Combining metadata sets Algorithms for query processing

Page 29: 1 eXtended Metadata Registry (XMDR) Interagency/International Cooperation on Ecoinformatics Washington DC May 23, 2005 Bruce Bargmeyer, Lawrence Berkley

30

Example of a graph

infectious disease

influenza measles

is-ais-a

Page 30: 1 eXtended Metadata Registry (XMDR) Interagency/International Cooperation on Ecoinformatics Washington DC May 23, 2005 Bruce Bargmeyer, Lawrence Berkley

31

Types of Metadata Graph Structures

Trees Partially Ordered Trees Ordered Trees Faceted Classifications Directed Acyclic Graphs Partially Ordered Graphs Lattices Bipartite Graphs Directed Graphs Cliques Compound Graphs

Page 31: 1 eXtended Metadata Registry (XMDR) Interagency/International Cooperation on Ecoinformatics Washington DC May 23, 2005 Bruce Bargmeyer, Lawrence Berkley

32

Graph Taxonomy

Directed Graph

Directed Acyclic Graph

Graph

Undirected Graph

Bipartite Graph

Partial Order Graph

Faceted Classification

Clique

Partial Order Tree

Tree

Lattice

Ordered Tree

Note: not all bipartite graphsare undirected.

Page 32: 1 eXtended Metadata Registry (XMDR) Interagency/International Cooperation on Ecoinformatics Washington DC May 23, 2005 Bruce Bargmeyer, Lawrence Berkley

33

Trees

In metadata settings trees are almost most often directed edges indicate direction

In metadata settings trees are usually partial orders Transtivity is implied (see next slide) Not true for some trees with mixed edge types. Not always true for all partonomies

Page 33: 1 eXtended Metadata Registry (XMDR) Interagency/International Cooperation on Ecoinformatics Washington DC May 23, 2005 Bruce Bargmeyer, Lawrence Berkley

34

Example: Tree

Oakland Berkeley

Alameda County

California

part-of

part-of part-of

part-of

part-of

Santa Clara San Jose

Santa Clara County

part-of

Page 34: 1 eXtended Metadata Registry (XMDR) Interagency/International Cooperation on Ecoinformatics Washington DC May 23, 2005 Bruce Bargmeyer, Lawrence Berkley

40

Faceted Classification

Classification scheme has mulitple facets Each facet = partial order tree Categories = conjunction of facet values (often

written as [facet1, facet2, facet3]) Faceted classification = a simplified partial order

graph Introduced by Ranganathan in 19th century, as

Colon Classification scheme Faceted classification can be descirbed with

Description Logc, e.g., OWL-DL

Page 35: 1 eXtended Metadata Registry (XMDR) Interagency/International Cooperation on Ecoinformatics Washington DC May 23, 2005 Bruce Bargmeyer, Lawrence Berkley

41

Example: Faceted Classification

Wheeled Vehicle Facet

2 wheeled

4 wheeled

3 wheeled

is-a is-a is-a

Internal Combustion

Human Powered

Vehicle Propulsion Facet

Bicycle Tricycle MotorcycleAuto

is-a is-a

is-a

is-ais-a

is-ais-a

is-a

is-a

is-a is-a

Page 36: 1 eXtended Metadata Registry (XMDR) Interagency/International Cooperation on Ecoinformatics Washington DC May 23, 2005 Bruce Bargmeyer, Lawrence Berkley

43

Directed Acylic Graphs

Graph: Directed edges No cycles No assumptions about transitivity (e.g., mixed

edge types, some partonomies) Nodes may have multiple parents

Examples: Partonomies (“part-of”) - transitivity is not

always true

Page 37: 1 eXtended Metadata Registry (XMDR) Interagency/International Cooperation on Ecoinformatics Washington DC May 23, 2005 Bruce Bargmeyer, Lawrence Berkley

44

Example: Directed Acyclic Graph

Wheeled Vehicle

2 Wheeled Vehicle

4 WheeledVehicle

3 WheeledVehicle

is-a is-a is-a

Internal Combustion

Vehicle

Human PoweredVehicle

PropelledVehicle

Bicycle Tricycle MotorcycleAuto

is-a is-a

is-a

is-ais-a

is-ais-a

is-a

is-a

is-a is-a

is-a is-a

Vehicle

Page 38: 1 eXtended Metadata Registry (XMDR) Interagency/International Cooperation on Ecoinformatics Washington DC May 23, 2005 Bruce Bargmeyer, Lawrence Berkley

48

Lattices

A partial order For every pair of elements A and B

There exists a least upper bound There exists a greatest lower bound

Example: The power set (all possible subsets) of a finite

set LUB(A,B) = union of two sets A, B GLB(A,B) = intersect of two sets A,B

Page 39: 1 eXtended Metadata Registry (XMDR) Interagency/International Cooperation on Ecoinformatics Washington DC May 23, 2005 Bruce Bargmeyer, Lawrence Berkley

49

Example Lattice: Powerset of 3 element set

{a,b,c}

{c}{a} {b}

{b,c}{a,b} {a,c}

Denotes subset

Empty Set

Page 40: 1 eXtended Metadata Registry (XMDR) Interagency/International Cooperation on Ecoinformatics Washington DC May 23, 2005 Bruce Bargmeyer, Lawrence Berkley

51

Bipartite Graphs

Vertices = two disjoint sets, V and W All edges connect one vertex from V and

one vertex from W Examples:

mappings among value representations mappings among schemas (entity/attribute, relationship) nodes in

Conceptual Graphs

Page 41: 1 eXtended Metadata Registry (XMDR) Interagency/International Cooperation on Ecoinformatics Washington DC May 23, 2005 Bruce Bargmeyer, Lawrence Berkley

52

Example Bipartite Graph

California

Massachusetts

Oregon

CA

MA

OR

States Two-letter state codes

Page 42: 1 eXtended Metadata Registry (XMDR) Interagency/International Cooperation on Ecoinformatics Washington DC May 23, 2005 Bruce Bargmeyer, Lawrence Berkley

58

Challenges

How to register & manage the various graph structures? DBMS, File systems ….

How to query the graph structures? XQuery for XML Poor to non-existent graph query languages

How to get adequate performance, even in high performance computing environment

User interface complexity How to manage semantic drift Versions How to interrelate graphs with other graphs and with data Granularity at which to register metadata (then point to

greater detail elsewhere?)

Page 43: 1 eXtended Metadata Registry (XMDR) Interagency/International Cooperation on Ecoinformatics Washington DC May 23, 2005 Bruce Bargmeyer, Lawrence Berkley

63

Architecture Approach

Fully modular approach Exemplars:

Apache Web Server Eclipse IDE Protégé Ontology Editor

Benefits: numerous modules are relatively easy to implement clean separation of concerns and high reusability and

portability tooling support required is minimal

Page 44: 1 eXtended Metadata Registry (XMDR) Interagency/International Cooperation on Ecoinformatics Washington DC May 23, 2005 Bruce Bargmeyer, Lawrence Berkley

64

Ontology EditorProtege11179 OWL Ontology

XMDR Prototype Architecture: Initial Implemented Modules

MetadataValidator

AuthenticationService

MappingEngine

RegistryExternalInterface

Generalization Composition (tight ownership) Aggregation (loose ownership)

Jena, Xerces

Java

RetrievalIndex

FullTextIndex

Lucene

LogicBasedIndex

Jena, OWI KSRacer

RegistryStore

WritableRegistryStore

Subversion

Page 45: 1 eXtended Metadata Registry (XMDR) Interagency/International Cooperation on Ecoinformatics Washington DC May 23, 2005 Bruce Bargmeyer, Lawrence Berkley

65

XMDR Content Priority ListPhase 1(V.A) National Drug File Reference Terminology (?)

DTIC Thesaurus (Defense Technology Info. Center Thesaurus)

NCI Thesaurus National Cancer Institute Thesaurus

NCI Data Elements (National Cancer Institute Data Standards Registry

UMLS (non-proprietary portions)

GEMET (General Multilingual Environmental Thesaurus)

EDR Data Elements (Environmental Data Registry)

ISO 3166 Country Codes – from EPA EDR

USGS Geographic Names Information System (GNIS)

Page 46: 1 eXtended Metadata Registry (XMDR) Interagency/International Cooperation on Ecoinformatics Washington DC May 23, 2005 Bruce Bargmeyer, Lawrence Berkley

66

XMDR Content Priority ListPhase 2LOINC Logical Observation Identifiers Names and Codes

ITIS Integrated Taxonomic Information System

Getty Thesaurus of Geographic Names (TGN)

SIC (Standard Industrial Classification System)

NAICS (North American Industrial Classification System)

NAIC-SIC mappings

UNSPSC (United Nations Standard Products and Services Codes)

EPA Chemical Substance Registry System

EPA Terminology Reference System

ISO Language Identifiers ISO 639-3 Part 3

IETF Language Identifiers RFC 1766

Units Ontology

Page 47: 1 eXtended Metadata Registry (XMDR) Interagency/International Cooperation on Ecoinformatics Washington DC May 23, 2005 Bruce Bargmeyer, Lawrence Berkley

67

XMDR Content Priority ListPhase 3HL7 Terminology

HL7 Data Elements

GO (Gene Ontology)

NBII Biocomplexity Thesaurus

EPA Web Registry Controlled Vocabulary

BioPAX Ontology

NASA SWEET Ontologies

NDRTF

Page 48: 1 eXtended Metadata Registry (XMDR) Interagency/International Cooperation on Ecoinformatics Washington DC May 23, 2005 Bruce Bargmeyer, Lawrence Berkley

68

Coming Year (Proposed)

Extension of XMDR core – data & system Semantic Services Greater interaction with Ecoterm

organizations Interaction with Ecoinformatics Test Bed

project

Page 49: 1 eXtended Metadata Registry (XMDR) Interagency/International Cooperation on Ecoinformatics Washington DC May 23, 2005 Bruce Bargmeyer, Lawrence Berkley

69

Acknowledgementsand References

Frank Olken, LBNLKevin Keck, LBNLJohn McCarthy, LBNL