2004-11-04 1 F. Olken, SC32WG2 Graph Theoretic Characterization of Metadata Structures Frank Olken, Bruce Bargmeyer, Kevin D. Keck Lawrence Berkeley National Laboratory Computational Research Division presented to ISO JTC1 SC32 WG2 Mtg. Washington, DC Nov. 8, 2004


Graph Theoretic Characterization of

Metadata Structures

Frank Olken, Bruce Bargmeyer,

Kevin D. Keck

Lawrence Berkeley National Laboratory
Computational Research Division

presented to ISO JTC1 SC32 WG2 Mtg.
Washington, DC

Nov. 8, 2004


Our Goal

● Create a taxonomy of metadata structures:
  – Terminologies, taxonomies, partonomies, ontologies, ...
● Taxonomy will be framed in graph theoretic terms:
  – What kind of graph would be used to represent a given metadata structure?


Why use a graph theoretic characterization?

● Readily comprehensible characterization of metadata structures

● Graph structure has implications for:
  – Integrity constraint enforcement
  – Data structures
  – Query languages
  – Combining metadata sets
  – Algorithms for query processing


Definition of a graph

● Graph = vertex (node) set + edge set
● Nodes, edges may be labeled
● Edge set = binary relation over nodes
  – cf. NIAM
● Labeled edge set
  – RDF triples (subject, predicate, object)
  – predicate = edge label
● Typically edges are directed
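The definition above can be sketched in a few lines of Python. This is only an illustration, assuming a graph is stored as a set of (subject, predicate, object) triples; the variable names are hypothetical, and the disease names are borrowed from the example slide.

```python
from collections import defaultdict

# Labeled, directed edges stored as (subject, predicate, object) triples.
triples = {
    ("influenza", "is-a", "infectious disease"),
    ("measles", "is-a", "infectious disease"),
}

# Vertex set: every node that appears as a subject or an object.
nodes = {s for s, _, _ in triples} | {o for _, _, o in triples}

# Each edge label (predicate) names one binary relation over the nodes.
relations = defaultdict(set)
for subject, predicate, obj in triples:
    relations[predicate].add((subject, obj))
```

Grouping the triples by predicate makes the "edge set = binary relation" reading explicit: `relations["is-a"]` is a plain set of ordered pairs.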


Example of a graph

influenza --is-a--> infectious disease
measles --is-a--> infectious disease


Types of Metadata Graph Structures

● Trees

● Partially Ordered Trees

● Ordered Trees

● Faceted Classifications

● Directed Acyclic Graphs

● Partially Ordered Graphs

● Lattices

● Bipartite Graphs

● Directed Graphs

● Cliques

● Compound Graphs


Graph Taxonomy

[Figure: taxonomy diagram of graph types. Nodes: Graph, Directed Graph, Undirected Graph, Directed Acyclic Graph, Bipartite Graph, Partial Order Graph, Faceted Classification, Clique, Partial Order Tree, Tree, Lattice, Ordered Tree.]

Note: not all bipartite graphs are undirected.


Trees

● In metadata settings trees are almost invariably directed
  – edges indicate direction
● In metadata settings trees are usually partial orders
  – Transitivity is implied (see next slide)
  – Not true for some trees with mixed edge types
  – Not always true for all partonomies


Example: Tree

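Oakland --part-of--> Alameda County
Berkeley --part-of--> Alameda County
Santa Clara --part-of--> Santa Clara County
San Jose --part-of--> Santa Clara County
Alameda County --part-of--> California
Santa Clara County --part-of--> California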


Trees - cont.

● Uniform vs. non-uniform height subtrees
● Uniform height subtrees
  – fixed number of levels
  – common in dimensions of multi-dimensional data models
● Non-uniform height subtrees
  – common in terminologies


Partially Ordered Trees

● A conventional directed tree
● Plus, assumption of transitivity
● Usually only show immediate ancestors (transitive reduction)
● Edges of transitive closure are implied
● Classic Example:
  – Simple Taxonomy, “is-a” relationship
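The relationship between the drawn edges (transitive reduction) and the implied edges (transitive closure) can be sketched as follows. This is a deliberately simple, unoptimized illustration that assumes the input is acyclic; the function names are hypothetical, and the disease names come from the example slide.

```python
def transitive_closure(edges):
    """Return every (descendant, ancestor) pair implied by transitivity."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                # a -> b and b -> d together imply a -> d.
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def transitive_reduction(edges):
    """Keep only edges not implied by a longer path (acyclic input assumed)."""
    closure = transitive_closure(edges)
    nodes = {n for edge in closure for n in edge}
    return {
        (a, b)
        for a, b in closure
        if not any((a, m) in closure and (m, b) in closure for m in nodes)
    }


# Edges as they would be drawn: immediate ancestors only.
drawn = {
    ("polio", "infectious disease"),
    ("smallpox", "infectious disease"),
    ("infectious disease", "disease"),
}
implied = transitive_closure(drawn) - drawn
```

Here `implied` holds the inferred is-a edges, such as polio is-a disease, that a diagram would normally leave out.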


Example: Partial Order Tree

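polio --is-a--> Infectious Disease
smallpox --is-a--> Infectious Disease
diabetes --is-a--> chronic disease
heart disease --is-a--> chronic disease
Infectious Disease --is-a--> Disease
chronic disease --is-a--> Disease

Legend: a distinct line style signifies an inferred is-a relationship.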


Ordered Trees

● Order here refers to order among sibling nodes (not related to the partial order discussed elsewhere)
● XML documents are ordered trees
  – Ordering of “sub-elements” is to support classic linear encoding of documents
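Sibling order in an XML document can be observed with Python's standard `xml.etree.ElementTree` module. The sketch below is illustrative only; the element names are hypothetical and describe a made-up paper document.

```python
import xml.etree.ElementTree as ET

# A tiny document whose sub-elements appear in a meaningful order.
doc = ET.fromstring("<paper><titlepage/><section/><bibliography/></paper>")

# Iterating over an element yields its children in document order.
child_order = [child.tag for child in doc]
```

The order of `child_order` mirrors the linear encoding of the document, which is exactly the sibling ordering an ordered tree adds to a plain tree.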


Example: Ordered Tree

Title page --part-of--> Paper
Section --part-of--> Paper
Bibliography --part-of--> Paper

Note: implicit ordering relation among parts of paper.


Faceted Classification

● Classification scheme has multiple facets
● Each facet = partial order tree
● Categories = conjunction of facet values (often written as [facet1, facet2, facet3])
● Faceted classification = a simplified partial order graph
● Introduced by Ranganathan in the 20th century (1933), as the Colon Classification scheme
● Faceted classification can be described with Description Logic, e.g., OWL-DL
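A category as a conjunction of facet values can be sketched like this. It is a minimal illustration, not a full classification system: each facet is flattened to a set of allowed values, the helper names are hypothetical, and the assignment of vehicles to facet values is an assumption drawn from everyday knowledge.

```python
# One set of allowed values per facet (each facet flattened to its values).
FACETS = {
    "wheels": {"2 wheeled", "3 wheeled", "4 wheeled"},
    "propulsion": {"internal combustion", "human powered"},
}


def category(**values):
    """Build a category: exactly one valid value for every facet."""
    if set(values) != set(FACETS):
        raise ValueError("exactly one value is required for every facet")
    for facet, value in values.items():
        if value not in FACETS[facet]:
            raise ValueError(f"{value!r} is not a value of facet {facet!r}")
    return dict(values)


# Assumed assignments, for illustration only.
items = {
    "bicycle": category(wheels="2 wheeled", propulsion="human powered"),
    "tricycle": category(wheels="3 wheeled", propulsion="human powered"),
    "motorcycle": category(wheels="2 wheeled", propulsion="internal combustion"),
    "auto": category(wheels="4 wheeled", propulsion="internal combustion"),
}


def matches(item, **wanted):
    """True when the item agrees with every requested facet value."""
    return all(items[item].get(facet) == value for facet, value in wanted.items())


two_wheeled = sorted(name for name in items if matches(name, wheels="2 wheeled"))
```

Querying on a single facet, as `two_wheeled` does, ignores the other facets entirely, which is the practical appeal of a faceted scheme.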


Example: Faceted Classification

[Figure: two facets drawn as trees of is-a edges, with classified items below them.]

Wheeled Vehicle Facet: 2 wheeled, 3 wheeled, 4 wheeled

Vehicle Propulsion Facet: Internal Combustion, Human Powered

Classified items (linked to facet values by is-a edges): Bicycle, Tricycle, Motorcycle, Auto


Faceted Classifications and Multi-dimensional Data Model

● MDM – a.k.a. OLAP data model
  – Online Analytical Processing data model
  – Star / Snowflake schemas
● Fact Tables
  – fact = function over Cartesian product of dimensions
  – dimensions = facets
    ● geographic region, product category, year, ...
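The "fact = function over Cartesian product of dimensions" idea can be illustrated with a dictionary keyed by tuples of dimension values. This is only a sketch: the dimension values and the sales figures are invented for illustration and do not come from the presentation.

```python
from itertools import product

# Two dimensions (facets); values are made up for this example.
regions = ["west", "east"]
years = [2003, 2004]

# A fact table maps each tuple of dimension values to a measure.
# The numbers below are invented illustration data.
sales = {
    ("west", 2003): 10,
    ("west", 2004): 12,
    ("east", 2003): 7,
    ("east", 2004): 9,
}

# Every cell of the Cartesian product is a possible key of the fact function.
cells = list(product(regions, years))

# Aggregating along one dimension: total sales for one region.
total_west = sum(value for (region, _), value in sales.items() if region == "west")
```

In practice a fact table is often sparse, so not every cell in `cells` needs to have a stored value.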


Directed Acyclic Graphs

● Graph:
  – Directed edges
  – No cycles
  – No assumptions about transitivity (e.g., mixed edge types, some partonomies)
  – Nodes may have multiple parents
● Examples:
  – Partonomies (“part-of”) - transitivity is not always true
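The "no cycles" constraint is the kind of integrity constraint that can be checked mechanically. The sketch below uses Kahn's topological-sort algorithm as one possible check; the function name is hypothetical, and the specific edges are an assumption loosely based on the vehicle example.

```python
from collections import defaultdict, deque


def is_acyclic(edges):
    """Kahn's algorithm: acyclic iff every node can be removed in topological order."""
    successors = defaultdict(set)
    indegree = defaultdict(int)
    nodes = set()
    for src, dst in set(edges):
        nodes.update((src, dst))
        successors[src].add(dst)
        indegree[dst] += 1
    # Start from nodes with no incoming edges.
    ready = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while ready:
        node = ready.popleft()
        removed += 1
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    # Nodes on a cycle never reach indegree zero, so they are never removed.
    return removed == len(nodes)


# Assumed (child, parent) edges for illustration.
dag = {
    ("bicycle", "2 wheeled vehicle"),
    ("bicycle", "human powered vehicle"),
    ("2 wheeled vehicle", "wheeled vehicle"),
    ("human powered vehicle", "propelled vehicle"),
    ("wheeled vehicle", "vehicle"),
    ("propelled vehicle", "vehicle"),
}

# A node may have several parents in a DAG.
parents_of_bicycle = {parent for child, parent in dag if child == "bicycle"}
```

Adding an edge from "vehicle" back to "bicycle" would close a cycle, and `is_acyclic` would then return `False`.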


Example: Directed Acyclic Graph

[Figure: directed acyclic graph of is-a edges. Nodes: Vehicle; Propelled Vehicle, Wheeled Vehicle; 2 wheeled vehicle, 3 wheeled vehicle, 4 wheeled vehicle; Internal Combustion Vehicle, Human Powered Vehicle; Bicycle, Tricycle, Motorcycle, Auto.]

Observe: multiple inheritance.


Partial Order Graphs

● Directed acyclic graphs + inferred transitivity
● Nodes may have multiple parents
● Most taxonomies are drawn as the transitive reduction; transitive closure edges are implied
● Examples:
  – all taxonomies
  – most partonomies
  – multiple inheritance
● POGs can be described in Description Logic, e.g., OWL-DL


Example: Partial Order Graph

[Figure: partial order graph of is-a edges. Nodes: Vehicle; Propelled Vehicle, Wheeled Vehicle; 2 wheeled vehicle, 3 wheeled vehicle, 4 wheeled vehicle; Internal Combustion Vehicle, Human Powered Vehicle; Bicycle, Tricycle, Motorcycle, Auto.]

Dashed line = inferred is-a (transitive closure)


Directed Graph

● Generalization of DAG (directed acyclic graph)
● Cycles are allowed
● Arises when many edge types are allowed
● Example: UMLS


Lattices

● A partial order
● For every pair of elements A and B
  – There exists a least upper bound
  – There exists a greatest lower bound
● Example:
  – The power set (all possible subsets) of a finite set
  – LUB(A,B) = union of two sets A, B
  – GLB(A,B) = intersection of two sets A, B
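The power set example can be verified by brute force. The sketch below finds the least upper bound and greatest lower bound by searching the lattice under the subset order, then compares them with union and intersection; the function names are hypothetical.

```python
from itertools import combinations


def powerset(items):
    """All subsets of items, as frozensets."""
    items = list(items)
    return [
        frozenset(combo)
        for size in range(len(items) + 1)
        for combo in combinations(items, size)
    ]


def least_upper_bound(a, b, elements):
    """The upper bound of a and b that lies below every other upper bound."""
    uppers = [x for x in elements if a <= x and b <= x]
    return next(u for u in uppers if all(u <= other for other in uppers))


def greatest_lower_bound(a, b, elements):
    """The lower bound of a and b that lies above every other lower bound."""
    lowers = [x for x in elements if x <= a and x <= b]
    return next(low for low in lowers if all(other <= low for other in lowers))


elements = powerset("abc")
lub_example = least_upper_bound(frozenset("a"), frozenset("b"), elements)
glb_example = greatest_lower_bound(frozenset("ab"), frozenset("bc"), elements)
```

On Python sets, `<=` is the subset test, so the search uses exactly the ordering that defines the power set lattice.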


Example Lattice: Powerset of 3 element set

{a,b,c}
{a,b} {a,c} {b,c}
{a} {b} {c}
empty set = {}

Legend: an edge denotes subset.


Lattices - Applications

● Formal Concept Analysis
  – synthesizing taxonomies
● Machine Learning
  – concept learning


Bipartite Graphs

● Vertices = two disjoint sets, V and W
● All edges connect one vertex from V and one vertex from W
● Examples:
  – mappings among value representations
  – mappings among schemas
  – (entity/attribute, relationship) nodes in Conceptual Graphs
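The bipartite condition is easy to check when the two vertex sets are known in advance. The sketch below is illustrative; the function name is hypothetical, and the state names and codes are those of the example slide.

```python
def is_bipartite_between(edges, left, right):
    """True when the two sets are disjoint and every edge joins one vertex from each."""
    if not left.isdisjoint(right):
        return False
    return all(
        (u in left and v in right) or (u in right and v in left)
        for u, v in edges
    )


states = {"California", "Massachusetts", "Oregon"}
codes = {"CA", "MA", "OR"}

# A mapping between two value representations.
state_to_code = {
    ("California", "CA"),
    ("Massachusetts", "MA"),
    ("Oregon", "OR"),
}
```

An edge between two states, or between two codes, would violate the condition and make the check return `False`.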


Example Bipartite Graph

States           Two-letter state codes
California  ---  CA
Massachusetts --- MA
Oregon      ---  OR


Clique

● Clique = complete graph (or subgraph)
  – all possible edges are present
● Used to represent equivalence classes
● Typically, on undirected graphs
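Whether a set of nodes forms a clique can be tested by checking every pair. The sketch below treats edges as undirected; the function name is hypothetical, and the synonyms are taken from the example slide.

```python
from itertools import combinations


def is_clique(nodes, edges):
    """True when every pair of distinct nodes is joined by an (undirected) edge."""
    undirected = {frozenset(edge) for edge in edges}
    return all(frozenset(pair) in undirected for pair in combinations(nodes, 2))


names = ["California", "CA", "Calif.", "CAL"]

# Synonymy links every name to every other name in the equivalence class.
synonymy = set(combinations(names, 2))
```

With four names the clique has six edges; dropping any one of them means the names no longer form a complete graph.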


Example of Clique

California

CA

Calif.

CAL

Here edges denote synonymy.


Compound Graphs

● Edges can point to/from subgraphs, not just simple nodes
● Used in conceptual graphs
● CG is isomorphic to First Order Logic
● Could be used to specify contexts for subgraphs
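One way to picture an edge that points at a subgraph is to let a whole graph appear as the target of a triple. The sketch below is a hypothetical encoding, not the conceptual graph formalism itself; the `Graph` class is an assumption, and the statement is the one from the example slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Graph:
    """A graph as an immutable set of (source, label, target) triples."""
    edges: frozenset


# Inner graph: the statement that is being claimed.
claim = Graph(frozenset({("Iraq", "had", "WMDs")}))

# Outer graph: the edge target is a whole subgraph, not a simple node.
outer = Graph(frozenset({("Colin Powell", "claimed", claim)}))

# Unpack the single outer edge.
((source, label, target),) = outer.edges
```

Because the inner triple lives only inside `claim`, the outer graph records that the statement was claimed without asserting the statement itself, which is the "context" use mentioned above.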


Example Compound Graph

Colin Powell claimed

Iraq had WMDs


Conclusions

● We can characterize metadata structure in terms of graph structures
● Partial Order Graphs are the most common structure:
  – used for taxonomies, partonomies
  – support multiple inheritance, faceted classification
  – implicit inclusion of inferred transitive closure edges


Additional Resources

● Faceted classification

– http://www.kmconnection.com/DOC100100.htm

● Lattices

– http://www.rci.rutgers.edu/~cfs/472_html/Learn/SimpleLattice_472.html


Contact Information

● Frank Olken
  – [email protected], 510-486-5891
  – http://www.lbl.gov/~olken
● XMDR Project
  – http://hpcrd.lbl.gov/SDM/XMDR/
● Bruce Bargmeyer
  – [email protected]
● Kevin D. Keck
  – mailto:[email protected]