Ontological Knowledge Engineering for Cultural Heritage of Andean Textiles Immanuel Normann Department of Computer Science and Information Systems July 20, 2012

Ontological Knowledge Engineering for Cultural Heritage of Andean Textiles

Immanuel Normann

Department of Computer Science and Information Systems

July 20, 2012

Project Context

● Pre-Columbian Latin America had no writing system

● Alternative encoding systems were developed to pass down cultural knowledge

● Hypothesis: weaving patterns as “writing systems” in this sense

● General research endeavour: deciphering these “writing systems”

● Our objective: systematization of knowledge about Andean weaving through an ontological approach

● implementation of ontological knowledge system

● instantiation of the system with facts

Project Team

● La Paz

Instituto de Lengua y Cultura Aymara (Denise Y. Arnold)

● Domain experts: knowledge acquisition and creation, building physical and virtual models, creating multimedia data.

● Software developer: web front end

● London

● AHRC (Luciana Martins):

principal investigator & domain experts (iconographic analysis)

● Birkbeck DCS (Sven Helmer):

Knowledge engineering + knowledge system implementation

My Role in this Project

● Knowledge engineering

● Software engineering

● Content processing

Overview

Software Matters

Project status at the beginning of my work

● Project proposal intends ontological approach

● La Paz team already acquainted with ontology-related know-how:

● Methontology

● Protege, CMap tools

● CIDOC-CRM

● Great amount of knowledge/data in spreadsheets

● Relational database schemes developed.

● other

● handwritten museum register documents

● images, videos, other multimedia documents,

● woven samples

Initial Steps

● identification of central research subdomains and their documents

textiles, instruments, processes, historical/cultural backgrounds, iconography, ...

● identification of central docs: concept maps, spreadsheets

● identification of the requirements for the KMS:

● identification of stakeholders

● development of use case scenarios

● competency questions

● setting up a communication platform & versioning system

Example Concept Map

[Figure: concept map centred on Objeto textil (textile object). Node labels (Spanish, as in the original diagram): Materia prima, fibra, tinte, mordiente, instrumentos, telar, T. horizontal, T. cintura, Rueca, etc., actividades, evento, proceso, movimiento, esquila, hilado, teñido, urdido, tejido, acabado, tiempo, periodo, P. Colonial, P. Contemporáneo, etc., Lugar, sitio, ruta, actor, persona, tejedora, grupo, prenda, bien, apsu, estructura, técnica, estilo, e. universal, e. local/tecnológico, Vida social, S. producción, S. recojo, S. custodia, Aprendizaje, etc., imagen, Foto, video. Edge labels: es, tiene, se elabora con, se hizo en, es elaborado por, se obtiene mediante, pertenece a.]

Example Datasheet

Example Competency Questions

● ¿En qué sitios se halla evidencia de la práctica de la técnica x?

● At which sites is there evidence of the practice of technique X?

● ¿En qué culturas se halla evidencia de la práctica de tal técnica?

● In which cultures is there evidence of the practice of that technique?

● ¿Cuál es el registro más antiguo de la técnica T?

● What is the oldest record of technique T?

● ¿En qué tipo de prenda se empleó por primera vez la técnica X?

● In which type of garment was technique X used for the first time?

● ¿Qué tipos de textiles se ha tejido usando la técnica T en un período P y región R?

● Which types of textiles were woven using technique T in a period P and region R?

Early Results from Requirement Analysis

● How much of ontological reasoning is needed?

● Which system could provide it?

● Early tendency: RDBMS.

● RDB schema already defined

● content already partially inserted in the RDBMS

● most content in spreadsheets

● ideas for simple reasoning developed

(transitivity, ontological queries translated to SQL)

● Does this approach satisfy the requirements?
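The "transitivity translated to SQL" idea can be sketched roughly as follows. This is a minimal illustration, not the project's actual schema: the table name `subclass_of`, its columns, and the link from Accesorios to Producto Textil are assumptions made for the example. It uses SQLite's recursive common table expressions to collect all direct and indirect superclasses of a class.

```python
import sqlite3

# Hypothetical schema: one row per direct subclass link.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subclass_of (child TEXT, parent TEXT)")
conn.executemany(
    "INSERT INTO subclass_of VALUES (?, ?)",
    [
        ("Facha Ancha", "Accesorios"),
        ("Accesorios", "Producto Textil"),  # assumed link, for illustration only
        ("Poncho", "Producto Textil"),
    ],
)


def superclasses(conn, cls):
    """Return all direct and indirect superclasses of cls."""
    rows = conn.execute(
        """
        WITH RECURSIVE ancestors(name) AS (
            SELECT parent FROM subclass_of WHERE child = ?
            UNION
            SELECT s.parent FROM subclass_of AS s
            JOIN ancestors AS a ON s.child = a.name
        )
        SELECT name FROM ancestors
        """,
        (cls,),
    ).fetchall()
    return {name for (name,) in rows}


print(sorted(superclasses(conn, "Facha Ancha")))
```

Each further kind of reasoning would need its own hand-written query of this sort, which is one reason to ask whether the approach scales to the requirements.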

Against the RDBMS approach

● Knowledge in concept maps

● graph-like knowledge representation, closer to ontological knowledge representation.

● graph-like queries involving some reasoning.

● Dynamic model evolution

● RDB schema change vs. ontology change.

Relational Database vs. Ontology

Relational database systems

● are perfect for modelling relationships with a static knowledge model (i.e. a static relationship schema)

● make schema changes problematic and

● have no notion of hierarchies.

Ontology knowledge systems

● allow storing the same datatypes as relational database systems

● allow for modelling relationships

– in a different way, closer to concept maps than to relation tables

● have a built-in notion of hierarchies!

● and allow even more reasoning.

Queries on Graph Structures

select all Accesorios se elabora con Técnica para faz de trama
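A query like this combines a graph pattern with a little reasoning: "all Accesorios" has to include instances of every subclass of Accesorios. A minimal sketch over an in-memory set of triples is given below. The individuals `t1` to `t3`, the property name `se_elabora_con` and the value names are invented for illustration and do not come from the project data.

```python
# Toy triple set: (subject, predicate, object). All identifiers are hypothetical.
triples = {
    ("Facha Ancha", "subClassOf", "Accesorios"),
    ("t1", "type", "Facha Ancha"),
    ("t1", "se_elabora_con", "faz_de_trama"),
    ("t2", "type", "Poncho"),
    ("t2", "se_elabora_con", "faz_de_trama"),
    ("t3", "type", "Accesorios"),
    ("t3", "se_elabora_con", "faz_de_urdimbre"),
}


def subclasses(cls):
    """Return cls together with all its direct and indirect subclasses."""
    result, frontier = {cls}, [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p == "subClassOf" and o == current and s not in result:
                result.add(s)
                frontier.append(s)
    return result


def select(cls, prop, value):
    """Select all instances of cls (or a subclass) with the given property value."""
    classes = subclasses(cls)
    return sorted(
        s
        for s, p, o in triples
        if p == prop and o == value
        and any((s, "type", c) in triples for c in classes)
    )


print(select("Accesorios", "se_elabora_con", "faz_de_trama"))
```

Here `t1` is found only because Facha Ancha is recognised as a subclass of Accesorios; a plain relational lookup on the type column would miss it.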

Requirements for Museum KMS

A museum knowledge management system should

● facilitate relations between entities

● have built-in support for basic reasoning

● be flexible w.r.t. the evolution of the knowledge model

● facilitate storage of basic datatypes (numbers, boolean, ...), free text, and multimedia.

Conclusion

● the RDB approach is insufficient w.r.t. model evolution and reasoning.

● Ontological storage engine required.

● Which is the best for our purpose?

Review of Triplestores

State of the art surveys:

● http://www.w3.org/wiki/LargeTripleStores

● Europeana RDF Store Report (2011)

● An incomplete list of triple stores:

● Native stores: AllegroGraph, OWLIM, Stardog

● RDBMS based: Oracle, Jena SDB

● hybrid: Virtuoso, Sesame, BigData

Our Decision: Virtuoso

● Why Virtuoso:

● multi-paradigm storage: RDBMS (SQL), XML (XQuery), OWL (SPARQL), reasoning.

● scalable, massive data processing, stable, open-source edition, active community.

● some know-how from former projects

● Possible drawbacks:

● too many ways to implement a knowledge base.

● the manual has 4000 pages.

● reasoning capabilities fall short of dedicated reasoners like Pellet.

Knowledge Engineering

Formal ontologies in a nutshell

Conceptual issues

Ontology in a nutshell

● unary constructs:

● individuals (e.g. the textile object whose ID is ILCA_BML074)

● class (e.g. the set of all garments classified as Poncho)

● binary constructs:

● object property = relation between individuals (e.g. in custody of: textile object ILCA_BML074 is in custody of the British Museum)

● data property = attribute of an individual (e.g. has width: textile object ILCA_BML074 has width 52 cm)

● instance of (type) = a relation between an individual and a class (e.g. textile object ILCA_BML074 is an instance of the class Facha Ancha)

● subclass relation = relation between classes (e.g. Facha Ancha is a subclass of Accesorios)

● and even more like: union, intersection, complement, quantification, number restriction, ...

Ontology Schema and Facts

Ontology schema (TBox)

● subclass relations (e.g. Poncho is subclass of Producto Textil)

● domain and range restrictions of

● object properties (e.g. in custody of has domain Producto Textil and range Museum)

● data properties (e.g. has width has domain Producto Textil and cm as range)

Ontology facts (ABox)

● all relations involving individuals (instance of, object properties, data properties)

TBox

ABox
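To make the TBox/ABox split concrete, here is a small sketch that keeps schema statements and facts in two separate triple sets and derives types from them. It assumes the RDFS reading of domain and range (they license inferences rather than reject data); the property spelling `in_custody_of` and the individual `p1` are chosen for the example.

```python
# TBox: schema-level statements (class hierarchy, domains, ranges).
tbox = {
    ("Poncho", "subClassOf", "Producto Textil"),
    ("in_custody_of", "domain", "Producto Textil"),
    ("in_custody_of", "range", "Museum"),
}

# ABox: statements about individuals.
abox = {
    ("ILCA_BML074", "in_custody_of", "British Museum"),
    ("p1", "type", "Poncho"),  # hypothetical individual
}


def infer_types(tbox, abox):
    """Derive (individual, class) pairs from ABox facts and TBox axioms."""
    types = {(s, o) for s, p, o in abox if p == "type"}
    # Domain and range axioms assign types to subjects and objects.
    for s, p, o in abox:
        for prop, kind, cls in tbox:
            if prop != p:
                continue
            if kind == "domain":
                types.add((s, cls))
            elif kind == "range":
                types.add((o, cls))
    # Propagate types along subClassOf until nothing new is added.
    changed = True
    while changed:
        changed = False
        for individual, cls in list(types):
            for sub, kind, sup in tbox:
                if kind == "subClassOf" and sub == cls and (individual, sup) not in types:
                    types.add((individual, sup))
                    changed = True
    return types


for individual, cls in sorted(infer_types(tbox, abox)):
    print(individual, "type", cls)
```

Changing the schema here means editing a few TBox triples; the ABox facts stay untouched.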

Knowledge Engineering

Formal ontologies in a nutshell

Conceptual issues

Abstract Entities

● Abstract entities do not exist in space or in time.

● Concrete entities exist at least in time. For example:

● physical objects (like garments, books, etc.)

● events (like the production of a certain garment)

● Entities like colour, material, and shape are rather time independent.

● What is the appropriate way to model abstract entities?

In OWL we have only two options: as classes or as instances.

● For concrete entities it is easy:

● the jacket I am wearing is an instance of the class of all jackets, which is a subclass of physical objects.

● the discovery of Machu Picchu by Hiram Bingham is an instance of the class of all discoveries, which is a subclass of events.

Abstract Entities

● What about abstract entities: can they have subclasses or instances? For example colours:

– is the red we see here one instance and the red we see there another instance?

– If so, isn't it inconsistent to say that they are both the same red? (We introduced the concept of colour occurrence.)

– is red a unique colour or a class of colours whose instances are e.g. dark-red, orange-red?

– aren't dark-red and orange-red rather themselves classes of reds?

– are there any colours at all that cannot be subdivided into more granular colour values? (We chose to stop at RGB. For physicists, wavelength would make more sense.)

Semi Abstract Entities

● structure, technique, motif:

● not localized in space: possibly at two different places at the same time.

● not localized in time: may exist even if currently not applied or observed.

● but: techniques / motifs are invented and can be forgotten

● epoch and style

● seem to be clearly bound to a certain time period, but

● at least styles may revive at any time.

● epoch is a highly debated concept anyway.

Anonymous Entities

● How should we formalize “Poncho p1 is made of Alpaca”?

The naive way:

p1 made_of a1. p1 type Poncho. a1 type Alpaca.

p1 is a concrete object we can point to. What about a1?

● Consider: “Poncho p2 is also made of Alpaca”.

p2 made_of a2. p2 type Poncho. a2 type Alpaca.

Is a1=a2 or not?

We don't know and we don't care!

Anonymous Entities

● Proper formalization of “Poncho p1 is made of Alpaca”:

p1 type (made_of some Alpaca)

● meaning:

● p1 is an instance of the class (made_of some Alpaca)

● (made_of some Alpaca) is the class of all x such that there exists an a with x made_of a, where a is an instance of Alpaca.

short: “p1 is made of some instance of Alpaca”

Limited Reasoning in Virtuoso

● (made_of some Alpaca) is a quantified class expression

(some is its quantifier)

● Problem with Virtuoso: it accepts quantified expressions, but does not support reasoning on them.

● Example:

p1 type (made_of some Alpaca)

Alpaca subClassOf Camelido

=> p1 type (made_of some Camelido)

● Virtuoso cannot infer this conclusion.
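For comparison, the inference itself is simple to state. The sketch below is not Virtuoso code and not a general OWL reasoner; it only illustrates the single rule used in the example, with quantified class expressions encoded as tuples `("some", property, class)`, an encoding invented for this illustration.

```python
# Subclass axioms between named classes: (subclass, superclass).
subclass_axioms = {("Alpaca", "Camelido")}

# Type assertions; the class is a quantified expression ("some", property, class).
type_assertions = {("p1", ("some", "made_of", "Alpaca"))}


def is_subclass(sub, sup, axioms):
    """True if sub equals sup or reaches sup by following subclass axioms."""
    seen, frontier = {sub}, [sub]
    while frontier:
        current = frontier.pop()
        if current == sup:
            return True
        for child, parent in axioms:
            if child == current and parent not in seen:
                seen.add(parent)
                frontier.append(parent)
    return False


def entails_some(individual, prop, cls):
    """Check whether 'individual type (prop some cls)' follows.

    Rule: from x type (P some C) and C subClassOf D conclude x type (P some D).
    """
    return any(
        ind == individual
        and expr[0] == "some"
        and expr[1] == prop
        and is_subclass(expr[2], cls, subclass_axioms)
        for ind, expr in type_assertions
    )


print(entails_some("p1", "made_of", "Camelido"))
```

A description logic reasoner applies this and many other rules together; the point here is only that the store has to be told about the rule in some form.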

Prototypes as Workaround

Workaround for the Quantification Problem

● introduce a class Prototype

● create for every class (if needed) a dedicated instance of Prototype.

● Example:

alpaca type Prototype. alpaca type Alpaca.

alpaca prototype_for Alpaca.

Prototypes as Workaround

Reasoning via prototypes

● Replace p1 type (made_of some Alpaca)

by p1 made_of alpaca.

● Now Virtuoso can deduce:

p1 made_of alpaca. Alpaca subClassOf Camelido.

=> p1 made_of ?x. ?x type Camelido.

● Note:

● prototypes, in contrast to regular physical individuals, are not located in space and time ( => modeling conflict )

● alpaca prototype_for Alpaca does not conform to OWL.
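The prototype workaround needs nothing beyond subclass reasoning over plain triples, which is why it works in a store with limited inference. A rough sketch, using an in-memory triple set rather than Virtuoso, with the triples taken from the example above:

```python
triples = {
    ("alpaca", "type", "Prototype"),
    ("alpaca", "type", "Alpaca"),
    ("alpaca", "prototype_for", "Alpaca"),
    ("Alpaca", "subClassOf", "Camelido"),
    ("p1", "type", "Poncho"),
    ("p1", "made_of", "alpaca"),
}


def types_of(individual):
    """Asserted types of an individual, closed under subClassOf."""
    found = {o for s, p, o in triples if s == individual and p == "type"}
    frontier = list(found)
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p == "subClassOf" and s == current and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def made_of_some(individual, cls):
    """Answer the pattern: individual made_of ?x . ?x type cls ."""
    return any(
        cls in types_of(o)
        for s, p, o in triples
        if s == individual and p == "made_of"
    )


print(made_of_some("p1", "Camelido"))
```

The price is visible in the data: `alpaca` is an individual that stands for a whole class, which is exactly the modelling conflict noted above.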

Ontological Mistakes

Confusing subclass and instance with part of:

● Lake Titicaca is a spatial part of the Andes, but not a subclass of it.

● weaving is a temporal part of garment production (dyeing is another one), but neither an instance nor a subclass of it.

● part of is a super property of spatial- and temporal-part of.

Confusing subclass with instance:

● Poncho (as indefinite word) is not an instance of garment but a subclass: the class of all concrete ponchos.

Ontological Mistakes

Confusing determined with undetermined objects:

● in “this poncho (p1) is made of Alpaca”

Alpaca should not be modelled as a certain instance of the class Alpaca!

Confusing equivalence with synonymy and/or translations:

● if cloak same as manto and manto same as coat,

then cloak same as coat.

● if chair same as Sessel and Sessel same as armchair,

then chair same as armchair.
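The unwanted conclusions arise because same as is an equivalence relation: any reasoner will close it under symmetry and transitivity. The sketch below computes that closure with a small union-find over the word pairs from the examples; it illustrates the effect and is not tied to any particular triple store.

```python
def same_as_classes(pairs):
    """Group terms into equivalence classes induced by same-as pairs."""
    parent = {}

    def find(term):
        parent.setdefault(term, term)
        while parent[term] != term:
            parent[term] = parent[parent[term]]  # path halving
            term = parent[term]
        return term

    for a, b in pairs:
        parent[find(a)] = find(b)

    groups = {}
    for term in list(parent):
        groups.setdefault(find(term), set()).add(term)
    return list(groups.values())


pairs = [
    ("cloak", "manto"),
    ("manto", "coat"),
    ("chair", "Sessel"),
    ("Sessel", "armchair"),
]

for group in same_as_classes(pairs):
    print(sorted(group))
```

Two loose translations are enough to merge cloak with coat and chair with armchair, so translations and near-synonyms are better kept as labels or as a weaker relation than equivalence.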

Related Work

Controlled vocabularies:

● Getty Thesaurus of Geographic Names (TGN),

● Cataloging Cultural Objects (CCO),

● Categories for the Description of Works of Art (CDWA)

Foundational Ontologies:

● The CIDOC Conceptual Reference Model (CRM):

concepts and relationships used in cultural heritage documentation.

● DOLCE (Descriptive Ontology for Linguistic and Cognitive Engineering)

Linking open data (LOD):

● DBpedia, Freebase, GeoNames, ... (http://linkeddata.org/)

● Linked Data and SPARQL service of British Museum

Content Processing

From Structured Content to RDF

TBox

ABox

Migration of Knowledge Representations

Separation of knowledge modelling:

● TBox knowledge created with graph drawing tools (http://www.yworks.com)

● ABox facts created in spreadsheets

Technical challenges:

● migration to target format for TBox and ABox: RDF triples (source node - link - target node)

● TBox migration: easy

● ABox migration: difficult - due to irregular spreadsheets

● TBox & ABox vocabulary alignment: tedious
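The ABox migration and the vocabulary alignment can be pictured with a small sketch. Everything specific in it is assumed for illustration: the column names, the sample rows, the property names and the lookup table `TBOX_TERMS` are invented, and real spreadsheets were far less regular than this one. Cell values are normalised and looked up in the TBox vocabulary; values with no match are collected for manual review instead of being guessed.

```python
import csv
import io

# Hypothetical spreadsheet export with inconsistent spelling and spacing.
SHEET = """id,tipo,Técnica
ILCA_BML074,Facha Ancha,faz de trama
p1,poncho ,Faz de Trama
"""

# Assumed mapping from normalised labels to TBox terms.
TBOX_TERMS = {
    "facha ancha": "Facha_Ancha",
    "poncho": "Poncho",
    "faz de trama": "Faz_de_trama",
}

# Assumed mapping from normalised column headers to properties.
COLUMN_TO_PROPERTY = {"tipo": "type", "técnica": "se_elabora_con"}


def normalise(label):
    """Collapse whitespace and lowercase so spelling variants compare equal."""
    return " ".join(label.split()).lower()


def rows_to_triples(text):
    """Turn spreadsheet rows into triples; return (triples, unmatched values)."""
    triples, unknown = [], []
    for row in csv.DictReader(io.StringIO(text)):
        subject = row["id"].strip()
        for column, value in row.items():
            if column is None or not value:
                continue
            prop = COLUMN_TO_PROPERTY.get(normalise(column))
            if prop is None:
                continue
            term = TBOX_TERMS.get(normalise(value))
            if term is None:
                unknown.append(value)
                continue
            triples.append((subject, prop, term))
    return triples, unknown


triples, unknown = rows_to_triples(SHEET)
for triple in triples:
    print(triple)
```

Keeping the unmatched values in a separate list makes the tedious part of the alignment explicit: each entry is a term that either needs a spelling fix in the spreadsheet or a new concept in the TBox.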

Concept Hierarchies as TBox

ABox: facts in a spreadsheet

Workflows and Tools

[Figure: spreadsheets and concept maps are both migrated to RDF.]

Problem: inconsistent vocabulary!

Thank you!