RELATIONAL SUPPORT FOR PROTEGE

Raimundo Lozano
Felipe Geva
Xavier Pastor
CSC

Sixth International Protégé Workshop (2003)

INTRODUCTION

SCOPE: “Structuring Concepts for Online Publishing Environment” (Project 22016Y1C1DMAL2)

Goal: Structuring scientific information in an ontology

Medical domain: Gastroenterology and Hepatology (G&H)

Hypothesis: Users are able to search and retrieve information at a higher level of abstraction than with current keyword-based systems

Implementation: Integrated tool for building and maintaining medical ontologies

OBJECTIVES

Related functionality
  • Implement semantic search of contents
  • Knowledge representation
  • Multilingual support

Related interface
  • Friendly interface
  • Structured presentation of results

Related technology
  • Use of standards
  • Open source contribution

METHODOLOGY

Concepts
  • Medical terminology framework of reference: UMLS (NLM)
      • Metathesaurus: more than 800,000 concepts
      • Multilingual support
      • Relational database developed

Knowledge representation system
  • RDF: relational database developed
  • Protégé 2000: extended
      • Articles categorization
      • RDF models storage
      • UMLS search capabilities

Retrieval system: on the Web

GENERAL SCHEMA

[Figure: general schema of the system. Labels: User interface, Query, Query Subsystem, Words, Concepts, Multilinguality, UMLS, Ontological search, Ontology, Ontological organization, G&H, Articles, Related articles, Output.]

IMPLEMENTATION

  • UMLS relational DB
  • RDF relational DB
  • Plugins developed for Protégé
      • UMLS
      • Categorization
      • RDF DB
  • Web searching system

UMLS - Metathesaurus

The Metathesaurus is organized by concept or meaning; its purpose is to link alternative names and views of the same concept together and to identify useful relationships between different concepts.

[Figure: concepts (CUIs) group terms (LUIs), which in turn group strings (SUIs).]

UMLS - Normalization

[Diagram: the original Metathesaurus files and the normalized tables derived from them. Field types are given as they appear in the diagram.]

ORIGINAL METATHESAURUS FILES

MRSTY      CUI A8, TUI A4, STY VA41
MRCON      CUI A8, LAT A3, TS A1, LUI A8, STT VA3, SUI A8, STR TXT, LRL SI
MRXW.ENG   LAT A3, WD VA80, CUI A8, LUI A8, SUI A8
MRXW.FRE   LAT A3, WD VA80, CUI A8, LUI A8, SUI A8
MRXW.GER   LAT A3, WD VA80, CUI A8, LUI A8, SUI A8
MRXW.ITA   LAT A3, WD VA80, CUI A8, LUI A8, SUI A8
MRXW.SPA   LAT A3, WD VA80, CUI A8, LUI A8, SUI A8

NORMALIZED TABLES

concpt (concept)
    cui        char(8)        <pk>
    valid      numeric(4)
    Keys: concpt_pk

term
    lui        char(8)        <pk>
    cui        char(8)        <fk>
    ts         char(1)
    Keys: term_pk, term_concept_fk1

string_type (string type)
    stt        varchar(3)     <pk>
    descrp     varchar(255)
    Keys: string_type_pk

string
    sui        char(8)        <pk>
    lui        char(8)        <fk1>
    stt        varchar(3)     <fk2>
    long_str   bit
    str        varchar(255)
    str_txt    text
    lat        char(3)
    lrl        int
    Keys: string_pk, string_term_fk1, string_type_string_fk2

mrxw_eng, mrxw_fre, mrxw_ger, mrxw_ita, mrxw_spa (one table per language)
    sui        char(8)        <pk,fk>
    wd         varchar(80)    <pk>
    Keys: english_fk1, french_fk1, german_fk1, italian_fk1, spanish_fk1

Relationships (each marked Upd(R); Del(R))
    term.cui = concpt.cui
    string.lui = term.lui
    string.stt = string_type.stt
    mrxw_eng.sui = string.sui (likewise for mrxw_fre, mrxw_ger, mrxw_ita, mrxw_spa)
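
As an illustration of the normalized tables, here is a minimal sketch in Python with SQLite. Table and column names are taken from the slide; the choice of SQLite, the reading of "Upd(R); Del(R)" as RESTRICT, the helper names (build_db, load_sample, words_for_concept), and every sample identifier except the CUI are assumptions made for this example. Only the English word table is created; the other four language tables would have the same shape.

```python
import sqlite3

# Names and types follow the normalized tables on the slide. Reading
# "Upd(R); Del(R)" as RESTRICT is an assumption, as are the sample rows.
SCHEMA = """
CREATE TABLE concpt (
    cui   CHAR(8) PRIMARY KEY,
    valid NUMERIC(4)
);
CREATE TABLE term (
    lui CHAR(8) PRIMARY KEY,
    cui CHAR(8) NOT NULL
        REFERENCES concpt(cui) ON UPDATE RESTRICT ON DELETE RESTRICT,
    ts  CHAR(1)
);
CREATE TABLE string_type (
    stt    VARCHAR(3) PRIMARY KEY,
    descrp VARCHAR(255)
);
CREATE TABLE string (
    sui      CHAR(8) PRIMARY KEY,
    lui      CHAR(8) NOT NULL
             REFERENCES term(lui) ON UPDATE RESTRICT ON DELETE RESTRICT,
    stt      VARCHAR(3)
             REFERENCES string_type(stt) ON UPDATE RESTRICT ON DELETE RESTRICT,
    long_str BOOLEAN,
    str      VARCHAR(255),
    str_txt  TEXT,
    lat      CHAR(3),
    lrl      INT
);
CREATE TABLE mrxw_eng (
    sui CHAR(8)
        REFERENCES string(sui) ON UPDATE RESTRICT ON DELETE RESTRICT,
    wd  VARCHAR(80),
    PRIMARY KEY (sui, wd)
);
"""


def build_db():
    """Create an in-memory database with foreign-key checks enabled."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(SCHEMA)
    return conn


def load_sample(conn):
    """Insert one concept with one term, one string and two index words.

    The LUI, SUI and string-type values are made up for the example.
    """
    with conn:
        conn.execute("INSERT INTO concpt VALUES ('C0023895', 1)")
        conn.execute("INSERT INTO term VALUES ('L0000001', 'C0023895', 'P')")
        conn.execute("INSERT INTO string_type VALUES ('PF', 'Preferred form')")
        conn.execute(
            "INSERT INTO string VALUES "
            "('S0000001', 'L0000001', 'PF', 0, 'Disease of liver', NULL, 'ENG', 0)"
        )
        conn.executemany(
            "INSERT INTO mrxw_eng VALUES ('S0000001', ?)",
            [("disease",), ("liver",)],
        )


def words_for_concept(conn, cui):
    """Follow word -> string -> term -> concept and list the index words."""
    rows = conn.execute(
        "SELECT DISTINCT w.wd FROM mrxw_eng w "
        "JOIN string s ON s.sui = w.sui "
        "JOIN term t ON t.lui = s.lui "
        "WHERE t.cui = ? ORDER BY w.wd",
        (cui,),
    )
    return [row[0] for row in rows]
```

The point of the normalization is visible in the join: the word tables no longer repeat CUI and LUI, so a concept is reached through string and term.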

UMLS - accessed from Protégé

UMLS - Functionality

[Diagram: one word table per language (english, french, german, italian, spanish), each holding CUI + WD, linked to a Concept table holding CUI + Description.]

Concept    CUI A8, Description VA255
English    CUI A8, WD VA80
French     CUI A8, WD VA80
German     CUI A8, WD VA80
Italian    CUI A8, WD VA80
Spanish    CUI A8, WD VA80

Relationships: English_Concept, French_Concept, German_Concept, Italian_Concept, Spanish_Concept

INPUT QUERY

liver

OUTPUT QUERY

liver, leber, hígado, foie, fegato

C0023895 Disease of liver
C0023908 Liver transplant
C0085605 Liver function failure
C0023899 Liver Extract
C0019204 Carcinoma of liver cell
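
The lookup shown on this slide can be sketched as a join between a language word table and the Concept table. This is a minimal illustration in Python with SQLite, not the plugin's code: the table and column names come from the slide, the five concepts are those of the example output, while the function names and the pairing of the German word "leber" with specific CUIs are invented for the example.

```python
import sqlite3

LANGUAGES = ("English", "French", "German", "Italian", "Spanish")

# Concepts listed in the slide's example output.
CONCEPTS = [
    ("C0023895", "Disease of liver"),
    ("C0023908", "Liver transplant"),
    ("C0085605", "Liver function failure"),
    ("C0023899", "Liver Extract"),
    ("C0019204", "Carcinoma of liver cell"),
]


def build_db():
    """Create Concept plus one (CUI, WD) table per language and load samples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Concept (CUI CHAR(8) PRIMARY KEY, Description VARCHAR(255))"
    )
    for lang in LANGUAGES:
        conn.execute(
            f"CREATE TABLE {lang} "
            "(CUI CHAR(8) REFERENCES Concept(CUI), WD VARCHAR(80))"
        )
    conn.executemany("INSERT INTO Concept VALUES (?, ?)", CONCEPTS)
    # Every sample description contains the word "liver".
    conn.executemany(
        "INSERT INTO English VALUES (?, 'liver')", [(cui,) for cui, _ in CONCEPTS]
    )
    # Hypothetical German rows: which CUIs "leber" indexes is not on the slide.
    conn.executemany(
        "INSERT INTO German VALUES (?, 'leber')", [("C0023895",), ("C0023908",)]
    )
    conn.commit()
    return conn


def search(conn, word, language="English"):
    """Return (CUI, Description) pairs indexed by `word` in `language`."""
    if language not in LANGUAGES:
        raise ValueError(f"unsupported language: {language}")
    rows = conn.execute(
        f"SELECT c.CUI, c.Description FROM {language} w "
        "JOIN Concept c ON c.CUI = w.CUI "
        "WHERE w.WD = ? ORDER BY c.CUI",
        (word.lower(),),
    )
    return rows.fetchall()
```

Because every language table points at the same CUIs, a query word in any supported language resolves to language-independent concepts.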

RDF

RDF “statements” consist of
  • resources (= nodes) = subject
  • which have properties = predicate
  • which have values (= nodes, strings) = object

predicate(subject, object)

[Diagram: resource, property, value]

The sentence “http://www.w3.org/Home/Lassila has creator Ora Lassila” would thus be diagrammed as:

[Figure: the node http://www.w3.org/Home/Lassila linked by a Creator arc to the value “Ora Lassila”]

From W3C RDF Model and Syntax Specification
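
A statement can be written down as a plain subject/predicate/object triple. The sketch below is only an illustration of that structure in Python; the class name Statement and the method as_predicate_form are hypothetical, and the values are those of the example sentence.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One RDF statement: a resource, one of its properties, and the value."""

    subject: str    # the resource (a node)
    predicate: str  # the property
    object: str     # the value (a node or a string)

    def as_predicate_form(self):
        """Render the statement as predicate(subject, object)."""
        return f"{self.predicate}({self.subject}, {self.object})"


stmt = Statement("http://www.w3.org/Home/Lassila", "Creator", "Ora Lassila")
print(stmt.as_predicate_form())
```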

RDFS

  • Collection of RDF resources that can be used to describe other resources
  • Provides a mechanism to define vocabularies

[Figure: RDFS basic elements]

From W3C RDF Schema Specification

RDF STORAGE - Requirements

  • Wide scope, not limited to SCOPE project needs
  • Conceptual representation, not attached to any specific format
  • Portable between different DBMS
      • Sybase Adaptive Server Anywhere
      • Sybase Adaptive Server Enterprise
      • Oracle 8i
  • Efficient retrieval of concepts

RDF STORAGE

No good models proposed
  • very simple
  • not efficient

Solution
  • to design a new storage model
  • taking advantage of relational capabilities
  • making explicit all RDF components defined in the RDFS specification: classes, properties, literals, etc.

RDF – DB design (1)

RDF – DB design (2)

RDF – DB design (3)

RDF – DB design (4)

RDF STORAGE - Class hierarchy

  • Classes organized in a tree with indexes
  • Very fast searches of subclasses

[Figure: class tree in which every node carries a pair of numeric indexes. Classes shown: disease, ulcer, duodenal, gastric, chronic, acute.]
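
One common way to get fast subclass searches from an indexed tree is interval (nested-set) numbering, where each class stores a left and a right index and all of its descendants fall strictly inside that interval. The sketch below illustrates that general technique in Python; it is not necessarily the numbering used on the slide, whose exact index values could not be recovered. The tree shape (which class is the parent of which) and the function names are assumptions.

```python
# Assumed parent -> children shape for the classes named on the slide.
TREE = {
    "disease": ["ulcer"],
    "ulcer": ["gastric", "duodenal"],
    "gastric": ["chronic", "acute"],
}


def number_tree(tree, root):
    """Assign a (left, right) interval to every class by depth-first walk."""
    intervals = {}
    counter = 0

    def visit(node):
        nonlocal counter
        counter += 1
        left = counter
        for child in tree.get(node, []):
            visit(child)
        counter += 1
        intervals[node] = (left, counter)

    visit(root)
    return intervals


def subclasses(intervals, cls):
    """All direct and indirect subclasses: intervals nested inside cls's."""
    left, right = intervals[cls]
    return sorted(
        name for name, (lo, hi) in intervals.items() if left < lo and hi < right
    )


INTERVALS = number_tree(TREE, "disease")
print(subclasses(INTERVALS, "ulcer"))
```

In a relational table the same test becomes a single range condition on an indexed column, which avoids walking the hierarchy level by level.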

RDF STORAGE – Multiple inheritance

[Figure: the indexed class tree before and after adding a class with more than one parent. Classes shown: disease, ulcer, gastric, duodenal, acute, and “Gastroduodenal chronic ulcer”, whose label appears more than once in the indexed tree.]

RDF STORAGE - Interface

Basic element: the Statement

Stored procedures on the RDF DB
  • To insert an RDF statement
  • To remove an RDF statement
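
A statement-level interface of this kind can be sketched as two operations over a store. The example below is an assumption-laden illustration in Python with SQLite: SQLite has no stored procedures, so plain functions stand in for them, and the single three-column table named statement is hypothetical and much simpler than the design described in this talk.

```python
import sqlite3


def open_store():
    """Create an in-memory store with one hypothetical statement table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE statement ("
        "subject TEXT NOT NULL, predicate TEXT NOT NULL, object TEXT NOT NULL, "
        "PRIMARY KEY (subject, predicate, object))"
    )
    return conn


def has_statement(conn, subject, predicate, obj):
    """True if the exact triple is already stored."""
    row = conn.execute(
        "SELECT 1 FROM statement WHERE subject = ? AND predicate = ? AND object = ?",
        (subject, predicate, obj),
    ).fetchone()
    return row is not None


def insert_statement(conn, subject, predicate, obj):
    """Insert one statement; return False if it was already present."""
    if has_statement(conn, subject, predicate, obj):
        return False
    with conn:  # commit on success, roll back on error
        conn.execute(
            "INSERT INTO statement VALUES (?, ?, ?)", (subject, predicate, obj)
        )
    return True


def remove_statement(conn, subject, predicate, obj):
    """Remove one statement; return False if it was not present."""
    if not has_statement(conn, subject, predicate, obj):
        return False
    with conn:
        conn.execute(
            "DELETE FROM statement "
            "WHERE subject = ? AND predicate = ? AND object = ?",
            (subject, predicate, obj),
        )
    return True
```

Keeping the statement as the only unit of change means a client never needs to know how classes, properties and literals are laid out in the tables.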

PLUGINS – Common features

  • Each plugin is implemented by a class derived from AbstractTabWidget.
  • Access to Protégé classes
      • KnowledgeBase
      • Class management -> Cls
      • Properties management -> Slot
      • Tree interface -> ClsesPanel
  • Tab presentation
      • Easy configuration
  • Database access using jdbc:odbc. The database can be chosen.
      • Plugin RDF.
      • Plugin Categorization.

PLUGINS - UMLS

[Figure: the UMLS plugin and the UMLS database, joined by a jdbc:odbc connection.]

PLUGINS - UMLS

Concept search
  • Variable number of terms allowed
  • Ordered result list with the most similar concept highlighted

Adding a concept to the ontology
  • Multiple parents selection allowed
  • Automatic addition of:
      • UMLS code
      • UMLS semantic type
      • Semantic description

PLUGINS - Categorization

[Figure: the Categorization plugin and the G&H database, joined by a jdbc:odbc connection.]

PLUGINS - Categorization

Show the list of articles
  • Title, volume, issue, abstract...

Categorisation
  • Selecting an article
  • Article class automatically created
  • Article identifier automatically added
  • Volume and issue parents automatically stated
  • Allow selecting other parents

PLUGINS - RDF

[Figure: data flow of the RDF plugin. Labels: .RDF and .RDFS files, RDF-XML file, JENA (Java API for RDF), RDF, model comparison and statements extraction, stored procedures on the database.]
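
Model comparison and statement extraction can be pictured as a set difference between two collections of triples. The sketch below is a simplified Python illustration under that assumption; the plugin itself relies on JENA and stored procedures, whose details are not shown here, and the function name diff_models and the sample triples are hypothetical.

```python
def diff_models(stored, edited):
    """Compare two models given as iterables of (subject, predicate, object).

    Returns (to_insert, to_remove): statements present only in the edited
    model, and statements present only in the stored model.
    """
    stored_set, edited_set = set(stored), set(edited)
    return sorted(edited_set - stored_set), sorted(stored_set - edited_set)


# Hypothetical sample models.
STORED = [
    ("Gastritis", "type", "Disease"),
    ("Ulcer", "subClassOf", "Disease"),
]
EDITED = [
    ("Gastritis", "type", "Disease"),
    ("Ulcer", "subClassOf", "Lesion"),
]

to_insert, to_remove = diff_models(STORED, EDITED)
print("insert:", to_insert)
print("remove:", to_remove)
```

Each extracted statement would then be passed to the insert or remove operation of the database, so only the changed part of the model is written.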

RDF STORAGE – Integrity

  • Protégé needs a valid model
  • Statements in RDF are not ordered: validity is assumed
  • Automatic creation of needed resources
      • e.g.: (Gastritis, type, Disease): if the Disease class does not exist, it is created
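
The automatic creation rule can be sketched in a few lines. This is a minimal in-memory Python illustration of the idea only, assuming a model reduced to a set of class names and a map from resources to their types; the function name apply_statement and these data structures are hypothetical.

```python
def apply_statement(classes, types, statement):
    """Apply one (subject, predicate, object) statement to a tiny model.

    `classes` is the set of known class names and `types` maps a resource
    to the set of its classes. A class referenced before it has been
    declared is created on the fly. Returns the classes created.
    """
    subject, predicate, obj = statement
    created = []
    if predicate == "type":
        if obj not in classes:
            classes.add(obj)  # referenced class is missing: create it
            created.append(obj)
        types.setdefault(subject, set()).add(obj)
    return created


classes, types = set(), {}
first = apply_statement(classes, types, ("Gastritis", "type", "Disease"))
second = apply_statement(classes, types, ("Ulcer", "type", "Disease"))
print(first, second)
```

Because a missing class is created on first reference, the statements can arrive in any order and the stored model still ends up valid.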

PROBLEMS – Name modification

  • The common identifier between the database and Protégé is the resource name
  • The user is allowed to modify the name
  • Changes to Protégé needed
      • List of modified elements in DefaultKnowledgeBase
      • New attributes and functions in DefaultFrame

PROBLEMS – Type definition

  • Problems with the abbreviated format of type definition
      • Protégé reads it as a literal
  • RDFFrameWalker.getDirectType(Resource resource) modified to create the class

ACKNOWLEDGEMENTS

SCOPE partners
  • Universitat Pompeu Fabra: the coordinating institution for SCOPE
  • DOYMA: a branch of Havas-MediMedia
  • OrbiTeam Software GmbH: a spin-off company of GMD, the German National Research Center for Information Technology
  • SESI group of the University of Wales, Bangor

Other institutions
  • Stanford Medical Informatics
  • National Library of Medicine