© Paul Buitelaar: eJustice Presentation, July 15th, 2004

Ontologies
Contributions from Language Technology

Paul Buitelaar
DFKI GmbH
Language Technology Lab
DFKI Competence Center Semantic Web
Saarbrücken, Germany

Overview

Ontologies and the Semantic Web
  Semantic Web Intro
  Ontologies and Knowledge Markup
  Ontology Development
  Ontology Lifecycle & Language Technology

Language Technology
  Levels of Automatic Linguistic Analysis

Ontologies in Multilingual Information Access
  A Medical Example: MuchMore Project
  Semantic Resources in the Medical Domain
  Demo MuchMore System
  Language Technology in Annotation and Indexing

Conclusions
  MuchMore for the Legal Domain…

Semantic Web

Intelligent Man-Machine Interface
Knowledge Markup
Ontologies
Semantic Web Services

Ontology-based Knowledge Markup

Semantic Metadata
  Metadata, e.g. Dublin Core: Title, Author, etc.
  Semantic: Formal Properties of Objects of Class Author

Knowledge Markup
  <jobs:systems-analyst xmlns:jobs="http://www.jobs.org/daml+oil-jobs-ontology#">John Smith</jobs:systems-analyst>
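The knowledge markup example can be read as a statement that "John Smith" is an instance of the class systems-analyst in the jobs ontology. A minimal Python sketch of that reading follows, assuming the namespace prefix is declared with the standard `xmlns:jobs` attribute; the function name `markup_to_statement` is hypothetical and only serves this illustration.

```python
import xml.etree.ElementTree as ET

# Knowledge markup from the slide, with the namespace prefix declared
# through the standard xmlns:jobs attribute (an assumption about the intent).
MARKUP = (
    '<jobs:systems-analyst '
    'xmlns:jobs="http://www.jobs.org/daml+oil-jobs-ontology#">'
    'John Smith</jobs:systems-analyst>'
)


def markup_to_statement(xml_text):
    """Return (instance, ontology_namespace, class_name) for one marked-up span."""
    element = ET.fromstring(xml_text)
    # ElementTree expands a prefixed tag to "{namespace}local-name".
    namespace, _, class_name = element.tag[1:].partition("}")
    return element.text, namespace, class_name


instance, namespace, class_name = markup_to_statement(MARKUP)
print(f"{instance!r} is an instance of {class_name!r} defined in {namespace}")
```

The namespace URI identifies the ontology, so the same tag name used under a different namespace would denote a different class.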

Semantic Web Architecture

Layered Architecture (Tim Berners-Lee)

Knowledge Markup Languages

Syntax: XML
  XML Schema: Data Types
  Namespaces: Interpretation Context

Semantics: RDF
  RDF Schema
    Formalization: Classes (Inheritance), Properties
  OWL (DAML+OIL)
    Formalization: Classes, Class Definitions, Properties, Property Types (e.g. Transitivity)
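The difference between the two formalization levels can be made concrete with a small sketch: class inheritance (the RDF Schema level) and a property declared transitive (a property type at the OWL level). This is plain Python rather than an RDF or OWL toolkit, and every class, property, and fact name below is a hypothetical example, not taken from the slides.

```python
# Hypothetical toy ontology; all names are illustrative only.
SUBCLASS_OF = {  # RDF Schema level: class inheritance
    "SystemsAnalyst": "Employee",
    "Employee": "Person",
}
TRANSITIVE_PROPERTIES = {"part_of"}  # OWL level: a property type
FACTS = {
    ("Saarbruecken", "part_of", "Saarland"),
    ("Saarland", "part_of", "Germany"),
}


def superclasses(cls):
    """All classes a class inherits from, following subclass links upward."""
    result = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        result.append(cls)
    return result


def transitive_closure(facts, transitive_properties):
    """Add the facts implied by properties that are declared transitive."""
    closed = set(facts)
    changed = True
    while changed:
        changed = False
        for (a, p, b) in list(closed):
            if p not in transitive_properties:
                continue
            for (c, q, d) in list(closed):
                if q == p and c == b and (a, p, d) not in closed:
                    closed.add((a, p, d))
                    changed = True
    return closed


print(superclasses("SystemsAnalyst"))
print(sorted(transitive_closure(FACTS, TRANSITIVE_PROPERTIES)))
```

Inheritance needs only the class hierarchy, while the inferred `part_of` fact depends on knowing the type of the property, which is the extra expressiveness the OWL level adds.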

Ontologies: Basic Idea

Definition
  “… Explicit, Formal Specification of a Shared Conceptualization of a Domain of Interest”
  T. Gruber, Towards principles for the design of ontologies used for knowledge sharing. Int. J. of Human-Computer Studies, 1994

Purpose
  Knowledge Sharing (e.g. between Agents)
  Inference (over Sets of Instances)

Related Areas, e.g.
  Terminologies, Controlled Vocabulary, Thesauri, Taxonomies, Semantic Lexicons, Wordnets, etc.
  Conceptual Models, Schemas, etc.

Ontologies: Applications, e.g.

Semantic Web Services
  Interoperability for (Semantic) Web Services

Intelligent Agents
  Domain Models for Intelligent Agents

Text Interpretation
  Ontology-aware Information Extraction

Multimedia Integration
  Ontology-based Alignment of Extracted Objects in Text, Audio, Video

Intelligent Search/Navigation
  Ontology-based Indexing in Web-Retrieval

Ontologies: Development

Ontology Editor / KB Management
  Most Widely Used: Protégé (Stanford University, Medical Informatics, USA)
    Originally for Development and Maintenance of Medical Expert Systems
  Other, e.g.
    KAON: University of Karlsruhe - AIFB, Germany
    WebODE: UPM - Ontology Group, Madrid, Spain
    WebOnto: Open University - KMI, UK

Overview at XML.com by Michael Denny: Ontology Building: A Survey of Editing Tools

Class Hierarchy

Slot Descriptions

http://dmag.upf.es/ontologies/2003/12/ipronto.owl

Ontology Lifecycle

Creating

Populating

Validating

Evolving

Maintaining

Deploying

LT in the Ontology Lifecycle

Language Technology (LT) for Ontology:
Language Technology = Automated Linguistic Analysis

Documents (Text)

Creating & Evolving
  Linguistic Analysis to Extract Classes / Relations
  Ontology (Knowledge): Classes, Relations/Properties

Populating (Knowledge Base Generation)
  Linguistic Analysis to Extract Instances
  Instances

Linguistic Analysis: Example

The Dell computer with a flat screen had to be rejected because of a failure in the motherboard.

Diagram labels:
  Entities: Dell computer, flat screen, motherboard, failure, animate-entity
  Relations: has-a, has-a, location-of
  Event: reject

Part-of-Speech, Morphology

Part-of-Speech
  e.g.: noun, verb, adjective, preposition, …
  PoS tag sets may have between 10 and 50 (or more) tags

Morphology
  Most languages have inflection and declension, e.g.:
    Singular/Plural: computer, computers
    Present/Past: reject, rejected
  Many languages also have complex (de)composition, e.g.:
    Flachbildschirm (flat screen) > flach + Bildschirm > flach + Bild + Schirm
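Compound decomposition such as Flachbildschirm > flach + Bild + Schirm can be sketched as a recursive lexicon lookup. The Python example below is only an illustration under strong assumptions: the four-word `LEXICON` is hypothetical, and linking elements (such as the German "s" between parts) are not handled.

```python
# Tiny hypothetical lexicon; a real system would use a full morphological lexicon.
LEXICON = {"flach", "bild", "schirm", "bildschirm"}


def decompose(word, lexicon=LEXICON):
    """Return every way to split `word` into lexicon entries, as lists of parts."""
    word = word.lower()
    if not word:
        return [[]]  # one valid split of the empty remainder: no further parts
    splits = []
    for end in range(1, len(word) + 1):
        head = word[:end]
        if head in lexicon:
            for rest in decompose(word[end:], lexicon):
                splits.append([head] + rest)
    return splits


print(decompose("Flachbildschirm"))
# [['flach', 'bild', 'schirm'], ['flach', 'bildschirm']]
```

Both analyses from the slide are returned; choosing between the coarse and the fine split is left to the application.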

Phrases, Terms, Named Entities

Semantic Units

Phrases (e.g. nominal - NP, prepositional - PP)
  NP: a flat screen
  PP: with a flat screen
  NP (recursive): the Dell computer with a flat screen
  NP (recursive): a failure in the motherboard

Terms (domain-specific phrases)
  Dell computer
  Dell computer with a flat screen

Named Entities (phrases corresponding to dates, names, …)
  COMPANY: Dell
  COMPANY: Dell Computer Corporation
  PERSON: Michael Dell
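One simple way to recognize named entities like those listed is a gazetteer lookup that prefers the longest match, so that "Dell Computer Corporation" is not cut down to "Dell". The sketch below is one possible technique, not the method used in the presentation; the `GAZETTEER` contents come from the slide's examples, and the function name is hypothetical.

```python
# Hypothetical gazetteer built from the named-entity examples on the slide.
GAZETTEER = {
    ("dell",): "COMPANY",
    ("dell", "computer", "corporation"): "COMPANY",
    ("michael", "dell"): "PERSON",
}
MAX_LEN = max(len(key) for key in GAZETTEER)


def tag_entities(text):
    """Return (phrase, label) pairs found by longest-match gazetteer lookup."""
    tokens = text.split()
    lowered = [token.lower().strip(".,;") for token in tokens]
    entities = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for length in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(lowered[i:i + length])
            if key in GAZETTEER:
                entities.append((" ".join(tokens[i:i + length]), GAZETTEER[key]))
                i += length
                break
        else:
            i += 1  # no entity starts at this token
    return entities


print(tag_entities("The Dell computer with a flat screen had to be rejected"))
```

Note that a pure lookup tags "Dell" as COMPANY even inside the term "Dell computer"; separating terms from named entities needs the phrase analysis described above.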

Dependency Structure

Semantic Structure
  Dependencies between Predicates and Arguments

the Dell computer with a flat screen had to be rejected

  PRED: reject
  ARG1: ENTITY
  ARG2: ‘the Dell computer with a flat screen’

‘Logical Form’: reject(x,y) & animate-entity(x) & computer(y) & …

The Dell computer that has been rejected was claimed to have suffered from handling.

reject(e1,x1,y1) & animate-entity(x1) & Dell_computer(y1) & claim(e2,x2,e3) & animate-entity(x2) & suffer_from(e3,y1,y2) & handling(y2)
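The logical form of the second sentence can be held in a small data structure, which makes the shared variables explicit (for instance, y1 links the rejected computer to the one that suffered from handling). The following Python sketch is an assumed representation for illustration; the class `Predication` and the helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Predication:
    """One predicate applied to a tuple of variables, e.g. reject(e1,x1,y1)."""
    predicate: str
    arguments: tuple

    def __str__(self):
        return f"{self.predicate}({','.join(self.arguments)})"


def logical_form(predications):
    """Render a list of predications as a conjunction."""
    return " & ".join(str(p) for p in predications)


def predicates_about(variable, predications):
    """Names of all predicates that mention the given variable."""
    return [p.predicate for p in predications if variable in p.arguments]


# Predications copied from the slide's second example sentence.
SENTENCE = [
    Predication("reject", ("e1", "x1", "y1")),
    Predication("animate-entity", ("x1",)),
    Predication("Dell_computer", ("y1",)),
    Predication("claim", ("e2", "x2", "e3")),
    Predication("animate-entity", ("x2",)),
    Predication("suffer_from", ("e3", "y1", "y2")),
    Predication("handling", ("y2",)),
]

print(logical_form(SENTENCE))
print(predicates_about("y1", SENTENCE))
```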

MuchMore Project

Demonstration Prototype
  Real-Life Medical Scenario for Cross-Lingual Information Retrieval

Research & Development
  Combined Data- and Knowledge-Driven

Performance Evaluation
  Performance Comparison of Existing and Novel Methods

http://muchmore.dfki.de

Semantic Resources

General
  WordNet (EN), GermaNet (DE), EuroWordNet (“linked”)

Medical Domain
  UMLS: Unified Medical Language System
    Medical MetaThesaurus (only MeSH2001 is used)
      English, German, Spanish, …
      730,000 Concepts
      9 Relations (Broader, Narrower, …)
    Semantic Network
      134 Semantic Types
      54 Semantic Relations

MetaThesaurus, SemNet

C0019682|ENG|P|L0019682|PF|S0048631|HIV|0|
C0019682|ENG|S|L0020103|PF|S0049688|HTLV-III|0|
C0019682|ENG|S|L0020128|VS|S0049756|Human Immunodeficiency Virus|0|
C0019682|ENG|S|L0020128|VWS|S0098727|Virus, Human Immunodeficiency|0|
C0019682|FRE|P|L0168651|PF|S0233132|HIV|3|
C0019682|FRE|S|L0206547|PF|S0277133|VIRUS IMMUNODEFICIENCE HUMAINE|3|
C0019682|GER|P|L0413854|PF|S0538136|HIV|3|
C0019682|GER|S|L1261793|PF|S1503739|Humanes T-Zell-lymphotropes Virus Typ III|3|

Concept Names: 1,734,706
  ENGLISH: 1,462,202
  GERMAN: 66,381
  other languages

Each CUI (Concept Unique Identifier) is mapped to one out of 134 Semantic Types or TUI (Type Unique Identifier)
  Clozapine: C0009079
  Pharmacologic Substance: T121

Semantic Types are organized in a Network through 54 Relations
  T121|T154|T047
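The pipe-delimited concept-name rows above can be read into a per-language index with a few lines of Python. This is a sketch only: the slide does not name the columns, so the field names in `FIELDS` are an assumption (they follow the layout of the UMLS concept-names file as far as it can be inferred from the rows), and the helper names are hypothetical.

```python
from collections import defaultdict

# Rows copied from the slide.
ROWS = """\
C0019682|ENG|P|L0019682|PF|S0048631|HIV|0|
C0019682|ENG|S|L0020103|PF|S0049688|HTLV-III|0|
C0019682|ENG|S|L0020128|VS|S0049756|Human Immunodeficiency Virus|0|
C0019682|ENG|S|L0020128|VWS|S0098727|Virus, Human Immunodeficiency|0|
C0019682|FRE|P|L0168651|PF|S0233132|HIV|3|
C0019682|FRE|S|L0206547|PF|S0277133|VIRUS IMMUNODEFICIENCE HUMAINE|3|
C0019682|GER|P|L0413854|PF|S0538136|HIV|3|
C0019682|GER|S|L1261793|PF|S1503739|Humanes T-Zell-lymphotropes Virus Typ III|3|
""".splitlines()

# Assumed column names; the slide shows only the raw values.
FIELDS = ("cui", "language", "term_status", "lui",
          "string_type", "sui", "name", "restriction_level")


def parse_row(line):
    """Split one row into a dict, ignoring the empty field after the last pipe."""
    return dict(zip(FIELDS, line.split("|")[:len(FIELDS)]))


def names_by_language(lines):
    """Map each concept identifier to {language: [names]}."""
    index = defaultdict(lambda: defaultdict(list))
    for line in lines:
        row = parse_row(line)
        index[row["cui"]][row["language"]].append(row["name"])
    return index


index = names_by_language(ROWS)
print(index["C0019682"]["GER"])
```

Because all rows share one concept identifier, a German query term and an English document term that map to C0019682 can be matched without translating either string.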

Annotation & Indexing

Token (with Part-of-Speech)
  German: Kreuzbandes
  English: ligaments

Lemma (or Sequence of Lemmas - Decomposition)
  German: Faserknorpel > Faser + Knorpel
  English: ligament

UMLS Concept Code and Semantic Type
  ligament: C0022745_T030

MeSH Code
  A2.513

Semantic Relation (over a Pair of UMLS Concepts)
  C0022745_T030 interconnects C0047693_T065
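The annotation levels listed for this slide can be combined into index keys for a single token. The Python sketch below assumes that the combined code joins the concept identifier and the semantic type with an underscore, as in C0022745_T030; the key prefixes (`token:`, `lemma:`, and so on) and the function names are hypothetical and not taken from the MuchMore system.

```python
from typing import NamedTuple


class Concept(NamedTuple):
    cui: str  # concept identifier, e.g. C0022745
    tui: str  # semantic type identifier, e.g. T030


def parse_concept(code):
    """Split a combined code such as 'C0022745_T030' into its two parts."""
    cui, tui = code.split("_")
    return Concept(cui, tui)


def parse_relation(annotation):
    """Parse 'CODE relation CODE' into (subject, relation, object)."""
    subject, relation, obj = annotation.split()
    return parse_concept(subject), relation, parse_concept(obj)


def index_keys(token, lemmas, concept_code, mesh_code):
    """Collect index keys for one token across all annotation levels."""
    concept = parse_concept(concept_code)
    keys = [f"token:{token}"]
    keys += [f"lemma:{lemma}" for lemma in lemmas]
    keys += [f"cui:{concept.cui}", f"tui:{concept.tui}", f"mesh:{mesh_code}"]
    return keys


print(index_keys("ligaments", ["ligament"], "C0022745_T030", "A2.513"))
print(parse_relation("C0022745_T030 interconnects C0047693_T065"))
```

Token and lemma keys are language-specific, whereas the concept, semantic type, and MeSH keys are shared across languages, which is what makes them useful for cross-lingual retrieval.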

Relations

UMLS Semantic Network specifies 54 types of relations between 134 semantic types
  Pharmacologic Substance affects Cell Function

Relations are generic and potentially false
  Therapeutic Procedure method_of Occupation, Discipline
  *discectomy method_of history

Relations are ambiguous
  Therapeutic Procedure prevents Neoplastic Process
  Therapeutic Procedure complicates Neoplastic Process
  Therapeutic Procedure affects Neoplastic Process
  Therapeutic Procedure treats Neoplastic Process
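The ambiguity described here shows up directly when the Semantic Network is queried by a pair of semantic types: one pair can license several relations, and nothing at the type level says which one holds for a given pair of concepts. A minimal Python sketch, using only the relation triples quoted on the slide (the list and function names are hypothetical):

```python
# Relation triples quoted on the slide: (subject type, relation, object type).
SEMANTIC_NETWORK = [
    ("Pharmacologic Substance", "affects", "Cell Function"),
    ("Therapeutic Procedure", "method_of", "Occupation, Discipline"),
    ("Therapeutic Procedure", "prevents", "Neoplastic Process"),
    ("Therapeutic Procedure", "complicates", "Neoplastic Process"),
    ("Therapeutic Procedure", "affects", "Neoplastic Process"),
    ("Therapeutic Procedure", "treats", "Neoplastic Process"),
]


def candidate_relations(subject_type, object_type, network=SEMANTIC_NETWORK):
    """All relations the network allows between two semantic types."""
    return [
        relation
        for subject, relation, obj in network
        if subject == subject_type and obj == object_type
    ]


def is_ambiguous(subject_type, object_type):
    """True when more than one relation is possible for the type pair."""
    return len(candidate_relations(subject_type, object_type)) > 1


print(candidate_relations("Therapeutic Procedure", "Neoplastic Process"))
print(is_ambiguous("Pharmacologic Substance", "Cell Function"))
```

A type-level lookup therefore yields candidates only; picking the correct relation for a concrete sentence requires evidence from the text itself.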

Example

Discontinuation of heparin is a simple and essential maneuvre, and anticoagulation has to be continued by alternative drugs.

Example: Terms/Concepts

Discontinuation of heparin is a simple and essential maneuvre, and anticoagulation has to be continued by alternative drugs.

Terms:
  C0019134 Heparin
  C0005790 Blood coagulation tests
  C0013227 Pharmaceutical preparations

Example: Relations

Discontinuation of heparin is a simple and essential maneuvre, and anticoagulation has to be continued by alternative drugs.

Terms:
  C0019134 Heparin
  C0005790 Blood coagulation tests
  C0013227 Pharmaceutical preparations

Relations:
  C0019134 interacts_with C0013227
  C0005790 analyses C0019134
  C0005790 analyses C0013227

Conclusions

MuchMore for the Legal Domain…

Resources
  Legal Domain Ontology with…
  …Large-scale Terminology for Multiple Languages, or if not available…
  …Large Legal Domain Corpora in Multiple Languages for Term Extraction…
  …and for Relation Extraction if Ontology Needs to be Constructed/Adapted

Tools
  Linguistic Analysis (PoS, Morphology, Term Grammars, etc.)…
  …for Multiple Languages…
  …Tuned to the Legal Domain…
  Information Retrieval Infrastructure, Interface Design, etc.