CPE 641 Natural Language Processing Ontologies Asst. Prof. Nuttanart Facundes, Ph.D.

Introduction

• Tim Berners-Lee, creator of the WWW, foresees a future when the Web will be more than just a collection of web pages (Berners-Lee et al., 2001).

• This means that computers will be able to consider the meaning, or semantics, of information sources on the Web and meaningfully interpret the wealth of knowledge available there.

Semantic Web

• The Semantic Web is closely tied to software agents.

• Agents will search for information and perform tasks on behalf of human users; the Semantic Web is the environment in which they do this.

• Realizing the Semantic Web means transforming the current Web into an information space with a semantic organizational foundation, that is, an information space that makes information semantically accessible to machines by taking its meaning into account.

Ontologies

• Information resources on the Semantic Web will contain not only data but also metadata that describe what the data are about.

• This will allow agents and human users to identify, collect and process suitable information sources by interpreting the semantic metadata based on the task at hand.

• The semantic foundation will be provided by ontologies.

Origin

• Ontology (uncountable noun with no plural) – the philosophical discipline that studies the nature of being; a system of categories.

• AI researchers borrowed the term to mean a designed artefact consisting of a shared vocabulary used to describe entities in some domain of interest.

What is ontology?

• An explicit specification of a conceptualisation.

• Ontological commitment: the agreement among agents about the objects and relations being talked about

Definitions

• An ontology is a shared conceptualization of a domain

• An ontology is a set of definitions in a formal language for terms describing the world

Motivation

• select EMPDAT from PERSTAB where POS = 'mgmnt'
  – What does it mean?
  – PERSTAB is a table that lists employee data

• What’s an employee? How is an employee different from a contractor? What if I want data on both?

• Even if this information is available in English, a human has to read it

Motivation (2)

• "Parenthood is a more general relationship than motherhood."

• "Mary is the mother of Bill."

• "Who are Bill's parents?"

• "Mary is the parent of Bill." – that fact is not stated anywhere, but can be derived by a DAML application.

Example from “Why Use DAML?” <http://www.daml.org/2002/04/why.html>

Motivation (2) continued

• More formally stated, given the statements

(motherOf subProperty parentOf)
(Mary motherOf Bill)

• when stated in DAML, these allow you to conclude

(Mary parentOf Bill)

• Java code or a stored procedure could do this sort of inference for facts in XML or SQL

• But the DAML spec itself says the conclusion is true

• In contrast, different Java code could reach a different conclusion
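The subproperty inference on this slide can be sketched in a few lines of Python. This is a minimal, hypothetical illustration of one hand-written reading of the rule (the triple encoding and the name `infer_subproperties` are assumptions, not part of DAML), which is exactly the kind of code the slide warns could disagree with another implementation.

```python
# Minimal forward-chaining sketch of one reading of the subproperty rule:
#   (P subProperty Q) and (x P y)  entail  (x Q y)
# Facts are (subject, predicate, object) tuples. Illustration only,
# not a DAML/OWL reasoner.


def infer_subproperties(triples):
    """Return `triples` closed under the subProperty rule."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        # Pairs (P, Q) such that P is declared a subproperty of Q.
        sub_props = [(s, o) for (s, p, o) in facts if p == "subProperty"]
        for sub, sup in sub_props:
            for (s, p, o) in list(facts):  # iterate a copy while adding
                if p == sub and (s, sup, o) not in facts:
                    facts.add((s, sup, o))
                    changed = True
    return facts


facts = infer_subproperties({
    ("motherOf", "subProperty", "parentOf"),
    ("Mary", "motherOf", "Bill"),
})
print(("Mary", "parentOf", "Bill") in facts)  # True
```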

Motivation (2) continued

• (Mary motherOf Bill)

• (parentOf inverse childOf)

• (Bill childOf ?X)

• ?X = Mary

• The semantics of inverse is part of the DAML spec
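A similar sketch can answer the query (Bill childOf ?X). It assumes two rule schemas written by hand from the DAML semantics, subproperty and inverse, and reuses the (motherOf subProperty parentOf) fact from the previous slide; `closure` and `ask` are illustrative names only.

```python
# Close a set of triples under two hand-written rule schemas (assumed
# for illustration), then answer (Bill childOf ?X):
#   (P subProperty Q), (x P y)  =>  (x Q y)
#   (P inverse Q),     (x P y)  =>  (y Q x), and symmetrically for Q


def closure(triples):
    facts = set(triples)
    while True:
        new = set()
        for (a, rel, b) in facts:
            if rel == "subProperty":
                new |= {(s, b, o) for (s, p, o) in facts if p == a}
            elif rel == "inverse":
                new |= {(o, b, s) for (s, p, o) in facts if p == a}
                new |= {(o, a, s) for (s, p, o) in facts if p == b}
        if new <= facts:  # nothing new derived: fixpoint reached
            return facts
        facts |= new


def ask(facts, subject, predicate):
    """Return every ?X such that (subject predicate ?X) holds."""
    return sorted(o for (s, p, o) in facts if s == subject and p == predicate)


facts = closure({
    ("motherOf", "subProperty", "parentOf"),  # from the previous slide
    ("parentOf", "inverse", "childOf"),
    ("Mary", "motherOf", "Bill"),
})
print(ask(facts, "Bill", "childOf"))  # ['Mary']
```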

Language Formality and Expressiveness

[Figure: languages plotted by formality and expressiveness, from Human Language through KIF, CycL, OWL, F-Logic, DAML, XML and SQL, grouped by use: human consumption, machine processing, machine inference.]

Content Formality and Size

[Figure: ontology content plotted by formality and size, ranging from lexicons and taxonomies to formal ontologies; examples: WordNet, Cyc, SUMO, DOLCE, SUMO+domain, UMLS, Yahoo!]

Many Ways to Use an Ontology

• As an information engineering tool
  – Create a database schema
  – Map the schema to an upper ontology
  – Use the ontology as a set of reminders for additional information that should be included

• As more formal comments
  – Define an ontology that is used to create a DB or OO system
  – Use a theorem prover at design time to check for inconsistencies

• For taxonomic reasoning
  – Do limited run-time inference in Prolog, a description logic, or even Java

• For first-order logical inference
  – Full-blown use of all the axioms at run time
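Limited run-time taxonomic reasoning of the kind listed above can be as simple as walking subclass links. A minimal sketch, assuming a small hand-built class fragment that borrows SUMO class names (it is not SUMO's actual, much larger hierarchy):

```python
# Taxonomic reasoning sketch: subclass is transitive, and an instance of
# a class is also an instance of every superclass. Illustrative fragment.

SUBCLASS = {
    "CorpuscularObject": {"SelfConnectedObject"},
    "Substance": {"SelfConnectedObject"},
    "SelfConnectedObject": {"Object"},
    "Object": {"Physical"},
    "Process": {"Physical"},
}


def superclasses(cls):
    """All classes that `cls` is a (transitive) subclass of."""
    seen, stack = set(), [cls]
    while stack:
        for parent in SUBCLASS.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_instance(direct_class, cls):
    """True if an instance of `direct_class` is thereby an instance of `cls`."""
    return direct_class == cls or cls in superclasses(direct_class)


print(sorted(superclasses("CorpuscularObject")))
# ['Object', 'Physical', 'SelfConnectedObject']
print(is_instance("Substance", "Process"))  # False
```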

Upper Ontology

• An attempt to capture the most general and reusable terms and definitions

Motivation

• Ontologies may have different names for the same thing
  – type – a relation between a class and an instance
  – instance – a relation between a class and an instance
  – isa – a relation between a class and an instance
  – …

• Ontologies may use the same name for different things, or have no corresponding terms at all
  – before – a relation between two time points
  – before – a relation between two time intervals

• Either use the same upper ontology, or at least map to a common upper ontology
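Mapping both ontologies to a common upper ontology can be pictured as a translation table per source. A minimal sketch with hypothetical source ontologies "A" and "B"; it assumes SUMO's convention of `before` for time points and `earlier` for time intervals:

```python
# Hypothetical per-source translation tables onto shared upper-ontology
# relation names, resolving both "different names, same thing" and
# "same name, different things".

TO_UPPER = {
    "A": {"type": "instance", "before": "before"},   # A's before: time points
    "B": {"isa": "instance", "before": "earlier"},   # B's before: time intervals
}


def to_upper(source, statement):
    """Rewrite a (relation, arg1, arg2, ...) tuple into upper-ontology terms."""
    relation, *args = statement
    return (TO_UPPER[source][relation], *args)


print(to_upper("A", ("type", "Mary", "Human")) ==
      to_upper("B", ("isa", "Mary", "Human")))  # True
print(to_upper("B", ("before", "WWI", "WWII")))  # ('earlier', 'WWI', 'WWII')
```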

Formal Upper Ontologies

• DOLCE

• Cyc

• SUMO

Simple Methodology

• Extract nouns and verbs from a source text

• Find classes in SUMO for the nouns and verbs

• Record a mapping as being either equal, subsuming or instance
  – type a single word that relates to the UBL term in the "SUMO term" or "English Word" text areas in the SUMO browser

• Create a subclass of the SUMO class if it's a subsuming mapping

• Add properties to the subclass
  – reusing SUMO properties
  – extending SUMO properties by creating a subrelation of an existing property

• Add an English definition to the class
  – define constraints that express how the subclass is more specific than the superclass

• Express the classes and properties in KIF and begin creating axioms, based on the English definitions created previously
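The record-then-formalize steps above might be captured as follows. A hypothetical sketch: the `Mapping` record, the `to_kif` helper and the example mappings (including `FerryBoat` as a subclass of `TransportationDevice`) are illustrative guesses, not results checked in the SUMO browser.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Mapping:
    term: str   # term taken from the source text
    sumo: str   # SUMO term found for it
    kind: str   # "equal", "subsuming" or "instance"


def to_kif(m: Mapping) -> Optional[str]:
    """Turn one recorded mapping into a starting KIF statement."""
    if m.kind == "equal":
        return None  # reuse the SUMO term as-is; nothing new to assert
    if m.kind == "subsuming":
        return f"(subclass {m.term} {m.sumo})"  # new, more specific class
    if m.kind == "instance":
        return f"(instance {m.term} {m.sumo})"
    raise ValueError(f"unknown mapping kind: {m.kind!r}")


# Illustrative mappings for exercise terms; not checked against SUMO.
mappings = [
    Mapping("Killing", "Killing", "equal"),
    Mapping("Turkey", "Nation", "instance"),
    Mapping("FerryBoat", "TransportationDevice", "subsuming"),
]
for m in mappings:
    kif = to_kif(m)
    if kif:
        print(kif)
# (instance Turkey Nation)
# (subclass FerryBoat TransportationDevice)
```

Properties, English definitions and axioms would then be added by hand on top of these starting statements.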

First Exercise (1)

• “Seven Turkish nationals of Chechen origin hijacked a Russia-bound Panamanian ferry in Trabzon. The hijackers initially threatened to kill all Russians on board unless Chechen separatists being held in Dagestan, Russia, were released. On 19 January 1998, the hijackers surrendered to Turkish authorities outside the entrance to the Bosporus. The passengers were unharmed.”

• Identify items that need formalization – start with nouns and verbs

First Exercise (2)

• “Seven Turkish nationals of Chechen origin hijacked a Russia-bound Panamanian ferry in Trabzon. The hijackers initially threatened to kill all Russians on board unless Chechen separatists being held in Dagestan, Russia, were released. On 19 January 1998, the hijackers surrendered to Turkish authorities outside the entrance to the Bosporus. The passengers were unharmed.”

• Now create terms that correspond to the nouns and verbs

• Remove redundancy

• Are there any “background” notions that are not explicit in the text?

First Exercise (3)

• “Seven Turkish nationals of Chechen origin hijacked a Russia-bound Panamanian ferry in Trabzon. The hijackers initially threatened to kill all Russians on board unless Chechen separatists being held in Dagestan, Russia, were released. On 19 January 1998, the hijackers surrendered to Turkish authorities outside the entrance to the Bosporus. The passengers were unharmed.”

• Turkey, Chechnya, Nationality, Hijacking, Threatening, Killing, Releasing, Holding, Dagestan, Russia, Separatist, Entrance, Bosporus, Unharmed, Panama, Trabzon, Authority, Outside, boundFor, Ferry, onBoard

SUMO Overview

• Understanding what’s in the upper ontology, in order to use it effectively

High Level Distinctions

The first fundamental distinction is that between ‘Physical’ (things which have a position in space/time) and ‘Abstract’ (things which don’t)

[Figure: the top level split into Physical and Abstract]

High Level Distinctions

Partition of ‘Physical’ into ‘Objects’ and ‘Processes’

Physical
  – Object
  – Process

Objects

Object
  – SelfConnectedObject
    – Substance
    – CorpuscularObject
  – Region
  – Collection

Processes

Process
  – DualObjectProcess: Substituting, Transaction, Comparing, Attaching, Detaching, Combining, Separating
  – InternalChange: BiologicalProcess, QuantityChange, Damaging, ChemicalProcess, SurfaceChange, Creation, StateChange, ShapeChange
  – IntentionalProcess: IntentionalPsychologicalProcess, RecreationOrExercise, OrganizationalProcess, Guiding, Keeping, Maintaining, Repairing, Poking, ContentDevelopment, Making, Searching, SocialInteraction, Maneuver
  – Motion: BodyMotion, DirectionChange, Transfer, Transportation, Radiating

Abstract
  – SetOrClass
  – Relation
  – Proposition
  – Quantity
    – Number
    – PhysicalQuantity
  – Attribute
  – Graph
  – GraphElement

A Little Bit of Logic

• Instance – GeorgeBush, Iraq, BobsRightBigToe

• Class – Human, Nation

• Relation – WWI before WWII, Bill childOf Mary

• => (read as “implies”) – if X then Y

• and – X and Y are both true

• or – X or Y (or both) are true

• not – not X: the opposite of the truth value of X

• exists ?X – there exists something about which the following is true
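The connectives can be read truth-functionally. In this sketch `exists` is evaluated over a small finite domain, an assumption made only for illustration since first-order domains need not be finite; the names come from the examples above.

```python
# Truth-functional reading of =>, plus `exists` over a finite domain.


def implies(x, y):
    """(=> X Y) is false only when X is true and Y is false."""
    return (not x) or y


DOMAIN = {"GeorgeBush", "Iraq", "Bill", "Mary"}
CHILD_OF = {("Bill", "Mary")}  # Bill childOf Mary


def exists(condition):
    """(exists (?X) ...): some element of DOMAIN satisfies `condition`."""
    return any(condition(x) for x in DOMAIN)


for x in (True, False):
    for y in (True, False):
        print(x, y, implies(x, y))
print(exists(lambda x: ("Bill", x) in CHILD_OF))  # True
```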

A Little “Structural” Ontology

(instance GeorgeBush Human) – GeorgeBush is an instance of the class of humans

(exists (?X) (parent ?X GeorgeBush)) – there exists something of which George Bush is the parent

(instance parent BinaryPredicate) – the relation of parent is a binary relation

(domain parent 1 Organism) – the first argument to the parent relation must be an instance of the class Organism

(domain parent 2 Organism) – similarly for the second argument
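Domain declarations like these let a program reject ill-typed facts. A minimal sketch; `Bob` is a hypothetical individual, and the single link from `Human` to `Organism` is a shortcut standing in for SUMO's longer chain of superclasses.

```python
# Check (parent ...) statements against the slide's domain declarations.

INSTANCE = {"GeorgeBush": "Human", "Bob": "Human", "Iraq": "Nation"}
SUBCLASS = {"Human": "Organism"}  # shortcut, not SUMO's real chain
DOMAIN = {("parent", 1): "Organism", ("parent", 2): "Organism"}


def is_a(entity, cls):
    """Follow the instance link, then subclass links upward, looking for `cls`."""
    c = INSTANCE.get(entity)
    while c is not None:
        if c == cls:
            return True
        c = SUBCLASS.get(c)
    return False


def domain_violations(statement):
    """List (position, argument) pairs that break a domain declaration."""
    relation, *args = statement
    return [
        (i, arg)
        for i, arg in enumerate(args, start=1)
        if (relation, i) in DOMAIN and not is_a(arg, DOMAIN[(relation, i)])
    ]


print(domain_violations(("parent", "Bob", "GeorgeBush")))   # []
print(domain_violations(("parent", "Iraq", "GeorgeBush")))  # [(1, 'Iraq')]
```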

Linking to SUMO Terms

• Nation, Confining, Committing, SocialRole, TransportationDevice, Killing, Near, Injuring, citizen, (not…), (exists…)

• Terms from the exercise (may or may not be the same as SUMO terms):
  – Turkey, Chechnya, Nationality, Hijacking, Threatening, Killing, Releasing, Holding, Dagestan, Russia, Separatist, Entrance, Bosporus, Unharmed, Panama, Trabzon, Authority, Outside, boundFor, Ferry, onBoard

• Use the terms in the first bullet to define the terms in the second bullet
  – Use Nation to state: (instance Turkey Nation)

Formalization

(exists (?TURK …)
  (and
    (citizen ?TURK Turkey)))

Formalization

(exists (?TURK ?FERRY …)
  (and
    (citizen ?TURK Turkey)
    (instance ?FERRY FerryBoat)))

Formalization

(exists (?TURK ?FERRY ?HIJACK)
  (and
    (citizen ?TURK Turkey)
    (instance ?FERRY FerryBoat)
    (instance ?HIJACK Hijacking)
    (agent ?HIJACK ?TURK)
    (patient ?HIJACK ?FERRY)
    (earlier (WhenFn ?HIJACK)
      (DayFn 19
        (MonthFn January
          (YearFn 1998))))))
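As a formula grows slide by slide, it is easy to drop a parenthesis or use a variable that `exists` never introduces. A minimal sketch of a checker for both mistakes; it is not a full SUO-KIF parser, and it restates the formula with `(WhenFn ?HIJACK)` as the first argument of `earlier`, assuming that relation is meant to compare time intervals.

```python
import re

# Minimal reader for KIF-style formulas: nested Python lists, with errors
# for unbalanced parentheses. No strings, comments or quoting supported.


def parse(text):
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of formula")
        tok = tokens[pos]
        pos += 1
        if tok == ")":
            raise ValueError("unexpected ')'")
        if tok != "(":
            return tok  # an atom: constant, relation name or ?VARIABLE
        items = []
        while True:
            if pos >= len(tokens):
                raise ValueError("missing ')'")
            if tokens[pos] == ")":
                pos += 1
                return items
            items.append(read())

    expr = read()
    if pos != len(tokens):
        raise ValueError("extra tokens after the formula")
    return expr


def variables(expr):
    """All ?VARIABLES occurring anywhere in `expr`."""
    if isinstance(expr, str):
        return {expr} if expr.startswith("?") else set()
    return set().union(*(variables(e) for e in expr))


def undeclared(formula):
    """Variables used in (exists (vars) body) but missing from vars."""
    _, declared, body = formula
    return variables(body) - set(declared)


HIJACK = """
(exists (?TURK ?FERRY ?HIJACK)
  (and
    (citizen ?TURK Turkey)
    (instance ?FERRY FerryBoat)
    (instance ?HIJACK Hijacking)
    (agent ?HIJACK ?TURK)
    (patient ?HIJACK ?FERRY)
    (earlier (WhenFn ?HIJACK)
      (DayFn 19 (MonthFn January (YearFn 1998))))))
"""
print(undeclared(parse(HIJACK)))  # set()
```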