Thesis presentation

Creating Knowledge out of Interlinked Data

LOD2 Presentation . 02.09.2010 . Page

http://lod2.eu

AKSW, Universität Leipzig

Sebastian Hellmann

A Transparent Formalization of Text for Machines

http://nlp2rdf.org

Start: Jan 2009

Tentative End: Summer 2012

Introduction of the touched areas

Scientific Core

Evaluation

Plan

Overview

The Semantic Gap

Most problems occurred at the bottom

Data integration is difficult, if the pivots are not well defined

Questions (in order):

What structure to use?

What URIs to use?

What is a String?

How can we teach machines to understand Strings (Knowledge Representation)?

How can we formalize text in a way that is:

Transparent for machines

Efficient for NLP Use Cases

Consistent with the Web architecture

Main question

Areas

Preliminary definition

The NLP Interchange Format (NIF) is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations.

This definition is still limited to RDF and NLP and targets software integration via a common exchange format

Scientific core

Not transparent for machines

Scientific core

The city Berlin is the capital of Germany.

URI: http://example.org/sample#offset_0_42

The universe of discourse is defined as the words over the alphabet of Unicode characters (Unicode Normal Form C), often called Σ*

http://example.org/sample#offset_34_41

Germany

Diagram labels: isString, referenceContext, context
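The offset-based URIs on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions: the namespace `http://example.org/vocab#` and the helper names `offset_uri` and `describe` are hypothetical, and offsets are counted in Unicode code points after NFC normalization. The property names isString and referenceContext and the document URI are taken from the slide.

```python
import unicodedata

BASE = "http://example.org/sample"   # document URI from the slide
NS = "http://example.org/vocab#"     # hypothetical namespace, not the official one


def offset_uri(base, begin, end):
    """Build an offset-based URI for the half-open span [begin, end)."""
    return f"{base}#offset_{begin}_{end}"


def describe(text, substring):
    """Return N-Triples-style lines for the context and one substring.

    Assumes the text contains no characters that need escaping in a literal.
    """
    text = unicodedata.normalize("NFC", text)   # Unicode Normal Form C
    begin = text.index(substring)               # first occurrence only
    end = begin + len(substring)
    context = offset_uri(BASE, 0, len(text))
    part = offset_uri(BASE, begin, end)
    return [
        f'<{context}> <{NS}isString> "{text}" .',
        f'<{part}> <{NS}referenceContext> <{context}> .',
    ]


triples = describe("The city Berlin is the capital of Germany.", "Germany")
for line in triples:
    print(line)
```

For the example sentence, the context receives the URI ending in #offset_0_42 and the substring Germany the URI ending in #offset_34_41, matching the slide.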

Scientific core

Define the notion of Context and formalize it in OWL:

Context is similar to the German word Betrachtungshorizont

In English, perhaps "inside context", i.e. the text itself, which serves as a reference context for all the substrings it contains.

Definitely disjoint from groupings such as Document, because those require a wider context.

Example following...

Scientific Core

The goal is to research some of the implications, but I might not be able to finish it completely.

In scope:

The property contextString is inverse-functional, which means that machines can automatically infer that the same context occurs in different documents.

Show consistency with ambiguity

Define metrics that compare contexts

Formalize the interpretation function

Show interoperability with internal models of all major NLP frameworks

(Partial) compatibility with the WWW and the GGG
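The inference enabled by an inverse-functional property can be illustrated with a minimal sketch: if two context URIs carry the same string value, they denote the same context. The function name `infer_same_contexts` and the URIs under `http://example.org/other` and `http://example.org/third` are hypothetical, and a real setup would leave this inference to an OWL reasoner rather than to hand-written code.

```python
import unicodedata
from collections import defaultdict


def infer_same_contexts(context_strings):
    """Group context URIs that share the same string value.

    context_strings maps a context URI to its string. Because the
    property is inverse-functional, equal values imply equal subjects.
    """
    by_string = defaultdict(list)
    for uri, text in context_strings.items():
        # Compare strings in Unicode Normal Form C
        by_string[unicodedata.normalize("NFC", text)].append(uri)
    return [sorted(uris) for uris in by_string.values() if len(uris) > 1]


contexts = {
    "http://example.org/sample#offset_0_42":
        "The city Berlin is the capital of Germany.",
    "http://example.org/other#offset_0_42":      # hypothetical second document
        "The city Berlin is the capital of Germany.",
    "http://example.org/third#offset_0_15":      # hypothetical third document
        "Leipzig is old.",
}

same = infer_same_contexts(contexts)
print(same)
```

Here the first two URIs end up in one group, while the third context stays on its own.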

Scientific Core

Out of scope:

Transition between contexts: do statements from a smaller context hold in a broader context?

Incorporating all layers of the NLP stack; the work is limited to POS tags and Entity Recognition

Fill all the question marks in the Venn diagram

Areas

Linguistic Linked Open Data Cloud

Developers study

Evaluation

Compare to other models in NLP: size (RDF vs. XML), performance, expressivity

Is NIF easy to understand and implement? Developer study. The release of the specification had quite an impact: people started to create extensions and use the format, and there are 50 people on the mailing list.

How to evaluate Web Service integration or consistency with the Web architecture? If the way strings are represented is transparent and formalized, do I need an experimental evaluation to show the benefits?

Q & A

Thank you for your attention

Standing on the shoulders of giants

BIS 2012/03/01 Leipzig Page

ISSLOD 2011/09/15 Page

Address

University of Leipzig
Faculty of Mathematics and Computer Science
Institute of Computer Science
Department of Business Information Systems
Postfach 100920
04009 Leipzig
Germany

Thanks for your attention!

Contact