Using Suffix Arrays for Efficient Recognition of Named Entities in Large Scale


Benjamin Adrian, Sven Schwarz, http://www.dfki.de/~lastname



A huge Web of Data

The Semantic Web offers techniques for ...

● representing,
● formalizing,
● and reasoning about information

… on the WWW in order to make information ...

● transferable,
● portable,
● and interpretable

… for machine consumption.

∑ 9,363,625 distinct literal values


Wouldn't it be great to … ?

… to link entity references in text to referents in RDF graphs.

Goal: Enrich natural language text with formal facts.

Benjamin works at DFKI, Kaiserslautern.


How to recognize entity references?

→ application of relational databases and suffix arrays

[Figure labels: natural language text ("Benjamin works at DFKI, Kaiserslautern."); efficient representation; RDF source]


Entity Recognition Process

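[Figure: recognition pipeline across four components: text, suffix array, database, RDF graph. Steps labeled in the figure: noun-phrase chunking, prefix hashing, query with hashes, candidates with matching prefixes, exact match, exact matches.]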


RDF statements

symbols:
<#19810211> <rdfs:label> “Benjamin Adrian”
<#67478302> <rdfs:label> “DFKI”

relation:
<#19810211> <#employedAt> <#67478302>
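The split between symbols and relations can be illustrated with a minimal Python sketch. It assumes, as the slide's labels suggest but do not state outright, that a "symbol" is a statement whose object is a literal and a "relation" is a statement whose object is another resource. The `Triple` type and `split_statements` function are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """Hypothetical container for one RDF statement."""
    subject: str
    predicate: str
    obj: str
    obj_is_literal: bool  # assumption: literal objects mark "symbols"


# The three statements from the slide.
TRIPLES = [
    Triple("#19810211", "rdfs:label", "Benjamin Adrian", True),
    Triple("#67478302", "rdfs:label", "DFKI", True),
    Triple("#19810211", "#employedAt", "#67478302", False),
]


def split_statements(triples):
    """Separate statements with literal objects from resource-to-resource ones."""
    symbols = [t for t in triples if t.obj_is_literal]
    relations = [t for t in triples if not t.obj_is_literal]
    return symbols, relations


symbols, relations = split_statements(TRIPLES)
```

Only the literal values of the symbols ("Benjamin Adrian", "DFKI") need to be matched against text; the relations supply the facts that can be attached once an entity is recognized.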


Represent RDF data

Dictionaries:

RESOURCE INDEX: URI | INDEX
LITERAL INDEX: LITERAL | INDEX | HASH

Separate storage of symbols and relations:

RELATIONS: SUBJECT | PREDICATE | OBJECT
SYMBOLS: SUBJECT | PREDICATE | OBJECT
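One possible way to realize this layout is sketched below with Python's built-in `sqlite3` module. The slides only name the tables and columns, so the column types, the lowercase SQL names, the helper functions, and the choice of SQLite are all assumptions. The hash stored here is a placeholder (CRC32 of the first four characters), not necessarily the function the authors used.

```python
import sqlite3
import zlib

# Assumed schema: two dictionaries mapping strings to integer indexes,
# plus two triple tables that store only those integer indexes.
SCHEMA = """
CREATE TABLE resource_index (idx INTEGER PRIMARY KEY, uri TEXT UNIQUE NOT NULL);
CREATE TABLE literal_index (idx INTEGER PRIMARY KEY, literal TEXT UNIQUE NOT NULL,
                            hash INTEGER NOT NULL);
CREATE TABLE relations (subject INTEGER NOT NULL, predicate INTEGER NOT NULL,
                        object INTEGER NOT NULL);
CREATE TABLE symbols (subject INTEGER NOT NULL, predicate INTEGER NOT NULL,
                      object INTEGER NOT NULL);
"""


def resource_id(conn, uri):
    """Return the dictionary index of a URI, inserting it if it is new."""
    conn.execute("INSERT OR IGNORE INTO resource_index (uri) VALUES (?)", (uri,))
    return conn.execute(
        "SELECT idx FROM resource_index WHERE uri = ?", (uri,)).fetchone()[0]


def literal_id(conn, literal):
    """Return the dictionary index of a literal, inserting it if it is new."""
    placeholder_hash = zlib.crc32(literal[:4].encode("utf-8"))  # assumption
    conn.execute("INSERT OR IGNORE INTO literal_index (literal, hash) VALUES (?, ?)",
                 (literal, placeholder_hash))
    return conn.execute(
        "SELECT idx FROM literal_index WHERE literal = ?", (literal,)).fetchone()[0]


def add_symbol(conn, subject, predicate, literal):
    """Store a statement whose object is a literal."""
    conn.execute("INSERT INTO symbols VALUES (?, ?, ?)",
                 (resource_id(conn, subject), resource_id(conn, predicate),
                  literal_id(conn, literal)))


def add_relation(conn, subject, predicate, obj):
    """Store a statement whose object is another resource."""
    conn.execute("INSERT INTO relations VALUES (?, ?, ?)",
                 (resource_id(conn, subject), resource_id(conn, predicate),
                  resource_id(conn, obj)))


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
add_symbol(conn, "#19810211", "rdfs:label", "Benjamin Adrian")
add_symbol(conn, "#67478302", "rdfs:label", "DFKI")
add_relation(conn, "#19810211", "#employedAt", "#67478302")
```

Storing integer indexes in the triple tables keeps them compact, and each distinct URI or literal string is stored exactly once in its dictionary.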


Suffix Array

Text:
“Benjamin Adrian works in DFKI, Kaiserslautern”

Suffix array (sorted list of suffixes):
Adrian works in DFKI, Kaiserslautern
Benjamin Adrian works in DFKI, Kaiserslautern
DFKI, Kaiserslautern
in DFKI, Kaiserslautern
Kaiserslautern
works in DFKI, Kaiserslautern
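The word-aligned suffix array on this slide can be reproduced with a short Python sketch. Two details are assumptions inferred from the listing rather than stated on the slide: suffixes start only at word boundaries, and sorting ignores case (otherwise "in …" would sort after "Kaiserslautern"). The function name `word_suffix_array` is hypothetical.

```python
import re


def word_suffix_array(text):
    """Return start offsets of all word-aligned suffixes, sorted by suffix text.

    Offsets are stored instead of the suffix strings themselves, so the
    array needs only one integer per word.
    """
    starts = [m.start() for m in re.finditer(r"\w+", text)]
    # Case-insensitive ordering (assumption, chosen to match the slide).
    return sorted(starts, key=lambda i: text[i:].casefold())


text = "Benjamin Adrian works in DFKI, Kaiserslautern"
suffix_array = word_suffix_array(text)
suffixes = [text[i:] for i in suffix_array]
for s in suffixes:
    print(s)
```

Because the suffixes are sorted, all suffixes that share a given prefix sit next to each other, which is what makes prefix lookups by binary search possible.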


Suffix Array

Phrases in text:
Benjamin Adrian
DFKI
Kaiserslautern

Reduced suffix array:
Adrian works in DFKI, Kaiserslautern
Benjamin Adrian works in DFKI, Kaiserslautern
DFKI, Kaiserslautern
Kaiserslautern
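A minimal sketch of the reduction step follows. It assumes that a suffix is kept when it starts at a word inside one of the detected phrases, which is consistent with the slide (the suffix starting at "Adrian" survives because it lies inside "Benjamin Adrian"). The phrase list is hard-coded here as a stand-in for a real noun-phrase chunker, and the function names are hypothetical.

```python
import re


def phrase_spans(text, phrases):
    """Return (start, end) character spans of every phrase occurrence."""
    spans = []
    for phrase in phrases:
        for m in re.finditer(re.escape(phrase), text):
            spans.append((m.start(), m.end()))
    return spans


def reduced_suffix_array(text, phrases):
    """Keep only word-aligned suffixes that start inside a phrase span."""
    spans = phrase_spans(text, phrases)
    starts = [
        m.start()
        for m in re.finditer(r"\w+", text)
        if any(lo <= m.start() < hi for lo, hi in spans)
    ]
    # Case-insensitive ordering, as in the full suffix array (assumption).
    return sorted(starts, key=lambda i: text[i:].casefold())


text = "Benjamin Adrian works in DFKI, Kaiserslautern"
phrases = ["Benjamin Adrian", "DFKI", "Kaiserslautern"]  # stand-in for a chunker
reduced = [text[i:] for i in reduced_suffix_array(text, phrases)]
```

Dropping suffixes such as "works in …" and "in …" shrinks the array to positions where an entity name could plausibly begin, so fewer database lookups are needed later.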


Noun phrases in natural language text


Hashing prefixes

LITERAL INDEX: LITERAL | INDEX | HASH

Suffix array (hashed prefix size = 4):
Adrian works in DFKI, Kaiserslautern
Benjamin Adrian works in DFKI, Kaiserslautern
DFKI, Kaiserslautern
Kaiserslautern
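The idea of hashing a fixed-size prefix can be sketched as follows. The slide gives only the prefix size (4); that this means four characters rather than four tokens, and the use of CRC32 as the hash function, are assumptions made for illustration. `prefix_hash` is a hypothetical helper name.

```python
import zlib

PREFIX_SIZE = 4  # from the slide: "hashed prefix size = 4"


def prefix_hash(s, size=PREFIX_SIZE):
    """Hash the first `size` characters of a string (CRC32 is an assumption)."""
    return zlib.crc32(s[:size].encode("utf-8"))


# Literals as they would appear in the LITERAL INDEX table.
literals = ["Benjamin Adrian", "DFKI", "Kaiserslautern"]
literal_hashes = {prefix_hash(lit) for lit in literals}

# Reduced suffix array of the example text.
suffixes = [
    "Adrian works in DFKI, Kaiserslautern",
    "Benjamin Adrian works in DFKI, Kaiserslautern",
    "DFKI, Kaiserslautern",
    "Kaiserslautern",
]

# A suffix can only start with a literal if both share the same prefix hash.
candidates = [s for s in suffixes if prefix_hash(s) in literal_hashes]
```

A literal and a suffix that begins with it share their first four characters, so they share a hash, and comparing integers is cheaper than comparing strings. One caveat of this sketch: a literal shorter than the prefix size hashes fewer characters than a longer suffix does, so it would need separate handling.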


Select candidates from database
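Candidate selection might look like the following sketch, which queries an in-memory SQLite table by prefix hash and then confirms each candidate with an exact string comparison. The table layout, the CRC32 hash, the extra literal "Benjamin Franklin" (added only to show a rejected candidate), and the function names are all assumptions for illustration.

```python
import sqlite3
import zlib


def prefix_hash(s, size=4):
    """Hash the first `size` characters (CRC32 chosen as an assumption)."""
    return zlib.crc32(s[:size].encode("utf-8"))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE literal_index (idx INTEGER PRIMARY KEY, literal TEXT, hash INTEGER)")
conn.execute("CREATE INDEX literal_hash ON literal_index (hash)")
for lit in ["Benjamin Adrian", "DFKI", "Kaiserslautern", "Benjamin Franklin"]:
    conn.execute("INSERT INTO literal_index (literal, hash) VALUES (?, ?)",
                 (lit, prefix_hash(lit)))


def recognize(conn, suffixes):
    """Return (literal, index) pairs for literals that start some suffix."""
    matches = []
    for suffix in suffixes:
        # Step 1: cheap lookup of candidates that share the prefix hash.
        rows = conn.execute(
            "SELECT idx, literal FROM literal_index WHERE hash = ? ORDER BY idx",
            (prefix_hash(suffix),))
        # Step 2: exact match filters out candidates that merely share a prefix.
        for idx, literal in rows:
            if suffix.startswith(literal):
                matches.append((literal, idx))
    return matches


suffixes = [
    "Adrian works in DFKI, Kaiserslautern",
    "Benjamin Adrian works in DFKI, Kaiserslautern",
    "DFKI, Kaiserslautern",
    "Kaiserslautern",
]
matches = recognize(conn, suffixes)
```

Here "Benjamin Franklin" is returned as a candidate for the second suffix because it shares the prefix "Benj", and the exact match then discards it. The `startswith` test is a simplification; a fuller version would also check that the match ends at a word boundary.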


Response time


Summary

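[Figure: recognition pipeline across four components: text, suffix array, database, RDF graph. Steps labeled in the figure: noun-phrase chunking, prefix hashing, query with hashes, candidates with matching prefixes, exact match, exact matches.]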


Thank you

Questions?

Benjamin Adrian

Sven Schwarz
