Entity Enrichment and Consolidation in ARCOMEM
Elena Demidova (1), including slides by: Stefan Dietze (1), Diana Maynard (2), Thomas Risse (1), Wim Peters (2), Katerina Doka (3), Yannis Stavrakas (3)
(1) L3S Research Center, Hannover, Germany; (2) University of Sheffield, UK; (3) IMIS, RC ATHENA, Athens, Greece

Arcomem training enrichment_beginner

DESCRIPTION

This presentation on data enrichment is part of the ARCOMEM training curriculum. Feel free to roam around or contact us on Twitter via @arcomem to learn more about ARCOMEM training on archiving Social Media.

Citation preview

Page 1: Arcomem training enrichment_beginner

Entity Enrichment and Consolidation in ARCOMEM

Elena Demidova (1),

including slides by: Stefan Dietze (1), Diana Maynard (2), Thomas Risse (1), Wim Peters (2), Katerina Doka (3), Yannis Stavrakas (3)

(1) L3S Research Center, Hannover, Germany
(2) University of Sheffield, UK
(3) IMIS, RC ATHENA, Athens, Greece

Page 2: Arcomem training enrichment_beginner

The ARCOMEM approach

• Make use of the Social Web
  – Huge source of user-generated content
  – Wide range of articulation methods, from simple "I like it" buttons to complete articles
  – Represents the diversity of opinions of the public

• User activities often triggered by
  – Events and related entities (e.g. sport events, celebrations, crises, news articles, persons, locations)
  – Topics (e.g. global warming, financial crisis, swine flu)

A semantic-aware and socially-driven preservation model is a natural way to go

Page 3: Arcomem training enrichment_beginner

The extraction components for text

Aim
• Extraction of Entities, Topics, Events and Opinions (ETOEs) from
  – Web pages
  – Social Web (Twitter, YouTube, Facebook, …)

Challenges
• Entity recognition from degraded input sources (tweets etc.)
• Advancing state-of-the-art NLP and text mining
• Dynamics detection: evolution of terms/entities
• Semantic representation of Web objects and entities
  – Appropriate RDF schemas for ETOEs and Web objects
  – Exploiting (Linked Open) Web data to enrich extracted ETOEs
• Entity classification (into events, locations, topics etc.) & consolidation

Page 4: Arcomem training enrichment_beginner

ETOE extraction with GATE: an example

[Figure: GATE extraction example; callout label: "candidate multi-word term"]

Page 5: Arcomem training enrichment_beginner

Data consolidation & integration problem

Data extracted by different components, or during different processing cycles, is not aligned => consolidation, disambiguation & correlation are required.

<Location>Greece</Location>
<Person>Venizelos</Person>
<Location>Griechenland</Location>
<Organisation>Greek Parliament</Organisation>

?
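
The alignment problem above can be illustrated with a minimal sketch: two labels in different languages ("Greece", "Griechenland") are recognised as the same location once both are mapped to one reference URI. The lookup table `LABEL_TO_URI` and the function `consolidate` are hypothetical names introduced here for illustration; they stand in for whatever enrichment service is actually used and are not part of ARCOMEM.

```python
from collections import defaultdict

# Hypothetical lookup table standing in for a real enrichment service.
# Only the mapping of both labels to one shared URI matters for the sketch.
LABEL_TO_URI = {
    "greece": "http://dbpedia.org/resource/Greece",
    "griechenland": "http://dbpedia.org/resource/Greece",
}


def consolidate(entities):
    """Group (type, label) pairs by reference URI.

    Labels without a known URI are kept in a group of their own.
    """
    groups = defaultdict(list)
    for etype, label in entities:
        key = LABEL_TO_URI.get(label.lower(), f"unresolved:{etype}:{label}")
        groups[key].append((etype, label))
    return dict(groups)


# The four entities shown on the slide.
extracted = [
    ("Location", "Greece"),
    ("Person", "Venizelos"),
    ("Location", "Griechenland"),
    ("Organisation", "Greek Parliament"),
]
groups = consolidate(extracted)
for key, members in groups.items():
    print(key, members)
```

In this sketch the two location mentions end up in one group, while "Venizelos" and "Greek Parliament" stay unresolved until they receive enrichments of their own.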

Page 6: Arcomem training enrichment_beginner

Data clustering & enrichment

Enrichment of entities with related references to Linked Data, particularly reference datasets (DBpedia, Freebase, …)
=> use enrichments for correlation/clustering/consolidation
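
One way to use enrichments for clustering is sketched below: entities that share at least one enrichment URI are merged into the same cluster with a small union-find structure. The entity identifiers (`tweet-1/...`, `page-7/...`), the sample data and the function name `cluster_by_shared_enrichment` are assumptions made for this example, not the ARCOMEM implementation.

```python
def cluster_by_shared_enrichment(enrichments):
    """Cluster entities that share at least one enrichment URI.

    enrichments: dict mapping an entity id to a set of enrichment URIs.
    Returns a sorted list of clusters, each a sorted list of entity ids.
    """
    parent = {entity: entity for entity in enrichments}

    def find(x):
        # Follow parent links to the cluster representative (path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}  # URI -> first entity that carried it
    for entity, uris in enrichments.items():
        for uri in uris:
            if uri in first_seen:
                parent[find(entity)] = find(first_seen[uri])
            else:
                first_seen[uri] = entity

    clusters = {}
    for entity in enrichments:
        clusters.setdefault(find(entity), []).append(entity)
    return sorted(sorted(members) for members in clusters.values())


# Illustrative input: hypothetical entity ids with their enrichment URIs.
sample = {
    "tweet-1/Greece": {"http://dbpedia.org/resource/Greece"},
    "page-7/Griechenland": {"http://dbpedia.org/resource/Greece"},
    "page-7/Greek Parliament": {"http://dbpedia.org/resource/Hellenic_Parliament"},
}
clusters = cluster_by_shared_enrichment(sample)
print(clusters)
```

Union-find keeps the merge step cheap when many entities point at the same reference resource; a plain dictionary from URI to entity list would work equally well for small collections.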

Page 7: Arcomem training enrichment_beginner

Enrichment for clustering & correlation: example

<Event>Trichet warns of systemic debt crisis</Event>

<Person>Jean Claude Trichet</Person>
<Organisation>ECB</Organisation>

Page 8: Arcomem training enrichment_beginner

Enrichment for clustering & correlation: example

<Event>Trichet warns of systemic debt crisis</Event>

<Person>Jean Claude Trichet</Person>
<Enrichment>http://dbpedia.org/resource/Jean-Claude_Trichet</Enrichment>

<Organisation>ECB</Organisation>
<Enrichment>http://dbpedia.org/resource/ECB</Enrichment>

Page 9: Arcomem training enrichment_beginner

Enrichment for clustering & correlation: example

<Event>Trichet warns of systemic debt crisis</Event>

<Person>Jean Claude Trichet</Person>
<Enrichment>http://dbpedia.org/resource/Jean-Claude_Trichet</Enrichment>

=> dbpprop:office
   dbpedia:President_of_the_European_Central_Bank
   dbpedia:Governor_of_the_Banque_de_France

=> dcterms:subject
   category:Living_people
   category:Karlspreis_recipients
   category:Alumni_of_the_École_Nationale_d'Administration
   category:People_from_Lyon
   …

<Organisation>ECB</Organisation>
<Enrichment>http://dbpedia.org/resource/ECB</Enrichment>
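
Property values such as dbpprop:office and dcterms:subject could be retrieved for an enrichment URI with a SPARQL query. The sketch below only builds the query text; it does not contact any endpoint, and the helper name `build_property_query` is an assumption made for illustration. The prefixes are the usual namespace URIs for dcterms and the DBpedia property namespace.

```python
def build_property_query(resource_uri, properties):
    """Build a SPARQL SELECT query for selected properties of one resource.

    resource_uri: full URI of the enrichment (e.g. a DBpedia resource).
    properties:   prefixed property names, e.g. ["dbpprop:office"].
    The query is returned as text only; nothing is sent over the network.
    """
    values = " ".join(properties)
    return (
        "PREFIX dcterms: <http://purl.org/dc/terms/>\n"
        "PREFIX dbpprop: <http://dbpedia.org/property/>\n"
        "SELECT ?p ?o WHERE {\n"
        f"  VALUES ?p {{ {values} }}\n"
        f"  <{resource_uri}> ?p ?o .\n"
        "}"
    )


query = build_property_query(
    "http://dbpedia.org/resource/Jean-Claude_Trichet",
    ["dbpprop:office", "dcterms:subject"],
)
print(query)
```

The returned objects (offices, categories) can then serve as additional keys for correlating entities that do not share an enrichment URI directly.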

Page 10: Arcomem training enrichment_beginner

ARCOMEM entities and enrichments - graph

Nodes: entities/events (blue), enrichments DBpedia (green), Freebase (orange)

1013 clusters of correlated entities/events

Page 11: Arcomem training enrichment_beginner

ARCOMEM entities and enrichments - graph

Nodes: entities/events (blue), enrichments DBpedia (green), Freebase (orange)

1013 clusters of correlated entities/events
=> cluster expansion by considering related enrichments
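
Cluster expansion can be sketched as a connected-components search over a graph with entity nodes and enrichment nodes, where extra edges link enrichments that are related to each other. The function name `expand_clusters` and the sample data are assumptions for illustration. In particular, the link between the office resource and the ECB resource is assumed here, not taken from DBpedia.

```python
from collections import defaultdict, deque


def expand_clusters(entity_enrichments, related_enrichments):
    """Cluster entities via shared or related enrichment URIs.

    entity_enrichments:  dict entity label -> set of enrichment URIs.
    related_enrichments: iterable of (uri_a, uri_b) pairs considered related.
    Returns a sorted list of clusters, each a sorted list of entity labels.
    """
    graph = defaultdict(set)
    for entity, uris in entity_enrichments.items():
        graph[("entity", entity)]  # make sure isolated entities appear
        for uri in uris:
            graph[("entity", entity)].add(("uri", uri))
            graph[("uri", uri)].add(("entity", entity))
    for a, b in related_enrichments:
        graph[("uri", a)].add(("uri", b))
        graph[("uri", b)].add(("uri", a))

    seen = set()
    clusters = []
    for start in list(graph):
        if start[0] != "entity" or start in seen:
            continue
        # Breadth-first search over one connected component.
        queue = deque([start])
        seen.add(start)
        members = []
        while queue:
            node = queue.popleft()
            if node[0] == "entity":
                members.append(node[1])
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        clusters.append(sorted(members))
    return sorted(clusters)


DBR = "http://dbpedia.org/resource/"
entities = {
    "Jean Claude Trichet": {DBR + "Jean-Claude_Trichet"},
    "ECB": {DBR + "ECB"},
    "Greece": {DBR + "Greece"},
}
related = [
    # dbpprop:office value shown on the earlier slide
    (DBR + "Jean-Claude_Trichet", DBR + "President_of_the_European_Central_Bank"),
    # assumed link for illustration only
    (DBR + "President_of_the_European_Central_Bank", DBR + "ECB"),
]
before = expand_clusters(entities, [])
after = expand_clusters(entities, related)
print(before)
print(after)
```

Without the relation edges every entity forms its own cluster; with them, "Jean Claude Trichet" and "ECB" fall into one cluster while "Greece" stays separate.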

Page 12: Arcomem training enrichment_beginner

THANK YOU

CONTACT DETAILS

Dr. Elena Demidova
L3S Research Center
+49 511 762 17732
