
NewsKDD 2014: Crowdsourcing event extraction (poster)

DESCRIPTION

Poster for our extended abstract presented at the NewsKDD workshop at the KDD 2014 conference and at the ESWC Summer School 2014, where it won 3rd place for best student poster.

Funded under: FP7
Area: Language Technologies (ICT-2011.4.2)
Project reference: 288342
Coordinator: Marko Grobelnik

www.xlike.org

Aljaž Košmerlj, Jenya Belyaeva, Gregor Leban, Blaž Fortuna, Marko Grobelnik

Artificial Intelligence Laboratory, Jožef Stefan Institute, Ljubljana, Slovenia

ABSTRACT

We present a system for manually extracting structured event information from freeform newswire text. The extraction is performed on news articles preprocessed by services developed within the XLike project and is guided by suggestions the system produces using machine learning techniques. Results of testing performed using human annotators show the system can produce meaningful data and suggest several avenues for improvement of the system.

INPUT

Sets of articles about the same event from the Event Registry service (http://eventregistry.org).

INTERFACE

List of articles about the event and a list of entities (i.e. noun phrases) found in the articles.

Type of event described in the articles defined by user or selected from suggestions.

List of filled and unfilled roles defined by users for the selected event type.

Entity role selection using a dropdown list – either in text or in entity list.
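The annotation collected through the interface (an event type plus a set of roles, each either filled with one of the proposed entities or left unfilled) could be represented roughly as in the sketch below. The class name `EventAnnotation`, its fields, and the sample values are hypothetical and chosen only for illustration; the poster does not describe the system's actual data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class EventAnnotation:
    """One annotator's structured description of one event (hypothetical schema)."""
    event_id: str
    # Event type: defined by the user or selected from the suggestions.
    event_type: Optional[str] = None
    # Candidate entities (noun phrases) found in the articles.
    entities: List[str] = field(default_factory=list)
    # Role name -> entity filling it, or None while the role is unfilled.
    roles: Dict[str, Optional[str]] = field(default_factory=dict)

    def assign(self, role: str, entity: str) -> None:
        """Fill a role with one of the proposed entities."""
        if entity not in self.entities:
            raise ValueError(f"unknown entity: {entity!r}")
        self.roles[role] = entity

    def filled_roles(self) -> Dict[str, str]:
        return {r: e for r, e in self.roles.items() if e is not None}

    def unfilled_roles(self) -> List[str]:
        return [r for r, e in self.roles.items() if e is None]


# Made-up example values, not taken from the evaluation data.
annotation = EventAnnotation(
    event_id="example-1",
    event_type="earthquake",
    entities=["the earthquake", "City A", "Monday"],
    roles={"location": None, "time": None, "magnitude": None},
)
annotation.assign("location", "City A")
print(annotation.filled_roles(), annotation.unfilled_roles())
```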

PIPELINE

EVENT TYPE SUGGESTION

suggestions generated by SVM classifier built using the QMiner data analytics platform (http://qminer.ijs.si)

EVALUATION

11 annotators annotating the same 10 events

12.1% ± 3.1% of proposed entities annotated per event

6.2 ± 0.9 roles filled per event

average pairwise event type agreement: 5.9 ± 2.0

average pairwise Jaccard index of roles with same annotation: 0.25 ± 0.09

average number of successful event type suggestions per user: 6.6 ± 1.9
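The two pairwise measures above can be sketched as follows. This is a minimal illustration under stated assumptions, not the evaluation code behind the reported numbers: it assumes that event type agreement for a pair of annotators is the number of events to which both assigned the same type, and that each annotation of an event is compared as a set of (role, entity) pairs using the Jaccard index |A ∩ B| / |A ∪ B|. All function names and the toy data are hypothetical.

```python
from itertools import combinations
from statistics import mean, stdev


def jaccard(a: set, b: set) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B|; taken as 1.0 when both sets are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def pairwise_type_agreement(types_by_annotator):
    """For each annotator pair, count the events given the same event type."""
    counts = []
    for a, b in combinations(sorted(types_by_annotator), 2):
        ta, tb = types_by_annotator[a], types_by_annotator[b]
        counts.append(sum(1 for ev in ta if ev in tb and ta[ev] == tb[ev]))
    return counts


def pairwise_role_jaccard(roles_by_annotator):
    """For each annotator pair, average the Jaccard index over shared events."""
    scores = []
    for a, b in combinations(sorted(roles_by_annotator), 2):
        ra, rb = roles_by_annotator[a], roles_by_annotator[b]
        shared = [ev for ev in ra if ev in rb]
        if shared:  # skip pairs with no event in common
            scores.append(mean(jaccard(ra[ev], rb[ev]) for ev in shared))
    return scores


# Toy data: annotator -> event -> chosen event type.
types = {
    "ann1": {"e1": "protest", "e2": "bombing"},
    "ann2": {"e1": "protest", "e2": "earthquake"},
    "ann3": {"e1": "protest", "e2": "bombing"},
}
# Toy data: annotator -> event -> set of (role, entity) pairs.
roles = {
    "ann1": {"e1": {("location", "City A"), ("organizer", "Group B")}},
    "ann2": {"e1": {("location", "City A")}},
    "ann3": {"e1": {("location", "City A"), ("organizer", "Group B")}},
}

type_counts = pairwise_type_agreement(types)
role_scores = pairwise_role_jaccard(roles)
print(f"type agreement: {mean(type_counts):.2f} ± {stdev(type_counts):.2f}")
print(f"role Jaccard:   {mean(role_scores):.2f} ± {stdev(role_scores):.2f}")
```

Whether the reported ± values are standard deviations or another spread measure is not stated on the poster; `stdev` is used here only as one plausible choice.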

classifier built on a dataset of 100 events annotated into 5 event types (road accident, product launch, protest, earthquake and bombing) by an expert annotator

features include concepts found in the event by Event Registry as well as bag-of-words features computed on article titles and the event summary

leave-one-out classification accuracy: CA = 0.67
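Leave-one-out testing over bag-of-words features can be sketched as below. To keep the example dependency-free, a nearest-centroid classifier with cosine similarity stands in for the SVM; it is a swapped-in technique, not the classifier built with QMiner. The tokenization, the function names, and the four toy titles are assumptions made for illustration, and the concept features from Event Registry are omitted.

```python
from collections import Counter
from math import sqrt


def bag_of_words(text):
    """Lowercased whitespace tokenization into term counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def train_centroids(examples):
    """Sum the term counts of all training examples per label."""
    centroids = {}
    for features, label in examples:
        centroids.setdefault(label, Counter()).update(features)
    return centroids


def predict(centroids, features):
    """Return the label whose centroid is most similar to the features."""
    return max(sorted(centroids), key=lambda label: cosine(centroids[label], features))


def leave_one_out_accuracy(examples):
    """Hold out each example once, train on the rest, and score the prediction."""
    correct = 0
    for i, (features, label) in enumerate(examples):
        training = examples[:i] + examples[i + 1:]
        if predict(train_centroids(training), features) == label:
            correct += 1
    return correct / len(examples)


# Made-up titles; the real dataset has 100 events in 5 event types.
titles = [
    ("strong earthquake shakes coastal region", "earthquake"),
    ("earthquake damages buildings in region", "earthquake"),
    ("thousands join protest in capital", "protest"),
    ("protest march blocks streets in capital", "protest"),
]
examples = [(bag_of_words(text), label) for text, label in titles]
accuracy = leave_one_out_accuracy(examples)
print(f"CA = {accuracy:.2f}")
```

With n annotated events, the loop trains n models on n - 1 events each, which is affordable for a dataset of 100 events and uses nearly all of the labeled data for every prediction.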