GENERATING NEWSCASTS SEMANTIC SNAPSHOTS USING ENTITY EXPANSION

JOSÉ LUIS REDONDO GARCIA, GIUSEPPE RIZZO, LILIA PÉREZ ROMERO, MICHIEL HILDEBRAND, RAPHAËL TRONCY

@peputo / [email protected]
@giusepperizzo / [email protected]
[email protected]
@McHildebrand / [email protected]
@rtroncy / [email protected]

(Linked Data Development and Exploitation track) "Generating the Semantic Snapshot of Newscasts Using Entity Expansion" - José Luis Redondo-García, Giuseppe Rizzo, Lilia Pérez Romero,



NEWS CONSUMPTION SEMANTIC SNAPSHOT (NSS)

Named Entity Expansion

News item


News Semantic Snapshot (NSS)

Snowden asks Russia for asylum

15th International Conference on Web Engineering (ICWE) June 24, 2015

NEWS ENTITY EXPANSION

NSS


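Web-based, unsupervised, sequential pipeline. Variations per stage: document collection (20), document annotation (1), entity filtering (4), ranking (4)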


EVALUATION: NEWS ENTITIES GOLD STANDARD

Involving: experts in the news domain + users

Dimensions:
(1) Video subtitles
(2) Image in the video
(3) Text in the video image
(4) Suggestions of an expert
(5) Related articles

Play with the data and help us to extend it at: https://github.com/jluisred/NewsConceptExpansion/wiki/Golden-Standard-Creation


DOCUMENT COLLECTION

(20 variations)

Using Google Custom Search Engine (CSE) [1]

[1] https://cse.google.com/cse/all


Web sites to be crawled:
- Google:
- L1: A set of 10 international English-language newspapers
- L2: A set of 3 international newspapers used in GS

Temporal Window:
- 1W:
- 2W:

Annotation filtering:


DOCUMENT ANNOTATION

NER extractors in NERD *

(*) Benchmarking the Extraction and Disambiguation of Named Entities on the Semantic Web, Rizzo et al. (2014)


ENTITY FILTERING (4 variations)

Filtering dimensions:
- F1: NERD type:
  - Person
  - Organization
  - Location
- F2: Confidence score: > Threshold
- F3: Capitalization (example tokens: country, president, Obama, asylum)
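The three filtering dimensions can be sketched as simple predicates over annotated entities. This is only an illustration: the `Entity` class, the default threshold of 0.5, and the types and confidence values attached to the example tokens are assumptions, and the slides do not say how the dimensions are combined into the four variations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Hypothetical annotation record; field names are assumptions."""
    label: str
    nerd_type: str
    confidence: float


ALLOWED_TYPES = {"Person", "Organization", "Location"}


def f1_type(entity):
    # F1: keep only Person, Organization and Location entities.
    return entity.nerd_type in ALLOWED_TYPES


def f2_confidence(entity, threshold=0.5):
    # F2: keep entities whose extractor confidence exceeds a threshold.
    # The value 0.5 is a placeholder, not a figure from the slides.
    return entity.confidence > threshold


def f3_capitalized(entity):
    # F3: keep entities whose label starts with an uppercase letter.
    return entity.label[:1].isupper()


def filter_entities(entities, predicates):
    # Keep an entity only if every selected predicate accepts it.
    return [e for e in entities if all(p(e) for p in predicates)]


# Example tokens from the slide; types and confidences are made up.
entities = [
    Entity("country", "Thing", 0.40),
    Entity("president", "Thing", 0.35),
    Entity("Obama", "Person", 0.90),
    Entity("asylum", "Thing", 0.30),
]
kept = filter_entities(entities, [f3_capitalized])
```

Passing a different list of predicates, for instance `[f1_type, f2_confidence]`, gives another filtering variation under the same assumptions.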


RANKING STRATEGIES (1)

Increase representativeness → leverage entity frequency


(Freq) (Gaussian)
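A frequency-based ranking in the spirit of (Freq) could look like the sketch below. It assumes each collected document has already been reduced to a list of entity labels and that every mention counts once; the function name and the tie-breaking rule are illustrative choices. The Gaussian variant is not sketched because the slides give no detail on it.

```python
from collections import Counter


def rank_by_frequency(docs_entities):
    """Rank entity labels by how often they occur across all documents."""
    counts = Counter(label for doc in docs_entities for label in doc)
    # Highest count first; ties are broken alphabetically for determinism.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Illustrative input: entity labels found in three related documents.
docs = [
    ["Snowden", "Russia"],
    ["Snowden", "Moscow"],
    ["Snowden", "Russia"],
]
ranking = rank_by_frequency(docs)
```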

RANKING STRATEGIES (2)

POPULARITY
- Based on Google Trends
- w = 2 months
- µ + 2*σ (2.5%)

EXPERT RULES
Rules: [ Sel(e) , ]
Example:
- [ Location, = 0.48 ]
- [ Person, = 0.74 ]
- [ Organization, = 0.95 ]
- [ < 2 , = 0.0 ]

(4 variations)
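One plausible reading of the two strategies is sketched below. For popularity, an entity is treated as peaking when its trend value at the news date exceeds µ + 2σ of the values inside the window. For expert rules, the number in each rule is assumed to be a multiplicative factor applied to the entity score, and "< 2" is assumed to refer to entity frequency. The symbol in front of "=" is missing from the slide text, so both readings are assumptions, as are the function names and the sample trend values.

```python
from statistics import mean, pstdev


def is_popularity_peak(window_values, value_at_news_date):
    """True when the value lies above mu + 2*sigma of the window."""
    mu = mean(window_values)
    sigma = pstdev(window_values)
    return value_at_news_date > mu + 2 * sigma


# Factors copied from the example rules on the slide.
TYPE_FACTORS = {"Location": 0.48, "Person": 0.74, "Organization": 0.95}


def apply_expert_rules(score, entity_type, frequency,
                       type_factors=TYPE_FACTORS, min_frequency=2):
    """Rescale a ranking score with the (assumed multiplicative) rules."""
    # Assumed reading of "[ < 2 , = 0.0 ]": rare entities are zeroed out.
    if frequency < min_frequency:
        return 0.0
    # Types without a rule keep their score unchanged (an assumption).
    return score * type_factors.get(entity_type, 1.0)


# Made-up trend values for a two-month window, for illustration only.
window = [10, 12, 11, 9, 10, 11, 12, 10]
peak = is_popularity_peak(window, 30)
rescaled = apply_expert_rules(10.0, "Person", frequency=3)
```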


EVALUATION: MEASURES

Mean P/R at N:
- Most popular
- Easy to interpret

Mean Average Precision at N (MAP):
- Considers ranking
- Relevant documents at the top positions

Mean Normalized Discounted Cumulative Gain at N (MNDCG):
- Different levels of document relevance
- The lower a highly relevant document is ranked, the less useful it is for the user

N = 10
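The per-newscast versions of these measures can be sketched as follows; averaging each one over all newscasts gives the "Mean" variants. The formulas are the common textbook definitions (average precision normalized by min(|relevant|, N), DCG with a log2 discount), which may differ in detail from the ones used in the paper. The example ranking and the graded gains are made up.

```python
import math


def precision_at_n(ranked, relevant, n=10):
    hits = sum(1 for e in ranked[:n] if e in relevant)
    return hits / n


def recall_at_n(ranked, relevant, n=10):
    if not relevant:
        return 0.0
    hits = sum(1 for e in ranked[:n] if e in relevant)
    return hits / len(relevant)


def average_precision_at_n(ranked, relevant, n=10):
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for k, e in enumerate(ranked[:n], start=1):
        if e in relevant:
            hits += 1
            total += hits / k  # precision at each relevant position
    return total / min(len(relevant), n)


def ndcg_at_n(ranked, gains, n=10):
    # gains maps each gold-standard entity to a graded relevance value.
    dcg = sum(gains.get(e, 0) / math.log2(i + 2)
              for i, e in enumerate(ranked[:n]))
    ideal = sorted(gains.values(), reverse=True)[:n]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


# Illustrative data; assumes each entity appears once in the ranking.
ranked = ["Snowden", "Obama", "Russia", "Paris"]
gains = {"Snowden": 3, "Russia": 2, "Moscow": 1}
relevant = set(gains)
```

NDCG is the only one of the three that uses the graded gains, which is why it can distinguish a highly relevant entity ranked low from a marginal one ranked low.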


RESULTS (1)

Baselines:

BS1: Former Entity Expansion Implementation*
•  Google
•  No temporal window
•  No_Schema.org
•  No_Filter

BS2: TF-IDF-based function


(*) Describing and Contextualizing Events in TV News Show, Redondo et al. (2014)

RESULTS (2)

20 x 4 x 4 = 320 runs

Collection: Google + 2W + Schema.org
Filtering: F3
Ranking: Freq + POP + EXP
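The 320 runs correspond to every combination of the 20 collection, 4 filtering and 4 ranking variations. A minimal way to enumerate such a grid is shown below; the variation names are placeholders, since the slides do not list all of them.

```python
from itertools import product

# Placeholder identifiers; only the counts come from the slides.
collection_variations = [f"collection_{i}" for i in range(20)]
filtering_variations = [f"filtering_{i}" for i in range(4)]
ranking_variations = [f"ranking_{i}" for i in range(4)]

# One run per (collection, filtering, ranking) combination.
runs = list(product(collection_variations,
                    filtering_variations,
                    ranking_variations))
```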

CONCLUSIONS & FUTURE WORK

- News Entity Expansion → Generate the News Semantic Snapshot
- Best score: 0.666 in MNDCG at 10, better than BS1 and BS2
  •  Collection: CSE (Google + 2W + Schema.org)
  •  Filtering: F3
  •  Ranking: Freq + POP + EXP

What's next:
- Extend the Ground Truth
- Supervised approach
- Better exploit semantic connections between entities in KB
- Is MNDCG@10 an ideal indicator for assessing NSS quality?



http://www.slideshare.net/joseluisredondo/newssemantic