GENERATING NEWSCASTS SEMANTIC SNAPSHOTS USING ENTITY EXPANSION
JOSÉ LUIS REDONDO GARCIA GIUSEPPE RIZZO LILIA PÉREZ ROMERO MICHIEL HILDEBRAND RAPHAËL TRONCY
@peputo @giusepperizzo @McHildebrand @rtroncy
NEWS CONSUMPTION SEMANTIC SNAPSHOT (NSS)
Named Entity Expansion
News item
News Semantic Snapshot (NSS)
Snowden asks Russia for asylum
15th International Conference on Web Engineering (ICWE) June 24, 2015
NEWS ENTITY EXPANSION
NSS
(20) (1) (4) (4) Web-based, Unsupervised, Sequential
EVALUATION: NEWS ENTITIES GOLD STANDARD
Involving: experts in the news domain + users
Dimensions:
(1) Video subtitles
(2) Image in the video
(3) Text in the video image
(4) Suggestions of an expert
(5) Related articles
Play with the data and help us to extend it at: https://github.com/jluisred/NewsConceptExpansion/wiki/Golden-Standard-Creation
DOCUMENT COLLECTION
(20 variations)
Using Google Custom Search Engine (CSE) [1]
[1] https://cse.google.com/cse/all
Web sites to be crawled: - Google:
- L1: A set of 10 international English-language newspapers
- L2 : A set of 3 international newspapers used in GS
Temporal Window: - 1W:
- 2W:
Annotation filtering:
DOCUMENT ANNOTATION
NER extractors in NERD *
(*) Benchmarking the Extraction and Disambiguation of Named Entities on the Semantic Web, Rizzo et al. (2014)
ENTITY FILTERING (4 variations)
Filtering dimensions: - F1: NERD type:
- Person - Organization - Location
- F2: Confidence score: > Threshold
- F3: Capitalization: country president Obama asylum
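The three filtering dimensions above can be sketched as composable predicates. This is only an illustration under stated assumptions: the `Entity` class, the function names, the default threshold of 0.5, and the sample types and confidence values are all hypothetical and do not come from the slides.

```python
from dataclasses import dataclass

@dataclass
class Entity:
    label: str         # surface form found in the document
    nerd_type: str     # NERD type assigned by the extractor (hypothetical values)
    confidence: float  # extractor confidence, assumed to lie in [0, 1]

# F1: keep only the three NERD types listed on the slide.
ALLOWED_TYPES = {"Person", "Organization", "Location"}

def f1_type(entity):
    return entity.nerd_type in ALLOWED_TYPES

# F2: keep entities whose confidence exceeds a threshold (0.5 is an assumed value).
def f2_confidence(entity, threshold=0.5):
    return entity.confidence > threshold

# F3: keep entities whose label starts with an uppercase letter.
def f3_capitalized(entity):
    return entity.label[:1].isupper()

def filter_entities(entities, filters):
    """Keep the entities that pass every filter in the given list."""
    return [e for e in entities if all(f(e) for f in filters)]

# The four words from the F3 example; types and confidences are made up.
sample = [
    Entity("country", "Location", 0.4),
    Entity("president", "Person", 0.6),
    Entity("Obama", "Person", 0.9),
    Entity("asylum", "Thing", 0.7),
]
kept_f3 = [e.label for e in filter_entities(sample, [f3_capitalized])]
kept_f1_f2 = [e.label for e in filter_entities(sample, [f1_type, f2_confidence])]
```

Passing the filters as a list makes it easy to enumerate combinations of F1, F2 and F3 when comparing filtering variations.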
RANKING STRATEGIES (1)
Increase representativeness → leverage entity frequency
(Freq) (Gaussian)
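A minimal sketch of the frequency-based idea (Freq), assuming each retrieved document has already been reduced to a list of entity labels. Counting every mention and normalizing by the highest count are assumptions made for illustration; the slides do not specify the exact formula, and the Gaussian variant is not covered here.

```python
from collections import Counter

def rank_by_frequency(annotated_docs):
    """Rank entities by how often they are mentioned across all documents.

    annotated_docs: list of documents, each a list of entity labels.
    Returns (entity, score) pairs sorted by decreasing score, where the
    score is the mention count divided by the highest count (assumed
    normalization).
    """
    counts = Counter(entity for doc in annotated_docs for entity in doc)
    if not counts:
        return []
    top = max(counts.values())
    return [(entity, n / top) for entity, n in counts.most_common()]

# Hypothetical annotations for three documents about the example news item.
docs = [
    ["Edward Snowden", "Russia"],
    ["Edward Snowden", "Moscow", "Russia"],
    ["Edward Snowden"],
]
ranking = rank_by_frequency(docs)
```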
RANKING STRATEGIES (2)
POPULARITY
- Based on Google Trends
- w = 2 months
- µ + 2*σ (2.5%)
EXPERT RULES
Rules: [ Sel(e) , ]
Example:
- [ Location, = 0.48 ]
- [ Person, = 0.74 ]
- [ Organization, = 0.95 ]
- [ < 2 , = 0.0 ]
(4 variations)
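The popularity criterion (µ + 2*σ over a window w) can be read as a simple peak test on a time series of search interest. The sketch below assumes a plain list of daily interest values for one entity; how the values are obtained from Google Trends and how a detected peak changes the entity score are not given on the slide, so the function name and the synthetic series are illustrative only.

```python
from statistics import mean, pstdev

def peak_days(series, k=2.0):
    """Return the indices whose value exceeds mean + k * standard deviation.

    series: interest values over the temporal window w (e.g. about 60 daily
    values for a two-month window). k = 2 follows the slide's threshold.
    """
    if not series:
        return []
    threshold = mean(series) + k * pstdev(series)
    return [i for i, value in enumerate(series) if value > threshold]

# Synthetic data: a flat series has no peak, a single spike is detected.
flat = [10] * 60
spiky = [10] * 59 + [100]
flat_peaks = peak_days(flat)
spiky_peaks = peak_days(spiky)
```

For roughly normally distributed values, only a small upper tail lies above µ + 2σ, which is presumably what the "(2.5%)" on the slide refers to.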
EVALUATION: MEASURES
Mean P/R at N:
- Most popular
- Easy to interpret
Mean Average Precision at N (MAP):
- Considers ranking
- Relevant documents at the top positions
Mean Normalized Discounted Cumulative Gain at N (MNDCG):
- Different levels of document relevance
- The lower a highly relevant document is ranked, the less useful it is for the user
N = 10
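The per-item versions of these measures can be sketched as follows, using common textbook definitions; the exact variants used in the evaluation (for example the gain and discount functions of NDCG, or the AP normalization) are assumptions here. The "mean" measures are then the average of these values over all news items in the gold standard.

```python
import math

def precision_at_n(ranked, relevant, n=10):
    """Fraction of the top-n ranked entities that are relevant."""
    return sum(1 for e in ranked[:n] if e in relevant) / n

def recall_at_n(ranked, relevant, n=10):
    """Fraction of the relevant entities that appear in the top n."""
    if not relevant:
        return 0.0
    return sum(1 for e in ranked[:n] if e in relevant) / len(relevant)

def average_precision_at_n(ranked, relevant, n=10):
    """Average of precision@i over the ranks i <= n that hold a relevant entity."""
    hits, total = 0, 0.0
    for i, e in enumerate(ranked[:n], start=1):
        if e in relevant:
            hits += 1
            total += hits / i
    denom = min(len(relevant), n)
    return total / denom if denom else 0.0

def ndcg_at_n(ranked, gains, n=10):
    """NDCG with graded relevance: gain / log2(rank + 1), normalized by the ideal order."""
    dcg = sum(gains.get(e, 0) / math.log2(i + 1)
              for i, e in enumerate(ranked[:n], start=1))
    ideal = sorted(gains.values(), reverse=True)[:n]
    idcg = sum(g / math.log2(i + 1) for i, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg else 0.0

# Toy example with hypothetical graded relevance (higher = more relevant).
gains = {"a": 3, "c": 1, "e": 2}
ranked = ["a", "b", "c", "d"]
p = precision_at_n(ranked, set(gains), n=4)
ap = average_precision_at_n(ranked, set(gains), n=4)
ndcg = ndcg_at_n(ranked, gains, n=4)
```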
RESULTS (1)
Baselines:
BS1: Former Entity Expansion Implementation (*)
• Google
• No temporal window
• No_Schema.org
• No_Filter
BS2: TFIDF-based function.
(*) Describing and Contextualizing Events in TV News Show, Redondo et al. (2014)
CONCLUSIONS & FUTURE WORK
- News Entity Expansion → Generate the News Semantic Snapshot
- Best score: 0.666 in MNDCG at 10, better than BS1 and BS2
• Collection: CSE (Google + 2W + Schema.org)
• Filtering: F3
• Ranking: Freq + POP + EXP
What's next:
- Extend the Ground Truth
- Supervised approach
- Better exploit semantic connections between entities in KB
- Is MNDCG@10 an ideal indicator for assessing NSS quality?
http://www.slideshare.net/joseluisredondo/newssemantic