PIKAKSHI MANCHANDA
DISCo, University of Milano-Bicocca, Milan, Italy
@pikakshi787
Manchanda et al., Leveraging Entity Linking for Entity Recognition in Microposts
KDIR 2015
KDIR 2015, Lisbon, 12th November 2015
People increasingly communicate and share information through social media platforms
Fresh information emerges in real time on social media platforms, primarily:
New entities (newly emerging, newly relevant/popular)
New relationships
Factual information
Events
SOCIAL MEDIA: ENTITIES-EMOJIS-EVENTS
WHY INFORMATION EXTRACTION??
Existing entities: IBM OS2 (Product), Apple (Company)
New entity (Product Launch): Apple Watch (Product)
New relations
WHY SOCIAL MEDIA PLATFORMS??
Fresh, real-time info
Incomplete KBs
Unstructured Web
MOTIVATION
Bridging the gap between Unstructured Web and Web of Data
• Intrinsic incompleteness in KBs
Information Extraction from social media streams (microposts,..)
• Named Entity Recognition (NER)
• Named Entity Classification
• Named Entity Linking (NEL)
Knowledge Base (KB) enrichment
• Identify new knowledge
• Improve NER
• Lexically enriching knowledge bases for existing & new entities
INFORMATION EXTRACTION
Named Entity Recognition: Task of identifying named entities in a piece of text
Named Entities: text fragments that refer to entities in the real world (proper nouns..)
Named Entity Classification: Classifying recognized named entities into entity types such as PERSON, LOCATION, ORGANIZATION…
Named Entity Linking: Linking the identified named entities to resources in a knowledge base (such as Wikipedia, DBpedia)
The Town might be one of the best movies I have seen all year. So,
so good. And don't worry Ben, we already forgave you for Gigli.
Really.
http://dbpedia.org/page/Ben_Affleck
foaf:Person
yago:AmericanFilmActors
http://dbpedia.org/page/Gigli
dbo:Film
yago:AmericanFilms
http://es.dbpedia.org/page/The_Town
dbpedia-owl:Film
schema.org/Movie
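A minimal Python sketch of how the annotations in this example could be held as data. The `EntityAnnotation` class, its field names, and the `annotate` helper are hypothetical and used only for illustration; the tweet text, surface forms, URIs, and types are the ones shown on the slide.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EntityAnnotation:
    """One recognized, classified, and linked entity mention (hypothetical structure)."""
    surface_form: str          # text fragment found by NER
    start: int                 # character offset where the mention begins
    end: int                   # character offset just past the mention
    uri: str                   # KB resource chosen by entity linking
    types: List[str] = field(default_factory=list)  # entity types of the resource


TWEET = (
    "The Town might be one of the best movies I have seen all year. So, "
    "so good. And don't worry Ben, we already forgave you for Gigli. Really."
)


def annotate(text: str, surface_form: str, uri: str, types: List[str]) -> EntityAnnotation:
    # Locate the first occurrence of the surface form in the text.
    start = text.index(surface_form)
    return EntityAnnotation(surface_form, start, start + len(surface_form), uri, list(types))


annotations = [
    annotate(TWEET, "The Town", "http://es.dbpedia.org/page/The_Town",
             ["dbpedia-owl:Film", "schema.org/Movie"]),
    annotate(TWEET, "Ben", "http://dbpedia.org/page/Ben_Affleck",
             ["foaf:Person", "yago:AmericanFilmActors"]),
    annotate(TWEET, "Gigli", "http://dbpedia.org/page/Gigli",
             ["dbo:Film", "yago:AmericanFilms"]),
]

for a in annotations:
    print(a.surface_form, "->", a.uri, a.types)
```

Keeping character offsets next to the URI makes it possible to map each linked resource back to the exact span of the micropost.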
Named Entity Linking
INFORMATION EXTRACTION
The Town might be one of the best movies I have seen all year. So,
so good. And don't worry Ben, we already forgave you for Gigli.
Really.
http://dbpedia.org/page/Ben_Affleck
foaf:Person
yago:AmericanFilmActors
http://dbpedia.org/page/Gigli
dbo:Film
yago:AmericanFilms
http://live.dbpedia.org/page/The_Town_(2012_TV_series)
dbo:TelevisionShow
http://schema.org/CreativeWork
INFORMATION EXTRACTION
Named Entity Linking
Entity recognition and linking in microposts have been reported to be quite challenging:
1. Short and noisy text: typographic errors, shortened words, ambiguity, polysemy (Liu et al. 2013, Ritter et al. 2011, Meij et al. 2012)
2. Out-Of-Vocabulary (OOV) entity mention identification problem
Example: The Big Bang Theory being referred to as TBBT
3. Out-Of-Knowledge-Base (OOKB) entity problem
Example: a new, upcoming company, Widro
CHALLENGES
Systems/Tools | Approach | Domain | Entity Types/Classes | Taxonomy
ANNIE | Gazetteers & FSM | Newswire | 7 (adapted) | MUC
Stanford NER | CRF | Newswire | 4, 3 or 7 | CoNLL, ACE
Alchemy API | Machine Learning | Unspecified | 324 | Alchemy
NERD-ML | KNN & Naïve Bayes | Twitter | 4 | NERD
TextRazor | Machine Learning | Unspecified | 1779 | DBpedia, Freebase
Ritter et al., 2011 | CRF | Twitter | 3 or 10 | CoNLL, ACE
Liu et al. 2011 | KNN & CRF | Twitter | 4 | CoNLL, ACE
Kalina et al, 2013 | Gazetteers & FSM | Twitter | 3 or 10 | CoNLL
Derczynski et al, 2015 | Structured Learning (CRF) | Twitter | 10 | Freebase
ENTITY RECOGNITION
Tools | Taxonomy | Approach / Features used | Domain
DBpedia Spotlight (Mendes et al., 2011) | DBpedia, Freebase, Schema.org | Gazetteers and similarity metrics | Unspecified
TAGME (Ferragina and Scaiella, 2010) | Wikipedia | Wikipedia anchor texts and the pages linked to those anchor texts | Short texts
YODIE (Damljanovic and Bontcheva, 2012) | DBpedia | Similarity metrics and URI frequency | Twitter
Babelfy (Moro et al., 2014) | BabelNet semantic network | Graph-based approach, semantic signatures | Short text
Meij et al., 2012 | Wikipedia | n-gram features, concept features, and tweet features |
S-MART (Yang et al, 2015) | Wikipedia | Structural Learning (Tree-based) | Twitter
Weasel (Tristram et al, 2015) | DBpedia | Machine Learning (using SVM) | Newspaper articles
Guo et al., 2013 | Wikipedia | Structural SVM | Twitter
Yamada et al., 2015 | Wikipedia | Supervised (string matching, n-grams) |
Slide callouts:
Mention detection & disambiguation system: pipeline
Use NEL to learn how to perform NER: pipeline
ENTITY LINKING
THE PROPOSED SYSTEM
An end-to-end IE framework for microblogs to orchestrate NER and NEL
• Entity Recognition and Classification
• Candidate match retrieval for identified entities
• Entity linking
• Leverage entity linking to improve named entity classification
Gold-standard corpus of ~2400 tweets (Ritter et al., 2011)
Ground Truth: Manually curated set of 1616 named entities identified with entity types
Use of DBpedia as an external KB
FRAMEWORK
[Figure: framework diagram. A tweet passes through Named Entity Recognition, which yields the surface forms of named entities. Entity Linking then combines Entity Search over an rdfs:label index of the KB (top-k labels for each surface form, a resource for each label) with Entity Disambiguation (resource descriptions; surface form, entity type & context) to produce a resource for each surface form, which feeds the Improvement of NER.]
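The Entity Search step of the framework (retrieving the top-k labels for each surface form from an rdfs:label index) could be sketched as follows. This is only an illustration under stated assumptions: the tiny in-memory `LABEL_INDEX`, the `top_k_labels` function, and the use of `difflib` string similarity are all hypothetical stand-ins; the slides do not say how the index is stored or queried.

```python
import difflib
from typing import Dict, List, Tuple

# Hypothetical miniature rdfs:label index: label -> KB resource URI.
# A real index would be built from the DBpedia titles files.
LABEL_INDEX: Dict[str, str] = {
    "Ben Affleck": "http://dbpedia.org/page/Ben_Affleck",
    "Gigli": "http://dbpedia.org/page/Gigli",
    "The Town": "http://es.dbpedia.org/page/The_Town",
    "The Town (2012 TV series)": "http://live.dbpedia.org/page/The_Town_(2012_TV_series)",
}


def top_k_labels(surface_form: str, index: Dict[str, str],
                 k: int = 3, cutoff: float = 0.3) -> List[Tuple[str, str]]:
    """Return up to k (label, URI) pairs whose labels best match the surface form."""
    # Compare case-insensitively but report the original label spelling.
    lowered = {label.lower(): label for label in index}
    matches = difflib.get_close_matches(
        surface_form.lower(), list(lowered), n=k, cutoff=cutoff
    )
    return [(lowered[m], index[lowered[m]]) for m in matches]


candidates = top_k_labels("The Town", LABEL_INDEX)
for label, uri in candidates:
    print(label, "->", uri)
```

Returning several candidate labels rather than the single best match is what leaves room for the later disambiguation step to pick between, for example, the film and the TV series.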
ENTITY RECOGNITION
T-NER is grounded on Conditional Random Fields (Sutton and McCallum, 2006)
It classifies each entity e into one or more entity types/classes c with a probability score $P_{CRF}(e, c)$
Experimental Analysis: Entity Recognition
NER Systems: T-NER (Ritter et al. 2011)
Identification Errors
“@vogueglamGIRL Ah I know! She is simply the best in The Sept Issue. My boyfriend’s aunt worked for Anna Wintor in NY”
Classification Errors
“Cant wait for the ravens game tomorrow....go ray rice!!!!!!!”
Entity type labels annotated on the tweets: O, Geo-Loc, Band, Sportsteam
$P_{CRF}(e, c) = \exp\left(\sum_{k=1}^{K} w_k f_k(e, c)\right)$
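The score above is an exponentiated weighted sum of feature functions. The sketch below evaluates it for a single entity over a few classes. The two feature functions, the weights, and the final normalization over classes are assumptions made for illustration; they are not T-NER's actual features, and the slide shows only the unnormalized exponential.

```python
import math
from typing import Callable, Dict, List

FeatureFn = Callable[[str, str], float]


def crf_score(entity: str, cls: str,
              weights: List[float], features: List[FeatureFn]) -> float:
    """exp(sum_k w_k * f_k(e, c)) for one entity/class pair."""
    return math.exp(sum(w * f(entity, cls) for w, f in zip(weights, features)))


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Assumed extra step: rescale the scores so they sum to 1 over the classes."""
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


# Hypothetical feature functions, for illustration only.
def is_capitalized(entity: str, cls: str) -> float:
    return 1.0 if entity[:1].isupper() and cls != "O" else 0.0


def in_city_list(entity: str, cls: str) -> float:
    return 1.0 if entity in {"Chicago", "Lisbon"} and cls == "Geo-Location" else 0.0


weights = [0.5, 2.0]                       # hypothetical w_k
features = [is_capitalized, in_city_list]  # hypothetical f_k
classes = ["Person", "Geo-Location", "O"]

scores = {c: crf_score("Chicago", c, weights, features) for c in classes}
probs = normalize(scores)
print(max(probs, key=probs.get))
```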
T-NER Classification Performance
Text Phrase | Classification Level | Example Entity | Example Entity Type | Classification (%)
Entities (1496) | Correctly Classified | Justin Bieber | Person | 61.57
Entities (1496) | Incorrectly Classified | Chicago | Person | 37.96
Entities (1496) | Segmentation Error | Alpha, Omega (Alpha-Omega) | Geo-Location, Band | 0.47
Non-Entities (44k) | Correctly Classified | It | Outside (O) | 99.8
Non-Entities (44k) | Incorrectly Classified | justthen | Person | 0.2
T-NER identifies 1496 named entities from the gold standard (GS), in contrast to 1616 entities in the ground truth.
8% of entities are not recognized at all and are thus classified as non-entities (amongst the other 44k tokens).
Classification Error Rate: T-NER
Entity Type | Error (%)
Band | 73.83
Company | 21.9
Facility | 54.79
Geo-Location | 19.75
Movie | 75.83
Other | 46.29
Person | 28.18
Product | 39.70
Sportsteam | 48.27
TVshow | 48.71
ENTITY LINKING
DBpedia Titles files and NLP resources available at: http://wiki.dbpedia.org/Downloads2015-04
Entity Linking: Performance Analysis
Classifiable Named Entity | Linking Level | Example Entity | Example DBpedia Type | Linking (%)
Linkable | Correctly Linked | Wisconsin | Geo-Location | 63.11
Linkable | Incorrectly Linked | America | Movie | 3.05
Linkable | Uninformative | N.J. | Thing | 16.15
Non-Linkable | Uninformative | Secrets | Thing | 11.85
Non-Linkable | Generic | Whitney | Other | 5.83
A total of 1442 entities out of 1496 entities are disambiguated with ~4k candidate KB resources
Matching function $P_{KB}(e, r_c)$, used to detect the resource in the KB for the surface form of a named entity, if it exists:
1. Lexical similarity, $lex(e, l_{r_c})$
2. Coherence, $coh(e^+, d_{r_c})$
Experimental Analysis: Entity Linking
⇒ $P_{KB}(e, r_c) = \alpha \cdot lex(e, l_{r_c}) + (1 - \alpha) \cdot coh(e^+, d_{r_c})$
($\alpha$ currently set to 0.5)
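The matching function is a weighted combination of a lexical-similarity term and a coherence term. A minimal sketch follows, with loud assumptions: the slides do not define how `lex` and `coh` are computed, so `difflib`'s sequence ratio and a Jaccard token overlap are used here purely as stand-ins, and the mixing weight is named `alpha` for illustration. The example description string is also hypothetical.

```python
from difflib import SequenceMatcher


def lex(entity: str, label: str) -> float:
    """Stand-in lexical similarity between a surface form and a resource label (0..1)."""
    return SequenceMatcher(None, entity.lower(), label.lower()).ratio()


def coh(context: str, description: str) -> float:
    """Stand-in coherence: Jaccard overlap between context tokens and description tokens."""
    a = set(context.lower().split())
    b = set(description.lower().split())
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def p_kb(entity: str, context: str, label: str, description: str,
         alpha: float = 0.5) -> float:
    """alpha * lex + (1 - alpha) * coh, with the weight set to 0.5 as on the slide."""
    return alpha * lex(entity, label) + (1 - alpha) * coh(context, description)


score = p_kb(
    entity="Gigli",
    context="we already forgave you for Gigli",
    label="Gigli",
    description="Gigli is an American film",  # hypothetical resource description
)
print(round(score, 3))
```

With the weight at 0.5 both terms contribute equally, so an exact label match with no contextual overlap scores 0.5.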
ENTITY RECOGNITION ENHANCEMENT
$c^*_e = \arg\max_{c} \{ P_{CRF}(e, c) \cdot P_{KB}(e, r_c) \}$
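The re-classification rule picks the class that maximizes the product of the CRF score and the KB matching score. A small sketch, assuming both scores are already available as dictionaries keyed by entity type; the numbers are hypothetical, and the fallback to the CRF-only decision when no class has a KB match is an added assumption, not something stated on the slide.

```python
from typing import Dict


def reclassify(p_crf: Dict[str, float], p_kb: Dict[str, float]) -> str:
    """Return argmax_c P_CRF(e, c) * P_KB(e, r_c) for one entity e.

    p_crf maps each candidate class c to its CRF score.
    p_kb maps a class c to the matching score of its KB resource r_c;
    classes with no candidate resource are treated as scoring 0.
    """
    combined = {c: p_crf[c] * p_kb.get(c, 0.0) for c in p_crf}
    if all(v == 0.0 for v in combined.values()):
        # Assumed fallback: keep the CRF decision when nothing is linkable.
        return max(p_crf, key=p_crf.get)
    return max(combined, key=combined.get)


# Hypothetical scores for the surface form "Canada".
p_crf = {"Person": 0.5, "Geo-Location": 0.4, "Other": 0.1}
p_kb = {"Person": 0.2, "Geo-Location": 0.9}

print(reclassify(p_crf, p_kb))
```

In this illustration the CRF alone would prefer Person, while the KB evidence shifts the decision to Geo-Location.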
Comparative Analysis: T-NER and T-NER+
Entity Type | T-NER Precision | T-NER Recall | T-NER F1 | T-NER+ Precision | T-NER+ Recall | T-NER+ F1
Band | 0.26 | 0.88 | 0.40 | 0.39 | 0.90 | 0.54
Company | 0.78 | 0.90 | 0.84 | 0.81 | 0.90 | 0.85
Facility | 0.45 | 0.72 | 0.55 | 0.50 | 0.72 | 0.59
Geo-Location | 0.80 | 0.95 | 0.87 | 0.80 | 0.95 | 0.87
Movie | 0.24 | 0.88 | 0.38 | 0.34 | 0.88 | 0.49
Other | 0.57 | 0.70 | 0.63 | 0.56 | 0.76 | 0.64
Person | 0.72 | 0.92 | 0.81 | 0.77 | 0.92 | 0.84
Product | 0.60 | 0.69 | 0.65 | 0.63 | 0.71 | 0.67
Sportsteam | 0.52 | 0.83 | 0.64 | 0.63 | 0.85 | 0.72
TVshow | 0.51 | 0.91 | 0.66 | 0.45 | 0.89 | 0.59
Overall | 0.62 | 0.87 | 0.73 | 0.66 | 0.88 | 0.76
Experimental Analysis: Entity Recognition Enhancement
Example: Re-classification of entities
Entity | Ground-Truth | T-NER | T-NER+
30stm | Band | Product | Band
Yahoo | Company | Band | Company
Southgate House | Facility | Band | Facility
Canada | Geo-Location | Person | Geo-Location
Camp rock 2 | Movie | Person | Movie
Thanksgiving | Other | Person | Other
John Acuff | Person | Facility | Person
iphone | Product | Company | Product
Lions | Sportsteam | Person | Sportsteam
TMZ | TVshow | Band | TVshow
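$\text{Precision } (P) = \frac{|\{\text{cor.cl}\} \cap \{\text{cl}\}|}{|\{\text{cl}\}|}$
$\text{Recall } (R) = \frac{|\{\text{cor.cl}\} \cap \{\text{cl}\}|}{|\{\text{cor.cl}\}|}$
$F_1 = \frac{2 \times P \times R}{P + R}$
Here cor.cl denotes correctly classified entities, while cl denotes classified entities.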
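These measures can be computed per entity type with plain set operations. The sketch below rests on one interpretation that should be read as an assumption: for a given type, `cor_cl` is taken to be the set of entities whose ground-truth type is that type, and `cl` the set of entities the system assigned to it. The toy labels are hypothetical and are not the evaluation data.

```python
from typing import Dict, Tuple


def precision_recall_f1(gold: Dict[str, str], predicted: Dict[str, str],
                        entity_type: str) -> Tuple[float, float, float]:
    """Per-type precision, recall, and F1 from entity -> type mappings."""
    cor_cl = {e for e, t in gold.items() if t == entity_type}       # should carry the type
    cl = {e for e, t in predicted.items() if t == entity_type}      # system assigned the type
    hits = cor_cl & cl
    p = len(hits) / len(cl) if cl else 0.0
    r = len(hits) / len(cor_cl) if cor_cl else 0.0
    f1 = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f1


# Hypothetical toy labels for illustration.
gold = {"Yahoo": "Company", "iphone": "Product",
        "Canada": "Geo-Location", "Lions": "Sportsteam"}
predicted = {"Yahoo": "Company", "iphone": "Company",
             "Canada": "Geo-Location", "Lions": "Sportsteam"}

p, r, f1 = precision_recall_f1(gold, predicted, "Company")
print(p, r, round(f1, 2))
```

The guards for empty sets avoid division by zero for types the system never predicts or that never occur in the ground truth.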
New knowledge emerges constantly on social media streams
It's important to identify new knowledge in order to bridge the gap between the Unstructured Web and the Web of Data
An end-to-end entity linking pipeline might be helpful for detecting new knowledge
Entity linking can be used to improve the classification performance of an entity recognition system
Improving entity recognition is crucial for identifying new entities
Presented an end-to-end entity linking pipeline for short textual formats (microposts)
Presented an approach for improving entity recognition through re-classification
Marginal improvements observed in re-classification using linked entities
A definite scope for improving the current system
New knowledge has been identified, though not dealt with currently
Quality assessment, trustworthy factors…
Relation extraction from microposts to improve identification of new knowledge
Experimenting with more recent datasets