© 2016 Relegence.
Roni Wiener
AOL’s Named Entity Resolver via strongly connected components and ad-hoc edge construction
What is News Tagging?
Categories - extract what an article is about.
Entities - extract who and what is in the article.
Entities: Will Smith, New Orleans, NFL, New Orleans Saints, Atlanta Falcons, ...
Categories: U.S. News, Crime, Football, Murder, Celebrities, ...
Easy for humans, hard for machines.
Try to mimic humans, then search for hits.
http://www.sbnation.com/2016/4/28/11518540/will-smith-murder-case-new-orleans-saints-trial
Outline
What is News Document Tagging?
Entity Ambiguity and Unknown Entities
Documents Graph
Graph based Ambiguity Resolution
So, What is the Big Deal?
We use machine learning for Categories and keyword search for Entity mentions.
ARTICLE -> Machine Learning -> Categories
ARTICLE -> Keyword Search -> Entities from Knowledge Base
Well … we are not done yet; we still need to handle AMBIGUITY and UNKNOWN ENTITIES ...
What are Entity Ambiguity and Unknown Entities?
200 ‘Will Smith’s are in our Knowledge Base
Keyword – Will Smith
We must consider an Unknown Entity for each keyword, totaling 201 possibilities to choose from.
Most of the ‘Will Smith’s in the world are not in our Knowledge Base
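The candidate list described above can be sketched as follows. This is a minimal illustration, not the production lookup: the `Candidate` type, the `UNKNOWN` marker, `candidates_for`, and the toy knowledge base with 200 made-up ids are all assumptions introduced here.

```python
from dataclasses import dataclass

UNKNOWN = "<unknown>"  # hypothetical marker for "not in the Knowledge Base"

@dataclass(frozen=True)
class Candidate:
    keyword: str    # surface form found in the article
    entity_id: str  # knowledge-base id, or UNKNOWN

def candidates_for(keyword, knowledge_base):
    """Every knowledge-base entity matching the keyword, plus one Unknown Entity."""
    known = [Candidate(keyword, eid) for eid in knowledge_base.get(keyword, [])]
    return known + [Candidate(keyword, UNKNOWN)]

# Toy knowledge base: 200 made-up ids for the keyword "Will Smith".
toy_kb = {"Will Smith": [f"will_smith_{i}" for i in range(200)]}
options = candidates_for("Will Smith", toy_kb)
print(len(options))
```

A keyword with no match in the knowledge base still yields one candidate, the Unknown Entity.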
404
OK, This May Be Tricky
For the given “Will Smith” article, we have:
Keyword options = (Will Smith) 201 × (Saints) 31 × (Georgia) 26 × …
Totaling:
263,006,617,337,856,000,000 Entity combinations!!
BUT ONLY 1 COMBINATION IS CORRECT
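To make the growth concrete, the product can be computed for the three keywords whose counts are given. The remaining factors are elided on the slide, so this sketch does not attempt to reproduce the full total.

```python
import math

# Candidate counts per keyword, as listed on the slide. The other
# keywords are elided there, so only these three are included.
option_counts = {"Will Smith": 201, "Saints": 31, "Georgia": 26}

combinations = math.prod(option_counts.values())
print(f"{combinations:,}")  # 162,006
```

Three keywords alone already give 162,006 combinations, and every further keyword multiplies that number again.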
Wow, How Can We Solve This Mess?
Build a Document Graph with the aid of a Knowledge Graph. It allows us to do:
Pruning - discard improbable Entities.
Grouping - solve ambiguity between a small number of entity groups.
Both grouping and pruning are done by looking for contextual hints left by the article’s author.
What is a Knowledge Graph … Wikipedia ++
We have a lot of prior knowledge represented as a graph; it contains Entities, Categories, Text and the relations between them. The knowledge graph can be thought of as the prior knowledge shared by the article’s author and its readers, on which the article’s contextual hints are based.
Node types: Entity, Text, Category
Figure: Entity nodes Michelle Obama and Barack Obama, Text nodes “First Lady” and “President”, Category node Politics.
Building the Document Graph - Entities
The Document Graph is derived from the Knowledge Graph, with additional temporal knowledge.
The first hints are all possible entity candidates and their relations, derived from the knowledge graph.
Figure: candidate entity nodes New Orleans Saints (Football team), Atlanta Falcons (Football team), Will Smith (Footballer), NFL (Football League), New Orleans (City), Will Smith (Actor), 22 Jump Street (Movie) and Facebook (Company), connected by edges with w=1.0.
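One way to picture this first step is as a filter over knowledge-graph relations: an edge is kept only when both endpoints are candidates found in the document. The sketch below assumes directed edges (the slides do not say, but strongly connected components are only meaningful on a directed graph), and the function name, the relation list and the uniform weight are illustrative assumptions.

```python
def build_document_graph(candidates, kg_relations, weight=1.0):
    """Keep the knowledge-graph relations whose two endpoints are both candidates.

    Returns {node: {neighbor: weight}} with directed edges.
    """
    graph = {node: {} for node in candidates}
    for src, dst in kg_relations:
        if src in graph and dst in graph:
            graph[src][dst] = weight
    return graph

# Hypothetical candidates and relations, for illustration only.
candidates = ["Will Smith (Footballer)", "Will Smith (Actor)",
              "New Orleans Saints", "NFL"]
kg_relations = [
    ("Will Smith (Footballer)", "New Orleans Saints"),
    ("New Orleans Saints", "NFL"),
    ("Will Smith (Actor)", "Men in Black"),  # endpoint not in the document
]
graph = build_document_graph(candidates, kg_relations)
```

Candidates whose relations all point outside the document, such as the actor here, end up with no edges, which is exactly the missing context the later steps exploit.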
Building the Document Graph - Categories
The next hints are the Document categories derived from our classifier.
Figure: the same entity graph with two category nodes added, Celebrities (Category) and Football (Category), and two new edges with w=1.0.
Heuristic Nodes and Edges
Heuristics help us prune improbable Entities, for example highly frequent terms, common names and single words, as they are not sound contextual hints.
Figure: Will Smith connected by a negative edge (w < 0) to the Heuristic Nodes: Common Names, Single word, Frequent word.
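The heuristic nodes can be sketched as a function that returns one negative edge per heuristic a keyword triggers. The word lists, the node labels and the penalty of -0.5 are assumptions made for illustration; the slides only state that these edges carry w < 0.

```python
COMMON_NAMES = {"Will Smith", "John Smith"}      # hypothetical list
FREQUENT_WORDS = {"Apple", "Georgia", "Saints"}  # hypothetical list

def heuristic_edges(keyword, penalty=-0.5):
    """Return {heuristic node: negative weight} for each heuristic the keyword triggers."""
    edges = {}
    if keyword in COMMON_NAMES:
        edges["Common Names (Heuristic)"] = penalty
    if len(keyword.split()) == 1:
        edges["Single Word (Heuristic)"] = penalty
    if keyword in FREQUENT_WORDS:
        edges["Frequent Word (Heuristic)"] = penalty
    return edges

print(heuristic_edges("Will Smith"))
print(heuristic_edges("Saints"))
```

A multi-word, uncommon keyword such as "New Orleans Saints" triggers no heuristic and therefore receives no penalty.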
Context Nodes and Ad Hoc Edges
Knowledge nodes that are highly connected to the Document Graph can be assumed to fall under the same context.
We represent this prior knowledge by an edge between the relevant document nodes.
Figure: Document Graph nodes New Orleans Saints (Football team) and Atlanta Falcons (Football team); Knowledge Graph Context Nodes Footballer A, Footballer B and Footballer C.
Ad hoc edge w=1.0
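A possible reading of the ad hoc edge construction is sketched below: two document nodes are linked when they share enough knowledge-graph neighbors that are not themselves in the document. The `min_shared` threshold, the function name and the toy neighbor sets are assumptions; the slides do not state how "highly connected" is measured.

```python
from itertools import combinations

def ad_hoc_edges(document_nodes, kg_neighbors, min_shared=2, weight=1.0):
    """Link document nodes that share at least `min_shared` context nodes."""
    in_document = set(document_nodes)
    edges = {}
    for a, b in combinations(sorted(in_document), 2):
        shared = kg_neighbors.get(a, set()) & kg_neighbors.get(b, set())
        context = shared - in_document  # context nodes live outside the document
        if len(context) >= min_shared:
            edges[(a, b)] = weight
    return edges

# Hypothetical knowledge-graph neighborhoods.
kg_neighbors = {
    "New Orleans Saints": {"Footballer A", "Footballer B", "Footballer C"},
    "Atlanta Falcons": {"Footballer A", "Footballer B", "Footballer C"},
    "Facebook": {"Mark Zuckerberg"},
}
edges = ad_hoc_edges(["New Orleans Saints", "Atlanta Falcons", "Facebook"], kg_neighbors)
print(edges)
```

Only the two football teams share context nodes, so a single ad hoc edge is created between them.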
Building the Document Graph - Text
“ … Defensive End Will Smith ...”
P(Footballer(w) | “Defensive End”, w-5, w+5) ⪝ 1
P(Actor(w) | “Defensive End”, w-5, w+5) ⪞ 0
Figure: Text node “Defensive End” linked to the Footballer and Actor candidates.
Add Text nodes found in the Document; draw edges by proximity and expectations.
Figure: the same graph with the Text node “Defensive End” added, with two new edges labeled w=-0.6 and w=1.0.
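The proximity rule can be sketched as a window check around the mention. The function below is only an illustration: the five-token window mirrors the w-5, w+5 notation, while the `expectations` table and the assignment of 1.0 to the footballer and -0.6 to the actor are assumptions about which edge carries which weight.

```python
def text_edges(tokens, mention_index, mention_len, expectations, window=5):
    """Return {candidate: weight} for phrases found within `window` tokens of the mention."""
    lo = max(0, mention_index - window)
    hi = mention_index + mention_len + window
    context = " ".join(tokens[lo:hi]).lower()
    weights = {}
    for candidate, phrases in expectations.items():
        for phrase, weight in phrases.items():
            if phrase.lower() in context:
                weights[candidate] = weights.get(candidate, 0.0) + weight
    return weights

tokens = "Former Saints Defensive End Will Smith was shot".split()
expectations = {  # hypothetical expectations derived from the knowledge graph
    "Will Smith (Footballer)": {"defensive end": 1.0},
    "Will Smith (Actor)": {"defensive end": -0.6},
}
weights = text_edges(tokens, mention_index=4, mention_len=2, expectations=expectations)
print(weights)
```

A phrase that is expected near one candidate and unexpected near another pushes the two candidates in opposite directions.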
Ok, The Graph Is Built, Now What?
Weight each Entity node by the sum of its edges.
Figure: the full graph (entity nodes, category nodes, the Text node “Defensive End”, and the heuristic nodes Name, Frequent and Single Word), with each entity node labeled w > 0, w = 0 or w < 0.
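Summing edge weights per entity node takes only a few lines. The edge list and its weights below are hypothetical; only the rule itself (node weight equals the sum of its edges) comes from the slide.

```python
def entity_weights(entity_nodes, edges):
    """Sum the weights of all edges touching each entity node.

    `edges` is a list of (node_a, node_b, weight); non-entity endpoints
    (categories, text, heuristics) contribute weight but get no total.
    """
    totals = {node: 0.0 for node in entity_nodes}
    for a, b, w in edges:
        for node in (a, b):
            if node in totals:
                totals[node] += w
    return totals

entity_nodes = ["NFL", "New Orleans Saints", "Facebook", "22 Jump Street"]
edges = [  # hypothetical edges
    ("NFL", "New Orleans Saints", 1.0),
    ("Facebook", "Frequent (Heuristic)", -0.5),
]
weights = entity_weights(entity_nodes, edges)
print(weights)
```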
Node Weight Groups
w < 0 (e.g. Facebook): Prune - improbable in the given document context, or an Unknown Entity.
w = 0 (e.g. 22 Jump Street): No / Mixed Signal - resolve them once the correct document context is clearer.
w > 0 (e.g. NFL): Positive Nodes - let’s start with these.
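The three groups map directly onto the sign of the node weight. In this sketch the group names `prune`, `deferred` and `positive` and the example weights are made up for illustration.

```python
def group_by_weight(weights):
    """Split nodes into prune (w < 0), deferred (w = 0) and positive (w > 0)."""
    groups = {"prune": [], "deferred": [], "positive": []}
    for node, w in weights.items():
        if w < 0:
            groups["prune"].append(node)
        elif w == 0:
            groups["deferred"].append(node)
        else:
            groups["positive"].append(node)
    return groups

groups = group_by_weight({"Facebook": -0.5, "22 Jump Street": 0.0, "NFL": 1.0})
print(groups)
```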
Solved and Unsolved Node Sets
Divide the positive nodes into 2 sets; edges can still run between the sets.
Unsolved Set - has ambiguity (e.g. the two Will Smith candidates)
Solved Set - no ambiguity
Our goal is to move all Entities from the unsolved set to the solved one
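The split can be sketched by grouping the positive candidates by keyword: a keyword with a single surviving candidate is solved, and one with several is not. The input format, a list of (keyword, entity) pairs, is an assumption made for this example.

```python
from collections import defaultdict

def split_solved_unsolved(positive_candidates):
    """`positive_candidates` is a list of (keyword, entity) pairs with w > 0."""
    by_keyword = defaultdict(list)
    for keyword, entity in positive_candidates:
        by_keyword[keyword].append(entity)
    solved, unsolved = set(), set()
    for entities in by_keyword.values():
        # One candidate left means no ambiguity for that keyword.
        target = solved if len(entities) == 1 else unsolved
        target.update(entities)
    return solved, unsolved

solved, unsolved = split_solved_unsolved([
    ("Will Smith", "Will Smith (Footballer)"),
    ("Will Smith", "Will Smith (Actor)"),
    ("NFL", "NFL (Football League)"),
])
print(solved, unsolved)
```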
Group Nodes by Context
Group nodes by finding Strongly Connected Components (SCC) in each set.
Revealing contextual regions in the document
Figure: the Unsolved Set (has ambiguity) and the Solved Set (no ambiguity) with their SCCs outlined; labeled nodes include Will Smith (twice) and New Orleans Saints.
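Strongly connected components can be found with Tarjan's algorithm, a standard choice; the slides do not say which algorithm is used. The sketch below assumes a directed graph given as a mapping from node to successors, and `sccs_within` is a hypothetical helper that restricts the search to one set.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm; `graph` maps each node to an iterable of successors."""
    index, lowlink, on_stack = {}, {}, set()
    stack, components = [], []

    def visit(node):
        index[node] = lowlink[node] = len(index)
        stack.append(node)
        on_stack.add(node)
        for nxt in graph.get(node, ()):
            if nxt not in index:
                visit(nxt)
                lowlink[node] = min(lowlink[node], lowlink[nxt])
            elif nxt in on_stack:
                lowlink[node] = min(lowlink[node], index[nxt])
        if lowlink[node] == index[node]:  # node is the root of a component
            component = set()
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.add(member)
                if member == node:
                    break
            components.append(component)

    for node in graph:
        if node not in index:
            visit(node)
    return components

def sccs_within(graph, members):
    """SCCs of the subgraph induced by `members` (e.g. the unsolved set)."""
    sub = {n: [m for m in graph.get(n, ()) if m in members] for n in members}
    return strongly_connected_components(sub)

toy = {  # hypothetical directed edges
    "Will Smith (Footballer)": ["New Orleans Saints"],
    "New Orleans Saints": ["Will Smith (Footballer)", "NFL"],
    "NFL": ["New Orleans Saints"],
    "Will Smith (Actor)": [],
}
components = sccs_within(toy, set(toy))
print(components)
```

The recursive form is fine for the small graphs built from a single article; a very large graph would call for an iterative variant.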
Iteratively Solve on the SCC Level
Iterate:
• Move the best scored SCC from the unsolved set to the solved one
• Filter out all losing ambiguous nodes and their edges
• Prune negative weight nodes and their edges
• Extract SCCs from each set
Stop when the unsolved set is empty - no more ambiguity
Figure: the Max Score SCC moves from the Unsolved Set to the Solved Set.
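The loop above can be sketched end to end on a toy graph. Several simplifications are assumptions of this sketch rather than statements from the slides: SCCs are found by mutual reachability (adequate for small graphs), the placeholder score only sums edge weights into the solved set, SCCs are extracted only from the unsolved set, and each SCC is assumed to hold at most one candidate per keyword.

```python
def reachable(graph, start, allowed):
    """Nodes of `allowed` reachable from `start` along directed edges."""
    seen, todo = {start}, [start]
    while todo:
        for nxt in graph.get(todo.pop(), {}):
            if nxt in allowed and nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen

def sccs(graph, members):
    """Strongly connected components of the subgraph induced by `members`."""
    reach = {n: reachable(graph, n, members) for n in members}
    components, assigned = [], set()
    for n in sorted(members):
        if n not in assigned:
            component = {m for m in reach[n] if n in reach[m]}
            components.append(component)
            assigned |= component
    return components

def edges_to_solved(component, graph, solved):
    """Placeholder score: total weight of edges from the component into the solved set."""
    return sum(w for n in component
               for m, w in graph.get(n, {}).items() if m in solved)

def resolve(graph, keyword_of, solved, unsolved, score=edges_to_solved):
    """graph: {node: {neighbor: weight}}; keyword_of: {node: keyword}."""
    solved, unsolved = set(solved), set(unsolved)
    while unsolved:
        # 1. Move the best scored SCC from the unsolved set to the solved one.
        best = max(sccs(graph, unsolved), key=lambda c: score(c, graph, solved))
        solved |= best
        unsolved -= best
        # 2. Filter out losers: rival candidates for a keyword just decided.
        decided = {keyword_of[n] for n in best}
        unsolved = {n for n in unsolved if keyword_of[n] not in decided}
        # 3. Prune nodes whose weight over the surviving nodes is negative.
        alive = solved | unsolved
        unsolved = {n for n in unsolved
                    if sum(w for m, w in graph.get(n, {}).items() if m in alive) >= 0}
        # 4. SCCs are re-extracted at the top of the next iteration.
    return solved

graph = {  # hypothetical weighted edges
    "Will Smith (Footballer)": {"New Orleans Saints": 1.0, "NFL": 1.0},
    "Will Smith (Actor)": {},
    "New Orleans Saints": {"Will Smith (Footballer)": 1.0, "NFL": 1.0},
    "NFL": {"New Orleans Saints": 1.0},
}
keyword_of = {
    "Will Smith (Footballer)": "Will Smith", "Will Smith (Actor)": "Will Smith",
    "New Orleans Saints": "New Orleans Saints", "NFL": "NFL",
}
result = resolve(graph, keyword_of,
                 solved={"New Orleans Saints", "NFL"},
                 unsolved={"Will Smith (Footballer)", "Will Smith (Actor)"})
print(sorted(result))
```

Every pass removes at least one SCC from the unsolved set, so the loop terminates once no ambiguity is left.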
How to Score SCCs
An SCC’s score is the sum of its nodes’ scores.
A node’s score is dominated by the sum of its edges’ weights to solved SCCs:
Node(score) = ∑(Edge(w) * |SCC|)
Figure: the Unsolved Set (has ambiguity) and the Solved Set (no ambiguity), with the two Will Smith candidates and SCCs of size |SCC| = 2 and |SCC| = 3.
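The scoring formula can be sketched as below. It rests on one interpretation that the slide does not spell out: |SCC| is taken to be the size of the solved SCC that an edge lands in, so an edge into a larger solved context counts for more. The graph and SCC contents are hypothetical.

```python
def node_score(node, graph, solved_sccs):
    """Sum of edge weight * |SCC| over the node's edges into solved SCCs."""
    total = 0.0
    for component in solved_sccs:
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor in component:
                total += weight * len(component)
    return total

def scc_score(component, graph, solved_sccs):
    """An SCC's score is the sum of its nodes' scores."""
    return sum(node_score(n, graph, solved_sccs) for n in component)

solved_sccs = [  # hypothetical solved SCCs of size 3 and 2
    {"New Orleans Saints", "NFL", "Atlanta Falcons"},
    {"New Orleans", "Louisiana"},
]
graph = {
    "Will Smith (Footballer)": {"New Orleans Saints": 1.0, "New Orleans": 1.0},
    "Will Smith (Actor)": {},
}
footballer = scc_score({"Will Smith (Footballer)"}, graph, solved_sccs)
actor = scc_score({"Will Smith (Actor)"}, graph, solved_sccs)
print(footballer, actor)
```

Under this reading the footballer scores 1.0 × 3 + 1.0 × 2 = 5.0, while the actor, with no edges into any solved SCC, scores 0.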
Revisiting Zero Weight Nodes
Based on the correct Entities and context regions, heuristics are applied to resolve these nodes. (Out of our scope)
Most Entities with positive correlation to the context are regarded as correct.
Summary
Document Graph
Solving on the SCC level and not on the Entity level
Dealing with Unknown Entities
Shows significant improvements over state-of-the-art products in real-life scenarios
For more information, please contact:
[email protected] www.relegence.com
Thank You
Roni Wiener
Please try it at: http://www.relegence.com/demo