© 2016 Relegence. Roni Wiener. AOL’s Named Entity Resolver via strongly connected components and ad-hoc edge construction

AOL's Entity Resolver


What is News Tagging?

Categories - extract what an article is about.
Entities - extract who and what is in the article.

Entities: Will Smith, New Orleans, NFL, New Orleans Saints, Atlanta Falcons, ...
Categories: U.S. News, Crime, Football, Murder, Celebrities, ...

Easy for humans, hard for machines.
Try to mimic humans, then search for hits.

http://www.sbnation.com/2016/4/28/11518540/will-smith-murder-case-new-orleans-saints-trial


Outline

What is News Document Tagging?

Entity Ambiguity and Unknown Entities

Documents Graph

Graph based Ambiguity Resolution


So, What is the Big Deal?

We use machine learning for categories and keyword search for entity mentions.

[Diagram: ARTICLE -> Machine Learning -> Categories; ARTICLE -> Keyword Search -> Entities from Knowledge Base]

Well … we are not done yet: we still need to handle AMBIGUITY and UNKNOWN ENTITIES ...


What are Entity Ambiguity and Unknown Entities?

Keyword – Will Smith

200 ‘Will Smith’s are in our Knowledge Base.

Most of the ‘Will Smith’s in the world are not in our Knowledge Base, so we must consider an Unknown Entity for each keyword, totaling 201 possibilities to choose from.
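A minimal sketch of the candidate list described above, assuming the Knowledge Base can be read as a mapping from keyword to known entities. The function name, the `("UNKNOWN", keyword)` placeholder and the generated entity names are illustrative assumptions, not part of the original system.

```python
def candidates_for(keyword, knowledge_base):
    """Return every known entity for `keyword`, plus one Unknown Entity option.

    `knowledge_base` is assumed to be a dict: keyword -> list of entity names.
    """
    known = knowledge_base.get(keyword, [])
    # The extra option covers real-world entities missing from the Knowledge Base.
    return list(known) + [("UNKNOWN", keyword)]


# Hypothetical Knowledge Base holding 200 entities named "Will Smith".
kb = {"Will Smith": [f"Will Smith #{i}" for i in range(200)]}
will_smith_options = candidates_for("Will Smith", kb)
```

With 200 known entities the list has 201 entries, matching the count on the slide; a keyword that is absent from the Knowledge Base still gets its single Unknown Entity option.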


OK, This May Be Tricky

For the given “Will Smith” article, we have:

Keyword options = (Will Smith) 201 × (Saints) 31 × (Georgia) 26 × …

Totaling 263,006,617,337,856,000,000 Entity combinations!!

BUT ONLY 1 COMBINATION IS CORRECT
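To illustrate how quickly the search space grows, the sketch below multiplies the per-keyword candidate counts. Only the three counts shown on the slide are used; the remaining factors are elided in the source, so this does not reproduce the full total. The function name is an assumption.

```python
import math

# Candidate counts per keyword, as given on the slide (the rest are elided).
candidate_counts = {"Will Smith": 201, "Saints": 31, "Georgia": 26}


def combination_count(counts):
    """Number of ways to pick exactly one candidate per keyword."""
    return math.prod(counts.values())


partial_total = combination_count(candidate_counts)  # 201 * 31 * 26
```

Three keywords alone already give 162,006 combinations, which is why exhaustive search over a whole article is not an option.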


Wow, How Can We Solve This Mess?

Build a Document Graph with the aid of a Knowledge Graph. It allows us to do:

Pruning - improbable Entities
Grouping - solve ambiguity between a small number of Entity groups

Both grouping and pruning are done by looking for contextual hints left by the article’s author.


What is a Knowledge Graph … Wikipedia ++

We have a lot of prior knowledge represented as a graph. It contains Entities, Categories, Text and the relations between them. The Knowledge Graph can be thought of as the prior knowledge shared by the article’s author and its readers; an article’s contextual hints are based on it.

[Diagram: node types Entity, Text and Category. Example nodes: Michelle Obama and Barack Obama (Entity), “First Lady” and “President” (Text), Politics (Category).]


Building the Document Graph - Entities

The Document Graph is derived from the Knowledge Graph, with additional temporal knowledge. The first hints are all possible entity candidates and their relations, derived from the Knowledge Graph.

[Diagram: candidate nodes New Orleans Saints (Football team), Atlanta Falcons (Football team), Will Smith (Footballer), NFL (Football League), New Orleans (City), Will Smith (Actor), 22 Jump Street (Movie) and Facebook (Company); the edges shown are labeled w=1.0.]
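One way to read this step is as taking the part of the Knowledge Graph induced by the candidate entities. The sketch below assumes directed `(source, target)` relations and a uniform weight of 1.0 as in the diagram; the function name and the sample relations are hypothetical, since the diagram's edge endpoints are not recoverable from the slide.

```python
def build_document_graph(candidates, kg_relations, weight=1.0):
    """Keep only Knowledge Graph relations whose endpoints are both candidates.

    candidates:   dict keyword -> list of candidate entity names
    kg_relations: iterable of directed (source, target) pairs
    Returns (nodes, edges) with edges as {(source, target): weight}.
    """
    nodes = {entity for options in candidates.values() for entity in options}
    edges = {
        (a, b): weight
        for a, b in kg_relations
        if a in nodes and b in nodes
    }
    return nodes, edges


# Hypothetical sample data.
candidates = {
    "Will Smith": ["Will Smith (Footballer)", "Will Smith (Actor)"],
    "Saints": ["New Orleans Saints"],
}
kg_relations = [
    ("Will Smith (Footballer)", "New Orleans Saints"),
    ("Will Smith (Actor)", "Men in Black"),  # target is not a candidate here
]
nodes, edges = build_document_graph(candidates, kg_relations)
```

Relations that leave the candidate set are dropped at this stage; the later slide on context nodes describes how such outside knowledge can still contribute.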


Building the Document Graph - Categories

The next hints are the document categories derived from our classifier.

[Diagram: the same entity nodes as on the previous slide, with the category nodes Celebrities (Category) and Football (Category) added; the edges shown are labeled w=1.0.]


Heuristic Nodes and Edges

Heuristics help us prune improbable Entities, for example highly frequent terms, common names and single words, as they are not sound contextual hints.

[Diagram: the heuristic nodes Common Names, Single word and Frequent word connect to entity nodes such as Will Smith and Facebook with edges of weight w < 0.]
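The heuristic nodes can be sketched as rules that attach a negative edge to any mention they match. The slide only states that the weight is below zero, so the penalty value, the word lists and the function name below are all assumptions made for illustration.

```python
# Hypothetical lookup lists; a real system would use much larger resources.
COMMON_NAMES = {"will smith"}
FREQUENT_WORDS = {"facebook"}


def heuristic_edges(surface_form, penalty=-0.5):
    """Return {(heuristic_node, surface_form): weight} for each heuristic that fires.

    `penalty` is an assumed value; the slide only says the weight is negative.
    """
    text = surface_form.lower()
    fired = []
    if text in COMMON_NAMES:
        fired.append("Common Names")
    if len(text.split()) == 1:
        fired.append("Single word")
    if text in FREQUENT_WORDS:
        fired.append("Frequent word")
    return {(heuristic, surface_form): penalty for heuristic in fired}


facebook_edges = heuristic_edges("Facebook")
will_smith_edges = heuristic_edges("Will Smith")
```

Under these assumptions "Facebook" collects two penalties (single word, frequent word) and "Will Smith" one (common name), so each needs positive contextual evidence to survive pruning.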


Context Nodes and Ad Hoc Edges

Knowledge nodes that are highly connected to the Document Graph can be assumed to fall under the same context.

Represent this prior knowledge by an edge between the relevant document nodes.

[Diagram: the Document Graph nodes New Orleans Saints (Football team) and Atlanta Falcons (Football team), and the Knowledge Graph context nodes Footballer A, Footballer B and Footballer C.]


[Diagram, continued: the same graph with an ad hoc edge (w=1.0) added.]
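A possible reading of ad hoc edge construction: a Knowledge Graph node that is not in the document but links to several document nodes acts as a context node, and the document nodes it links are then joined directly. The threshold `min_links`, the function name and the sample neighbor lists are assumptions; the slides do not define "highly connected" precisely.

```python
from itertools import combinations


def ad_hoc_edges(doc_nodes, kg_neighbors, min_links=2, weight=1.0):
    """Join document nodes that share an outside Knowledge Graph neighbor.

    kg_neighbors: dict knowledge node -> set of nodes it is related to.
    A knowledge node counts as a context node when it is outside the document
    and touches at least `min_links` document nodes (assumed threshold).
    """
    edges = {}
    for kg_node, neighbors in kg_neighbors.items():
        if kg_node in doc_nodes:
            continue
        linked = sorted(n for n in neighbors if n in doc_nodes)
        if len(linked) >= min_links:
            for a, b in combinations(linked, 2):
                # Add both directions so the pair can share an SCC later.
                edges[(a, b)] = weight
                edges[(b, a)] = weight
    return edges


doc_nodes = {"New Orleans Saints", "Atlanta Falcons", "NFL"}
kg_neighbors = {  # hypothetical relations
    "Footballer A": {"New Orleans Saints", "Atlanta Falcons"},
    "Footballer B": {"New Orleans Saints", "Atlanta Falcons"},
    "Footballer C": {"Atlanta Falcons"},
}
new_edges = ad_hoc_edges(doc_nodes, kg_neighbors)
```

Here Footballer A and Footballer B each touch both teams, so a single pair of ad hoc edges is created between them, while Footballer C touches only one document node and contributes nothing.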


Building the Document Graph - Text

“ … Defensive End Will Smith ...”

P(Footballer(w) | “Defensive End”, w-5, w+5) ⪝ 1

P(Actor(w) | “Defensive End”, w-5, w+5) ⪞ 0

[Diagram: the text node “Defensive End” next to the candidate nodes Footballer and Actor.]


Building the Document Graph - Text

Add Text nodes found in the document, and draw edges by proximity and expectations.

[Diagram: the graph from the previous slides with the text node “Defensive End” added; its two edges are labeled w= 1.0 and w= -0.6.]
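The proximity rule can be sketched as a window search around the mention. The sketch assumes that "w-5, w+5" means five tokens on either side of the mention, and that the weights 1.0 and -0.6 from the diagram belong to the footballer and the actor respectively; the slide does not say which edge carries which weight. The `affinity` table and the function name are hypothetical.

```python
def text_edges(tokens, mention_span, affinity, window=5):
    """Return {(text_node, candidate): weight} for text hints near a mention.

    tokens:       the document as a list of words
    mention_span: (start, end) token indices of the mention, end exclusive
    affinity:     {(text_hint, candidate): weight}, an assumed prior table
    """
    start, end = mention_span
    lo, hi = max(0, start - window), min(len(tokens), end + window)
    # Context is the window around the mention, excluding the mention itself.
    # Plain substring matching keeps the sketch short; it ignores word boundaries.
    context = " ".join(tokens[lo:start] + tokens[end:hi]).lower()
    return {
        (hint, candidate): weight
        for (hint, candidate), weight in affinity.items()
        if hint.lower() in context
    }


affinity = {
    ("Defensive End", "Will Smith (Footballer)"): 1.0,
    ("Defensive End", "Will Smith (Actor)"): -0.6,
}
tokens = "Former Saints Defensive End Will Smith was shot".split()
found = text_edges(tokens, (4, 6), affinity)
far = text_edges(["Defensive", "End"] + ["x"] * 10 + ["Will", "Smith"], (12, 14), affinity)
```

When the hint lies outside the window, as in the second call, no edge is drawn and the text contributes no evidence either way.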


Ok, The Graph Is Built, Now What?

Weight each Entity node by the sum of its edges.

[Diagram: the full Document Graph, including the category nodes Celebrities and Football, the text node “Defensive End” and the heuristic nodes Name, Frequent and Single Word. Each entity node is labeled with the sign of its weight: six with w > 0, one with w = 0 and one with w < 0.]


Node Weight Groups

Negative nodes, e.g. Facebook (w < 0): prune. They are improbable in the given document context, or an Unknown Entity.

No / mixed signal, e.g. 22 Jump Street (w = 0): resolve them once the correct document context is clearer.

Positive nodes, e.g. NFL (w > 0): let’s start with these.
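The weighting and grouping steps can be sketched together: sum the weights of every edge touching an entity node, then bucket the nodes by sign. The edge values below are hypothetical and chosen only to reproduce the three examples on the slide; the function names and the tolerance `eps` are assumptions.

```python
def node_weights(entity_nodes, edges):
    """Weight each entity node by the sum of the edges that touch it."""
    weights = {node: 0.0 for node in entity_nodes}
    for (a, b), w in edges.items():
        if a in weights:
            weights[a] += w
        if b in weights:
            weights[b] += w
    return weights


def group_by_sign(weights, eps=1e-9):
    """Split nodes into negative (prune), zero (defer) and positive (solve first)."""
    groups = {"negative": set(), "zero": set(), "positive": set()}
    for node, w in weights.items():
        key = "negative" if w < -eps else "positive" if w > eps else "zero"
        groups[key].add(node)
    return groups


edges = {  # hypothetical weights
    ("Football", "NFL"): 1.0,
    ("Single word", "Facebook"): -0.5,
    ("Frequent word", "Facebook"): -0.5,
    ("Celebrities", "22 Jump Street"): 0.5,
    ("Football", "22 Jump Street"): -0.5,
}
weights = node_weights({"NFL", "Facebook", "22 Jump Street"}, edges)
groups = group_by_sign(weights)
```

In this toy data 22 Jump Street receives one positive and one negative hint that cancel out, which is the "mixed signal" case the slide defers until the context is clearer.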


Solved and Unsolved Node Sets

Divide the positive nodes into 2 sets; edges can still run between the sets.

Unsolved Set - has ambiguity
Solved Set - no ambiguity

[Diagram: the two Will Smith candidates are shown as an example of ambiguity.]

Our goal is to move all Entities from the unsolved set to the solved one.


Group Nodes by Context

Group nodes by finding Strongly Connected Components (SCCs) in each set, revealing contextual regions in the document.

[Diagram: the Unsolved Set (has ambiguity) and the Solved Set (no ambiguity), each partitioned into SCCs; the labeled nodes are the two Will Smith candidates and New Orleans Saints.]
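The slides do not say which SCC algorithm is used. As an illustration, the sketch below uses Kosaraju's two-pass algorithm on a directed edge list; the function name, the shortened node names and the sample edges are assumptions.

```python
from collections import defaultdict


def strongly_connected_components(nodes, edges):
    """Kosaraju's algorithm. `edges` is an iterable of directed (src, dst) pairs
    whose endpoints are all expected to be in `nodes`."""
    fwd, rev = defaultdict(list), defaultdict(list)
    for a, b in edges:
        fwd[a].append(b)
        rev[b].append(a)

    # Pass 1: iterative DFS on the forward graph, recording finishing order.
    seen, order = set(), []
    for root in nodes:
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(fwd[root]))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(fwd[nxt])))
                    break
            else:  # all successors explored: the node is finished
                order.append(node)
                stack.pop()

    # Pass 2: DFS on the reversed graph, in reverse finishing order.
    assigned, components = set(), []
    for root in reversed(order):
        if root in assigned:
            continue
        assigned.add(root)
        component, todo = set(), [root]
        while todo:
            node = todo.pop()
            component.add(node)
            for prev in rev[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    todo.append(prev)
        components.append(component)
    return components


nodes = ["Saints", "Falcons", "NFL", "Will Smith (Footballer)", "Facebook"]
edges = [("Saints", "Falcons"), ("Falcons", "NFL"), ("NFL", "Saints"),
         ("Will Smith (Footballer)", "Saints")]
components = strongly_connected_components(nodes, edges)
```

The three mutually reachable football nodes form one contextual region, while the footballer (one-way edge only) and Facebook (no edges) each remain in a component of their own.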


Iteratively Solve on the SCC Level

Iterate:

• Move the best-scored SCC from the unsolved set to the solved one
• Filter out all losing ambiguous nodes and their edges
• Prune negative-weight nodes and their edges
• Extract SCCs from each set

Stop when the unsolved set is empty - no more ambiguity.

[Diagram: the max-score SCC moves from the Unsolved Set (has ambiguity) to the Solved Set (no ambiguity).]
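The loop above can be sketched as follows. This is a simplified illustration, not the production algorithm: SCCs are found by mutual reachability (quadratic, but short), an SCC is scored by the plain sum of its nodes' edge weights to the solved set, and all names and sample data are assumptions.

```python
def reachable(start, nodes, edges):
    """Nodes within `nodes` reachable from `start` along directed edges."""
    seen, todo = {start}, [start]
    while todo:
        cur = todo.pop()
        for a, b in edges:
            if a == cur and b in nodes and b not in seen:
                seen.add(b)
                todo.append(b)
    return seen


def sccs(nodes, edges):
    """SCCs of the subgraph induced by `nodes`, via mutual reachability."""
    reach = {n: reachable(n, nodes, edges) for n in nodes}
    remaining, components = set(nodes), []
    while remaining:
        n = min(remaining)  # deterministic pick
        component = {m for m in reach[n] if n in reach[m]}
        components.append(component)
        remaining -= component
    return components


def resolve(keyword_of, solved, unsolved, edges):
    """keyword_of: node -> keyword; edges: {(src, dst): weight}."""
    solved, unsolved = set(solved), set(unsolved)

    def weight(node, others):
        # Sum of edge weights between `node` and the nodes in `others`.
        return sum(w for (a, b), w in edges.items()
                   if (a == node and b in others) or (b == node and a in others))

    while unsolved:
        # 1. Pick the SCC best supported by the solved set.
        best = max(sccs(unsolved, edges),
                   key=lambda c: sum(weight(n, solved) for n in c))
        for node in sorted(best, key=lambda n: weight(n, solved), reverse=True):
            if node not in unsolved:
                continue  # already dropped as a losing candidate
            unsolved.discard(node)
            solved.add(node)
            # 2. Filter out losing candidates for the same keyword.
            unsolved -= {m for m in unsolved if keyword_of[m] == keyword_of[node]}
        # 3. Prune nodes whose weight among the surviving nodes is negative.
        unsolved -= {m for m in unsolved if weight(m, solved | unsolved) < 0}
        # 4. SCCs are re-extracted at the top of the next iteration.
    return solved


keyword_of = {"New Orleans Saints": "Saints", "NFL": "NFL",
              "Will Smith (Footballer)": "Will Smith",
              "Will Smith (Actor)": "Will Smith"}
edges = {("Will Smith (Footballer)", "New Orleans Saints"): 1.0,
         ("Will Smith (Footballer)", "NFL"): 1.0}
result = resolve(keyword_of, {"New Orleans Saints", "NFL"},
                 {"Will Smith (Footballer)", "Will Smith (Actor)"}, edges)
```

Each pass moves at least one node out of the unsolved set, so the loop terminates. A keyword whose candidates are all pruned would fall back to the Unknown Entity, which this sketch does not model.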


How to Score SCCs

The SCC score is the sum of its nodes’ scores. A node’s score is dominated by the sum of the weights of its edges to solved SCCs:

Node(score) = ∑(Edge(w) * |SCC|)

[Diagram: the Unsolved Set (has ambiguity) with the two Will Smith candidates, and the Solved Set (no ambiguity); the SCCs shown are labeled |SCC| = 2 and |SCC| = 3.]
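A sketch of the scoring formula, under the assumption that |SCC| is the size of the solved SCC at the other end of each edge (the slide does not state this explicitly). Function names and sample data are hypothetical.

```python
def node_score(node, edges, solved_sccs):
    """Sum of Edge(w) * |SCC| over edges joining `node` to a solved SCC.

    edges:       {(src, dst): weight}
    solved_sccs: list of sets of solved nodes
    """
    size_of = {member: len(scc) for scc in solved_sccs for member in scc}
    score = 0.0
    for (a, b), w in edges.items():
        other = b if a == node else a if b == node else None
        if other in size_of:
            score += w * size_of[other]
    return score


def scc_score(scc, edges, solved_sccs):
    """The SCC score is the sum of its nodes' scores."""
    return sum(node_score(n, edges, solved_sccs) for n in scc)


solved_sccs = [{"New Orleans Saints", "Atlanta Falcons", "NFL"},
               {"Context A", "Context B"}]  # hypothetical sizes 3 and 2
edges = {("Will Smith (Footballer)", "New Orleans Saints"): 1.0,
         ("Will Smith (Footballer)", "NFL"): 1.0,
         ("Will Smith (Actor)", "Context A"): 1.0}
footballer = node_score("Will Smith (Footballer)", edges, solved_sccs)
actor = node_score("Will Smith (Actor)", edges, solved_sccs)
```

Multiplying by the component size means that evidence from a large, coherent context region counts for more than the same edge weight into a small one.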


Revisiting Zero Weight Nodes


Based on the correct Entities and context regions, heuristics are applied to resolve these nodes. (Out of our scope)

Most Entities with positive correlation to the context are regarded as correct.


Summary

Document Graph

Solving on the SCC level and not on the Entity level

Dealing with Unknown Entities

Shows significant improvements over state-of-the-art products in real-life scenarios


For more information, please contact:


[email protected] www.relegence.com

Thank You

Roni Wiener

Please try it at: http://www.relegence.com/demo