Summarizing Entity Descriptions for Effective and Efficient Human-centered Entity Linking


Gong Cheng, Danyun Xu, Yuzhong Qu

Websoft Research Group, State Key Laboratory for Novel Software Technology

Nanjing University, China

Entity Linking (EL)

Text:

But with the release of the iPhone 6 and the 6 Plus phablet, Apple has finally gone into big-screen territory, giving Samsung a challenge in the category that the company has been dominating for some time now.

Knowledge Base:

iPhone 6
- type: Smartphone
- ...

Samsung Electronics
- type: IT company
- ...

Candidate entities for the mention "Apple" (which one?):

Apple (Inc.)
- type: Company
- type: IT company
- product: iPhone 5
- ...

Apple (Fruit)
- type: Fruit
- genus: Malus
- ...

Human-centered EL is needed


• for defining a gold standard
• for crowdsourced EL

Entity description: a set of property-value pairs (called features)


Entity descriptions are long.

Short, extractive summaries are adequate for human-centered EL.

… Apple → ?

Apple (Inc.)
- type: Company
- product: iPhone 5

Apple (Corps)
- type: Company
- product: Let It Be

Apple (Fruit)
- type: Fruit

Summary of k candidate entity descriptions: k subsets of features (subject to a length limit)
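The definitions above map naturally onto sets of pairs. The following is a minimal Python sketch, assuming a feature's length is the character count of its property plus its value (the slides do not say how length is measured). The names `feature_length` and `is_valid_summary` are illustrative and do not come from the source.

```python
from typing import FrozenSet, Tuple

# A feature is a (property, value) pair; an entity description is a set of features.
Feature = Tuple[str, str]


def feature_length(feature: Feature) -> int:
    """Length of a feature in characters (assumption: property text + value text)."""
    prop, value = feature
    return len(prop) + len(value)


def is_valid_summary(summary: FrozenSet[Feature],
                     description: FrozenSet[Feature],
                     limit: int) -> bool:
    """An extractive summary is a subset of the description within the length limit."""
    total = sum(feature_length(f) for f in summary)
    return summary <= description and total <= limit


apple_inc = frozenset({
    ("type", "Company"),
    ("type", "IT company"),
    ("product", "iPhone 5"),
})
apple_inc_summary = frozenset({("type", "Company"), ("product", "iPhone 5")})

print(is_valid_summary(apple_inc_summary, apple_inc, limit=100))
```

A summary of k candidate entities would then be k such subsets, one per candidate.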


Summarizing entity descriptions → combinatorial optimization


Optimization goal (1): +characterizing power, -information overlap

• Characterizing power of a feature (ch)

ch(type: IT company) < ch(product: iPhone 5)

Entities having type: IT company: Apple (Inc.), Samsung Electronics

Apple (Inc.)
- type: Company
- type: IT company
- product: iPhone 5
- ...


$$ch(f) = -\log \frac{\text{number of entities having } f}{\text{number of all entities}}$$
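The formula for ch can be checked on a toy knowledge base. This is a minimal sketch: the four-entity knowledge base below is hypothetical (it only reuses the entities shown on the slides, with "IT company" spelled consistently), and `characterizing_power` is an illustrative name.

```python
import math

# Hypothetical toy knowledge base: entity name -> set of (property, value) features.
knowledge_base = {
    "Apple (Inc.)": {("type", "Company"), ("type", "IT company"), ("product", "iPhone 5")},
    "Samsung Electronics": {("type", "IT company")},
    "Apple (Fruit)": {("type", "Fruit"), ("genus", "Malus")},
    "iPhone 6": {("type", "Smartphone")},
}


def characterizing_power(feature, kb):
    """ch(f) = -log(number of entities having f / number of all entities)."""
    n_having = sum(1 for features in kb.values() if feature in features)
    if n_having == 0:
        raise ValueError(f"no entity has feature {feature!r}")
    return -math.log(n_having / len(kb))


ch_type = characterizing_power(("type", "IT company"), knowledge_base)
ch_product = characterizing_power(("product", "iPhone 5"), knowledge_base)
print(ch_type < ch_product)
```

Because two of the four entities share `type: IT company` while only one has `product: iPhone 5`, the rarer feature receives the higher score, matching the inequality on the slide.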


Optimization goal (1): +characterizing power, -information overlap

• Information overlap between features (ov)

a) logical inference: entailment = maximized ov

ov(type: IT company, type: Company) = MAX

b) string/numerical similarity: ov = max{similarity between properties, similarity between values}

ov(type: IT company, product: iPhone 5) = SMALL

Apple (Inc.)
- type: Company
- type: IT company
- product: iPhone 5
- ...
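The two cases of ov can be sketched as follows. Several pieces here are assumptions, not the source's definitions: entailment is approximated with a hypothetical one-entry subclass table (`SUBCLASS_OF`), string similarity uses Python's `difflib.SequenceMatcher`, and MAX is taken to be 1.0.

```python
from difflib import SequenceMatcher

OV_MAX = 1.0  # assumption: overlap is normalized to [0, 1]

# Hypothetical toy class hierarchy used to stand in for logical inference.
SUBCLASS_OF = {"IT company": "Company"}


def entails(f, g):
    """True if feature f entails feature g (toy rule: subclass implies superclass)."""
    (p1, v1), (p2, v2) = f, g
    return p1 == p2 == "type" and SUBCLASS_OF.get(v1) == v2


def string_similarity(a, b):
    """Case-insensitive string similarity in [0, 1] (assumed measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def overlap(f, g):
    """ov: maximal under entailment, else max of property and value similarity."""
    if f == g or entails(f, g) or entails(g, f):
        return OV_MAX
    return max(string_similarity(f[0], g[0]), string_similarity(f[1], g[1]))


print(overlap(("type", "IT company"), ("type", "Company")))
print(overlap(("type", "IT company"), ("product", "iPhone 5")))
```

Read literally, the max in case b) gives full overlap to any two features that share a property; whether the source intends that is not stated on the slide.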

Optimization goal (1): +characterizing power, -information overlap

• Formulated as k Quadratic Knapsack Problems (QKP)

weight of a feature: length
profit of a pair of features: defined to maximize characterizing power and to minimize information overlap
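To make the knapsack view concrete, here is a small greedy heuristic for a single QKP. It is a sketch under stated assumptions, not the solver used in the source: the profit matrix puts a feature's characterizing power on the diagonal and the negated overlap of a pair off the diagonal (the slide's exact profit formula is not recoverable), and all numbers are made up for illustration.

```python
def greedy_qkp(weights, profit, capacity):
    """Greedy heuristic: repeatedly add the fitting item with the best
    marginal profit per unit weight; stop when no item has a positive gain.

    profit is a symmetric matrix: profit[i][i] is the individual profit of
    item i, and profit[i][j] is the extra profit when i and j are both chosen.
    """
    selected = []
    remaining = capacity
    candidates = set(range(len(weights)))
    while True:
        best, best_ratio = None, 0.0
        for i in sorted(candidates):
            if weights[i] > remaining:
                continue
            gain = profit[i][i] + sum(profit[i][j] for j in selected)
            ratio = gain / weights[i]
            if ratio > best_ratio:
                best, best_ratio = i, ratio
        if best is None:
            return selected
        selected.append(best)
        candidates.remove(best)
        remaining -= weights[best]


# Items: 0 = type: Company, 1 = type: IT company, 2 = product: iPhone 5
weights = [11, 14, 15]            # feature lengths in characters (assumed measure)
profit = [
    [0.5, -1.0, -0.2],            # diagonal: made-up ch values
    [-1.0, 0.7, -0.2],            # off-diagonal: made-up negated ov values
    [-0.2, -0.2, 1.4],
]
chosen = greedy_qkp(weights, profit, capacity=45)
print(chosen)
```

In this toy run `type: Company` is left out even though it fits, because its overlap with the already selected `type: IT company` makes its marginal gain negative.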

Optimization goal (2): +differentiating power

• Differentiating power of a pair of features (di)

a) string/numerical dissimilarity: di = property’s value uniqueness * dissimilarity between values

di(type: IT company, type: Fruit) = SMALL*LARGE = MEDIUM

(Single-valued properties are more useful.)

b) logical inference: entailment = minimized di

di(type: IT company, type: Company) = MIN

Apple (Inc.)
- type: Company
- type: IT company
- product: iPhone 5
- ...

Apple (Fruit)
- type: Fruit
- genus: Malus
- ...

Samsung Electronics
- type: IT company
- ...
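A rough sketch of di follows. The slides do not define value uniqueness, so this sketch assumes it is the inverse of the average number of values an entity has for the property (1.0 for single-valued properties). It also assumes that only features sharing a property are compared, that MIN is 0.0, and that entailment can be approximated by a hypothetical subclass table. All function names are illustrative.

```python
from difflib import SequenceMatcher

DI_MIN = 0.0  # assumption: minimal differentiating power
SUBCLASS_OF = {"IT company": "Company"}  # hypothetical toy class hierarchy

kb = {
    "Apple (Inc.)": {("type", "Company"), ("type", "IT company"), ("product", "iPhone 5")},
    "Apple (Fruit)": {("type", "Fruit"), ("genus", "Malus")},
    "Samsung Electronics": {("type", "IT company")},
}


def value_uniqueness(prop, knowledge_base):
    """Assumed measure: 1 / average number of values per entity for prop."""
    counts = [sum(1 for p, _ in feats if p == prop) for feats in knowledge_base.values()]
    counts = [c for c in counts if c > 0]
    return len(counts) / sum(counts) if counts else 0.0


def dissimilarity(a, b):
    """1 - string similarity (assumed measure)."""
    return 1.0 - SequenceMatcher(None, a.lower(), b.lower()).ratio()


def differentiating_power(f, g, knowledge_base):
    (p1, v1), (p2, v2) = f, g
    if p1 != p2:
        return DI_MIN  # assumption: only same-property features are compared
    if v1 == v2 or SUBCLASS_OF.get(v1) == v2 or SUBCLASS_OF.get(v2) == v1:
        return DI_MIN  # entailment: the two values do not tell the entities apart
    return value_uniqueness(p1, knowledge_base) * dissimilarity(v1, v2)


print(differentiating_power(("type", "IT company"), ("type", "Fruit"), kb))
print(differentiating_power(("type", "IT company"), ("type", "Company"), kb))
```

In the toy knowledge base `type` is multi-valued (uniqueness 0.75) while `genus` is single-valued (uniqueness 1.0), which is one way to read the remark that single-valued properties are more useful.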


Optimization goal (2): +differentiating power

• Formulated as a Quadratic Multidimensional Knapsack Problem (QMKP)

weight of a feature: length
profit of a pair of features: differentiating power

Optimization goal (3): +relevance to context

• Relevance of a feature to the context of entity mention
  • cosine similarity in the class vector model (cs)

Vector(context) = {Smartphone, IT company}
Vector(type: Fruit) = {Fruit}
Vector(product: iPhone 5) = {Smartphone}

cs(context, product: iPhone 5) = HIGH

  • class weighting: class frequency – inverse instance frequency (CF-IIF)
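The class vector comparison can be sketched as below. The CF-IIF formula is not given on the slide, so this sketch assumes a TF-IDF-style weight: the class frequency in the bag of classes times log(number of all entities / number of instances of the class). The instance counts and the entity total are made-up numbers.

```python
import math
from collections import Counter

# Hypothetical statistics: number of instances per class in the knowledge base.
INSTANCE_COUNTS = {"Smartphone": 10, "IT company": 50, "Fruit": 20}
N_ENTITIES = 1000


def cf_iif_vector(classes):
    """Weight each class by class frequency * inverse instance frequency (assumed form)."""
    cf = Counter(classes)
    return {c: n * math.log(N_ENTITIES / INSTANCE_COUNTS[c]) for c, n in cf.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


context = cf_iif_vector(["Smartphone", "IT company"])   # classes of iPhone 6, Samsung
iphone5 = cf_iif_vector(["Smartphone"])                 # feature product: iPhone 5
fruit = cf_iif_vector(["Fruit"])                        # feature type: Fruit

print(cosine(context, iphone5))
print(cosine(context, fruit))
```

With these numbers `product: iPhone 5` scores high against the context while `type: Fruit` scores zero, in line with the example above.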


Optimization goal (3): +relevance to context

• Solved by k Maximizing Marginal Relevance (MMR) frameworks
  • Features are iteratively selected.
  • In each iteration, candidate features are re-ranked by
    • relevance to context
    • dissimilarity to selected features
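The iterative re-ranking described above can be sketched for one candidate entity as follows. This uses the standard MMR scoring form with a trade-off parameter; the parameter `lam`, the relevance scores, and the toy similarity function are assumptions for illustration and are not taken from the source.

```python
def mmr_select(features, relevance, similarity, length, limit, lam=0.7):
    """Select features one at a time, trading relevance against redundancy.

    score(f) = lam * relevance(f) - (1 - lam) * max similarity to selected features
    """
    selected = []
    remaining = limit
    candidates = list(features)
    while candidates:
        fitting = [f for f in candidates if length(f) <= remaining]
        if not fitting:
            break

        def score(f):
            redundancy = max((similarity(f, s) for s in selected), default=0.0)
            return lam * relevance(f) - (1 - lam) * redundancy

        best = max(fitting, key=score)
        selected.append(best)
        candidates.remove(best)
        remaining -= length(best)
    return selected


# Made-up relevance-to-context scores for the features of Apple (Inc.).
REL = {
    ("product", "iPhone 5"): 0.9,
    ("type", "IT company"): 0.6,
    ("type", "Company"): 0.5,
}

summary = mmr_select(
    features=list(REL),
    relevance=REL.get,
    similarity=lambda f, g: 1.0 if f[0] == g[0] else 0.0,  # toy: same property = similar
    length=lambda f: len(f[0]) + len(f[1]),                # assumed length measure
    limit=30,
)
print(summary)
```

Running one such selection per candidate entity would give the k summaries; the length limit of 30 characters here is only chosen to keep the toy example small.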

Optimization goal (1+2+3)

• Formulated as a Quadratic Multidimensional Knapsack Problem (QMKP)

Experiments: data sets

• Text corpora (with entity mentions linked to Wikipedia)
  • AQUAINT
  • IITB

• Knowledge base
  • DBpedia

• Gold-standard links
  • entity mentions → Wikipedia articles → DBpedia entities

Experiments: EL tasks

…, Apple has finally gone into big-screen territory, … → ?

Apple (Inc.)
- type: Company
- product: iPhone 5

Apple (Corps)
- type: Company
- product: Let It Be

Apple (Fruit)
- type: Fruit

• 1 target entity
  • gold-standard
• 2 (very challenging) noise entities
  • sharing a common name with the target entity, obtained from Wikipedia’s disambiguation pages

Experiments: approaches

• Proposed approaches
  • CHR: +characterizing power, -information overlap
  • DFF: +differentiating power
  • CNT: +relevance to context
  • COMB: CHR+DFF+CNT

• Baseline approaches
  • DESC: returns entire entity descriptions
  • RELIN: a state-of-the-art entity summarization approach for generic purposes

• average length of entity descriptions: 680 characters
• length limit for summaries: 100 characters (14.7%)

Experiments: extrinsic evaluation

• COMB is the only approach that achieved the following statistically significant results on both data sets:
  • accuracy (% of correct answers): COMB = DESC
  • time: COMB < DESC (22-23% faster)

Experiments: intrinsic evaluation

• Statistically significant results on both data sets:
  • human ratings: COMB > CHR > other approaches

Future work

• More extensive experiments
  • to test with not-in-the-list cases

• Summaries for automatic EL

Questions?
