© 2021 Bloomberg Finance L.P. All rights reserved. Towards Realistic Few-Shot Relation Extraction The 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021) November 9, 2021 Sam Brody, Sichao Wu, Adrian Benton Research Scientists, AI Group

Page 1:

Towards Realistic Few-Shot Relation Extraction

The 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021)
November 9, 2021

Sam Brody, Sichao Wu, Adrian Benton
Research Scientists, AI Group

Page 2:

Few-Shot Learning for Relations

• Transformer-based deep neural nets (NNs) have shown high performance for few-shot learning on many NLP tasks

• FewRel dataset (Han et al., 2018)
– Train a few-shot learner on a large number of relations with ample data
– Test on unseen relations given only a few (1, 5, 10) examples

• Data:
– 100 relations × 700 instances per relation
– Instances are sentences from Wikipedia
– Initial dataset via distant supervision from Wikidata, then crowd-annotated
– 64/16/20 relations for training/validation/test (test set withheld)

Page 3:

Modified Objective: Relation Classification

N-way-K-shot: K examples for each of N relations

Query Example (relation: ?)
He was trained under Abdul Rahim Khan-I-Khana, the son of Bairam Khan and he eventually accepted Islam.

● In each iteration: randomly sample N relations and K examples for each

Support examples (relations 1 to N, examples 1 to K for each):

religion
1) Johannes Joseph van der Velden (7 August 1891 – 19 May 1954) was a Catholic theologian and Bishop of Aachen.
...
K) Milkha Singh, a Sikh boy born around 1930s, runs against trains for fun.

genre
1) Digital hardcore band, Lolita Storm, covered "Stranger than Kindness" for a tribute album, "Eyes for an Eye: A Tribute to Nick Cave", in 1996.
...
K) Dragon of the Lost Sea is a fantasy novel by Chinese-American author Laurence Yep.

occupation
1) Cunlhat was the birthplace of Maurice Pialat (1925–2003), film director and actor.
...
K) Artur Soares Dias of Porto was named as referee for the match on the 8 August.

father
1) Aarya Babbar is the son of actor turned politician Raj Babbar and theatre personality Nadira Babbar.
...
K) Temenus had a son named Archelaus.

has part
1) Salaryevo is a Moscow Metro station on the Sokolnicheskaya line.
...
K) Korean, along with Chinese and Japanese, is a member of the CJK group and shares origins for many of the symbols.

● Train for 30,000 iterations

● We focus on 5-way-5-shot (N=5, K=5)
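The per-iteration sampling described above can be sketched roughly as follows. This is a minimal illustration, not the authors' training code: the `data` layout (relation name mapped to a list of instances), the function name `sample_episode`, and the `n_query` parameter are all assumptions made for the example.

```python
import random

def sample_episode(data, n_way=5, k_shot=5, n_query=1, rng=random):
    """Sample one N-way-K-shot episode.

    `data` maps a relation name to a list of instances (hypothetical layout).
    Returns (support, queries): `support` maps each sampled relation to K
    instances; `queries` is a list of (instance, relation) pairs drawn from
    the same relations but disjoint from the support instances.
    """
    relations = rng.sample(sorted(data), n_way)
    support, queries = {}, []
    for rel in relations:
        picked = rng.sample(data[rel], k_shot + n_query)
        support[rel] = picked[:k_shot]
        queries.extend((inst, rel) for inst in picked[k_shot:])
    return support, queries

# Toy data: 8 relations with 20 placeholder instances each.
toy_data = {f"rel_{i}": [f"rel_{i}/sent_{j}" for j in range(20)] for i in range(8)}
support, queries = sample_episode(toy_data, n_way=5, k_shot=5, rng=random.Random(0))
```

Passing a seeded `random.Random` keeps the episode reproducible; in training, a fresh episode would be drawn at every iteration.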

Page 4:

SOTA - Prototype Approach

• Transformer-based encoder
• Each example represented by encoding of specific tokens
• Prototype is pooled representation of all examples
• Query is labeled by similarity between its representation and candidate relation prototypes
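A minimal sketch of the prototype idea, under stated assumptions: the slides do not say which pooling or similarity function is used, so mean pooling and a dot product are chosen here for illustration, and the three-dimensional vectors are hand-made stand-ins for encoder output.

```python
def mean_pool(vectors):
    """Average a list of equal-length vectors element-wise."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def classify(query_vec, support_vecs):
    """Label a query with the relation whose prototype is most similar.

    Assumes mean pooling and dot-product similarity (illustrative choices).
    Returns the predicted relation and the per-relation scores.
    """
    prototypes = {rel: mean_pool(vecs) for rel, vecs in support_vecs.items()}
    scores = {rel: dot(query_vec, proto) for rel, proto in prototypes.items()}
    return max(scores, key=scores.get), scores

# Hand-made stand-ins for encoded support examples (2 per relation).
support_vecs = {
    "religion": [[0.9, 0.1, 0.0], [1.0, 0.0, 0.1]],
    "genre": [[0.1, 0.9, 0.0], [0.0, 1.0, 0.1]],
    "occupation": [[0.0, 0.1, 0.9], [0.1, 0.0, 1.0]],
}
query = [0.8, 0.2, 0.1]
predicted, scores = classify(query, support_vecs)
```

Other pooling or similarity choices would slot into `mean_pool` and `dot` without changing the overall flow.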

Encoder                   Accuracy
CNN                       85.24%
BERT                      93.78%
SpanBERT                  95.19%
RoBERTa-base              95.18%
RoBERTa-large             96.23%
Luke-base                 95.08%
Matching the Blanks (1)   97.06% (2)

(1) Beats human performance on 5-way-1-shot
(2) On withheld test set (not validation)

Page 5:

“How well do models trained for few-shot relation classification do at few-shot relation extraction?”

Page 6:

Problem: FewRel setup differs from classic RE use case:
• Distinguishing between N randomly sampled relations vs. detecting one relation of interest among many negatives
• Evaluation method obscures performance on individual relations; the score depends heavily on which relations are randomly selected

Solution: An alternate evaluation
• New dataset: 50 examples from each relation
• Evaluate per relation - binary classification (50 POS, the rest NEG)
• Sample 5 examples for prototype, sort by similarity, use precision@50
• Repeat 10 times (to reduce effect of example choices)
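The per-relation evaluation could look roughly like the sketch below. It is an illustration under assumptions, not the authors' evaluation script: `score_fn` is a hypothetical stand-in for the encoder plus prototype similarity, and the prototype examples are drawn from a separate `prototype_pool`, since the slides do not say where the 5 prototype examples come from.

```python
import random

def precision_at_k(scores, labels, k=50):
    """Fraction of positives among the k highest-scoring candidates."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / k

def evaluate_relation(score_fn, positives, negatives, prototype_pool,
                      n_proto=5, k=50, repeats=10, seed=0):
    """Average precision@k over several random choices of prototype examples.

    score_fn(prototype_examples, candidate) -> similarity (hypothetical).
    """
    rng = random.Random(seed)
    candidates = [(c, 1) for c in positives] + [(c, 0) for c in negatives]
    labels = [label for _, label in candidates]
    results = []
    for _ in range(repeats):
        proto = rng.sample(prototype_pool, n_proto)
        scores = [score_fn(proto, cand) for cand, _ in candidates]
        results.append(precision_at_k(scores, labels, k))
    return sum(results) / len(results)

# Toy check with numbers instead of sentences: positives cluster near 1.0,
# negatives near 0.0, and similarity is negative distance to the prototype mean.
def toy_score(proto, candidate):
    return -abs(sum(proto) / len(proto) - candidate)

positives = [1.0 + 0.001 * i for i in range(50)]
negatives = [0.001 * i for i in range(200)]
pool = [0.97, 0.98, 0.99, 1.0, 1.01, 1.02, 1.03]
toy_result = evaluate_relation(toy_score, positives, negatives, pool)
half = precision_at_k([0.9, 0.8, 0.1, 0.2], [1, 0, 1, 0], k=2)
```

With exactly 50 positives per relation, precision@50 reaches 1.0 only when every positive is ranked above every negative.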

Page 9:

Breakdown by Relation - FewRel (RoBERTa-base)

(Figure: per-relation results. Relations shown: star constellation, TV/film language, military rank, previous item, singer voice type, competition class, position on team, water crossed, mother, subject of work, child, person/team sport, member of org, nearby water, spouse, part of.)

• part-of overlaps with others

• The rest:
– ⅓ are > 97%
– ⅓ are > 80%
– ⅓ are 50-75% !!

• Relations sharing same entity types are confused:
– water crossed, nearby water
– person/team sport, position on team, competition class
– mother, child, spouse

Page 11:

Breakdown by Relation - TACRED Org. (RoBERTa-base)

(Figure: per-relation results. Relations shown: website, pol/rel affiliation, members, HQ country, date founded, HQ state/prov, num. members, founder, top members, HQ city, alt. names, parent orgs, date dissolved, subsidiaries, shareholders, member of.)

• Confusion with the same type signature:
– date types
– location types

• Best at discriminating between distinct types:
– website, affiliation

Similar behavior has been observed in SOTA supervised models: Rosenman et al. (2020), Alt et al. (2020), Tran et al. (2020)
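To make the notion of a shared type signature concrete, the sketch below groups relations by their (head type, tail type) pair and reports the groups with more than one member. The coarse type assignments are illustrative assumptions, not taken from the slides or from the TACRED annotation scheme, and `confusable_groups` is a hypothetical helper.

```python
from collections import defaultdict

def confusable_groups(signatures):
    """Group relations by (head type, tail type); keep groups with 2+ relations."""
    groups = defaultdict(list)
    for relation, signature in signatures.items():
        groups[signature].append(relation)
    return {sig: sorted(rels) for sig, rels in groups.items() if len(rels) > 1}

# Coarse type signatures: illustrative assumptions only.
signatures = {
    "website": ("ORG", "URL"),
    "date founded": ("ORG", "DATE"),
    "date dissolved": ("ORG", "DATE"),
    "HQ city": ("ORG", "LOCATION"),
    "HQ country": ("ORG", "LOCATION"),
    "HQ state/prov": ("ORG", "LOCATION"),
}
groups = confusable_groups(signatures)
```

Under these assumed signatures, the date and location relations fall into shared groups while `website` stands alone, which mirrors the pattern the slide describes.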

Page 12:

Data Augmentation

Reasoning:
• More challenging (confusable) relations during training will force the model to look beyond type signature

Experiment:
• Add TACRED person relations to FewRel training data
• Evaluate on TACRED organization relations (no overlap with training relations)
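The experimental setup can be sketched as a merge of two relation pools with a check that the evaluation relations stay unseen. This is only an illustration: the function name, the dictionary layout, and the placeholder instances are assumptions, and the relation names are a small illustrative subset.

```python
def augment_training_relations(base_train, extra_train, eval_relations):
    """Merge two relation->instances dicts and verify the evaluation
    relations do not appear in the merged training set."""
    merged = {rel: list(instances) for rel, instances in base_train.items()}
    for rel, instances in extra_train.items():
        merged[rel] = merged.get(rel, []) + list(instances)
    overlap = set(merged) & set(eval_relations)
    if overlap:
        raise ValueError(f"evaluation relations seen in training: {sorted(overlap)}")
    return merged

# Placeholder instances; relation names are an illustrative subset.
fewrel_train = {"religion": ["s1", "s2"], "genre": ["s3"]}
tacred_person = {"per:spouse": ["s4"], "per:children": ["s5", "s6"]}
tacred_org_eval = ["org:founded", "org:website"]
train = augment_training_relations(fewrel_train, tacred_person, tacred_org_eval)
```

The explicit overlap check reflects the "no overlap with training relations" condition; the merged pool can then be fed to the same episodic sampling used for training.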

Page 13:

Data Augmentation Results

* FewRel validation set has some overlap with augmented data

Page 14:

Conclusions
• Few-shot relation classification scores do not reflect performance in a “realistic” relation extraction scenario

• Few-Shot prototype models focus on entity type signatures, similar to SOTA supervised models

• Adding confusable relations during training helps alleviate the issue

• Few-shot learning can achieve high performance on unseen relations!

Future Work
• Smarter dynamic sampling

• Alternate prototype representations

Page 15:

Thank you!
We are hiring: bloomberg.com/engineering
AI Group - NLP Scientist https://careers.bloomberg.com/job/detail/87013