
DECEMBER 11, 2019 ● GEORGE WASHINGTON UNIVERSITY ● SPONSORED BY

Why Is Named Entity Recognition So Hard?

Turning to Philosophy to Improve Annotation for Named Entities

Zachary Yocum


Why Is Named Entity Recognition (NER) So Hard?

1. Brief history of NER as an Information Extraction (IE) task
2. Motivate why NER is useful
3. Lay out some major difficulties
4. Look at what the field of philosophy has to offer


Brief History of Named Entity Recognition

● NER is an Information Extraction (IE) task concerned with locating and classifying named entity mentions in unstructured text.

● Spurred by various programs including …

Conference | Timeframe | Major Funders/Organizers
Message Understanding Conference (MUC) | 1987-1998 | Defense Advanced Research Projects Agency (DARPA)
SemEval | 1998-2019 ... | Association for Computational Linguistics (ACL)
Automatic Content Extraction (ACE) | 1999-2008 | National Institute of Standards and Technology (NIST), Linguistic Data Consortium (LDC)
Text Analysis Conference (TAC) | 2008-2019 ... | NIST, LDC

Example

“The Secret Trade: Firms That Promised High-Tech Ransomware Solutions Almost Always Just Pay the Hackers” by Renee Dudley and Jeff Kao, May 15, 2019, ProPublica

What Value Does NER Provide?

Generally, NER extracts the “who?”, “what?”, “where?”, and “when?” of a document. NER has been applied to domains and genres such as …
● News
● Biomedical journal articles
● Court records
● Military dispatches & reports
● Telephone conversations
● Social media

NER is also useful as an intermediate information extraction step for downstream processing tasks such as …
● Sentiment analysis
● Relationship extraction
● Coreference resolution
● Summarization
● Entity linking

Different NER Strategies

Naïve approach to NER …
● Treat NER as a search problem
● Compile a list of names for each entity type and search for mentions of the names

Rule-based approaches to NER …
● Treat NER as a pattern-matching problem
● Create patterns (such as regular expressions) that can match entity mentions

Statistical approaches to NER …
● Treat NER as a sequential tagging problem
● Train statistical models based on manually annotated example data
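The three strategies above can be contrasted in a few lines of Python. This is a minimal sketch, not a production system: the gazetteer entries, the date rule, and the helper names (`gazetteer_ner`, `regex_ner`, `to_bio`) are hypothetical. The first function is the naïve search approach, the second is a rule-based pattern, and the third converts the resulting spans into BIO (Begin/Inside/Outside) token labels, one common encoding that a statistical sequence tagger is trained to predict. Tokenizing on whitespace is a simplifying assumption.

```python
import re

# Hypothetical gazetteer for the naive search approach: entity type -> known names.
GAZETTEER = {
    "ORG": ["ProPublica", "George Washington University"],
    "PER": ["Renee Dudley", "Jeff Kao"],
}

# Hypothetical rule for the rule-based approach: dates written "Month DD, YYYY".
DATE_PATTERN = re.compile(
    r"\b(?:January|February|March|April|May|June|July|August|September|"
    r"October|November|December) \d{1,2}, \d{4}\b"
)


def gazetteer_ner(text):
    """Return sorted (start, end, type) spans for every exact gazetteer match."""
    spans = []
    for ent_type, names in GAZETTEER.items():
        for name in names:
            for m in re.finditer(re.escape(name), text):
                spans.append((m.start(), m.end(), ent_type))
    return sorted(spans)


def regex_ner(text):
    """Return (start, end, "DATE") spans matched by the date rule."""
    return [(m.start(), m.end(), "DATE") for m in DATE_PATTERN.finditer(text)]


def to_bio(text, spans):
    """Label each whitespace token B-TYPE, I-TYPE, or O from character spans."""
    labels = []
    for m in re.finditer(r"\S+", text):
        label = "O"
        for start, end, ent_type in spans:
            if start <= m.start() and m.end() <= end:
                label = ("B-" if m.start() == start else "I-") + ent_type
                break
        labels.append((m.group(), label))
    return labels


text = "Renee Dudley and Jeff Kao wrote for ProPublica on May 15, 2019"
spans = gazetteer_ner(text) + regex_ner(text)
for token, label in to_bio(text, spans):
    print(token, label)
```

Both hand-built resources fail silently on anything they do not list, which is the gap statistical taggers try to close by learning from annotated examples.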

The Difficulties of NER

1. Ambiguity (polysemy, ‘many meanings’)
○ Syntactic ambiguity
○ Lexical ambiguity

2. World knowledge (which is always incomplete)
○ Human knowledge is limited and varies by domain
○ The world changes
○ Language changes

Syntactic Ambiguity

“The defendant hit the lawyer with the briefcase.”

● The defendant used the briefcase to hit the lawyer.
● The defendant hit the lawyer who was holding the briefcase.

Lexical Ambiguity

“The lawyer dropped his briefs.”

● The lawyer dropped his documents pertaining to legal arguments.
● The lawyer dropped his underwear.

Lexical Ambiguity of Names

Aristotle (disambiguation)

From Wikipedia, the free encyclopedia

Aristotle (Ἀριστοτέλης, Aristotélēs) is a Greek given name. It mostly refers to Aristotle of Stagira (384 BC–322 BC), the Greek philosopher.

● Aristotele Fioravanti (c. 1415 – c. 1486), Italian Renaissance architect and engineer

● Aristotelis Valaoritis (1824–1879), Greek poet
● Aristotle Onassis (1906–1975), Greek shipping magnate
● ...

Initialisms by Domain

Domain | Meanings of AMA
Business | American Music Association
Technology | Automatic Message Accounting of phone bills
Biology & Medicine | American Medical Association; against medical advice
Geography | Ama, Aichi, a city in Japan; Ama, Iran, a village in Ilam Province; Ama, Louisiana, a town in the US
Other | ask me anything; Samoan military title; Ama language (New Guinea); Ama language (Sudan); Shola Ama, British singer; Rick Husband Amarillo International Airport (International Air Transport Association code)
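The table shows why a naïve gazetteer cannot map a surface form to a single entity: the lookup is one-to-many, and some context, such as the document's domain, is needed to narrow it. A minimal sketch using a subset of the table's entries (abbreviated); the `SENSES` table and the `senses_for` helper are hypothetical.

```python
# A subset of the AMA table: surface form -> list of (domain, meaning).
SENSES = {
    "AMA": [
        ("Business", "American Music Association"),
        ("Technology", "Automatic Message Accounting"),
        ("Biology & Medicine", "American Medical Association"),
        ("Biology & Medicine", "against medical advice"),
        ("Geography", "Ama, Aichi, Japan"),
        ("Geography", "Ama, Louisiana, US"),
        ("Other", "ask me anything"),
    ]
}


def senses_for(surface, domain=None):
    """Return every known meaning of a surface form, optionally filtered by domain.

    The key is uppercased so that "AMA" and "Ama" share one entry, as in the table.
    """
    senses = SENSES.get(surface.upper(), [])
    return [meaning for d, meaning in senses if domain is None or d == domain]
```

Even with a domain filter, "Biology & Medicine" still leaves two readings, so the surrounding sentence has to settle the rest.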

ep·o·nym | ˈɛpəˌnɪm | noun

a person after whom a discovery, invention, place, etc., is named or thought to be named.
• a name or noun formed after a person.

Oxford University Press. (2010). New Oxford American Dictionary. Oxford University Press.

Eponymy

Examples

The Problem of Reference

Hesperus is Phosphorus
Pierre Delecto is Mitt Romney
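In NER terms, the point is that string identity is not referent identity: Hesperus (the evening star) and Phosphorus (the morning star) both name Venus, and two different strings can likewise name one person. A minimal sketch of how an annotation that links mentions to referent identifiers captures what string comparison cannot; the alias table and the identifiers `venus` and `romney` are hypothetical.

```python
# Hypothetical alias table: surface form (signifier) -> referent identifier.
ALIASES = {
    "Hesperus": "venus",
    "Phosphorus": "venus",
    "Pierre Delecto": "romney",
    "Mitt Romney": "romney",
}


def same_string(a, b):
    """Baseline: two mentions co-refer only if they are the same string."""
    return a == b


def same_referent(a, b):
    """True when both names are known aliases of one referent."""
    ra, rb = ALIASES.get(a), ALIASES.get(b)
    return ra is not None and ra == rb
```

The hard part in practice is building the alias table, which is exactly the world knowledge the next slides show to be incomplete.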

The World Changes

● Charles II was the King of France (843-877)
● Louis V was the King of France (986-987)
● Henry VI was the King of France (1422-1453, maybe?)
● Henry VI was the King of England (1422-1461)
● Henry VIII was the King of England (1509-1547)
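A description such as "King of France" picks out different referents at different times, and one name (Henry VI) can hold more than one title at once. A minimal sketch that stores the slide's reigns as inclusive year intervals and resolves them for a given year; the `Reign` type and the `holders` and `titles` helpers are hypothetical, and because the data covers only the slide's examples, an empty result means "not in this table", not "no king".

```python
from typing import NamedTuple


class Reign(NamedTuple):
    name: str
    title: str
    start: int  # first year of the reign (inclusive)
    end: int    # last year of the reign (inclusive)


# Reigns as listed on the slide; Henry VI's French title is the disputed one.
REIGNS = [
    Reign("Charles II", "King of France", 843, 877),
    Reign("Louis V", "King of France", 986, 987),
    Reign("Henry VI", "King of France", 1422, 1453),
    Reign("Henry VI", "King of England", 1422, 1461),
    Reign("Henry VIII", "King of England", 1509, 1547),
]


def holders(title, year):
    """Names holding `title` in `year`, according to REIGNS only."""
    return [r.name for r in REIGNS if r.title == title and r.start <= year <= r.end]


def titles(name, year):
    """Titles held by `name` in `year`, according to REIGNS only."""
    return [r.title for r in REIGNS if r.name == name and r.start <= year <= r.end]
```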

The World Changes

New names enter language all the time.

● Archie Harrison Mountbatten-Windsor: Son of the Duke and Duchess of Sussex and great-grandson of Queen Elizabeth II, born in 2019

● Anak Krakatoa: Volcanic island in Indonesia formed in 1927

● Flytxt: Telecommunications and analytics software company founded in 2008

Language Changes

New meanings for existing names enter language all the time.

● South Sudan: Country which gained independence from the Republic of Sudan in 2011

● google (as a verb): To search the web (or other indexed electronic resources)

● Main Street: A place representing the interests of small businesses in North America (as contrasted with Wall Street or Bay Street, representing large corporations)

Capitalization

(Images: capitalization examples in German and Swedish)

Ferdinand de Saussure

● Swiss philosopher, linguist, and semiotician of the 19th and 20th centuries

● Foundational work on semiotics, the study of signs

Saussurean Model for NER

● Signifier: The symbol(s) used representationally in natural language: phonemes, graphemes
● Signified: The properties that a signifier expresses
● Referent: The thing(s) that a signifier refers to, which possess the signified properties

Saussurean Model for NER

● Signifier: Aristotle
● Signified: Born: 384 BC, Stagira, Chalcidian League; Died: 322 BC (aged ~62), Euboea, Macedonian Empire; Native name: Ἀριστοτέλης; Type: Person; ...
● Referent: Aristotle of Stagira
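The three layers can be kept apart in an annotation record, so that the signifier (the text span), the signified (the properties the mention expresses), and the referent (the entity itself) are labeled separately. A minimal sketch using Python dataclasses; the class and field names and the example sentence are hypothetical and do not come from a published annotation schema, while the property values are copied from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Signifier:
    text: str   # surface string as it appears in the document
    start: int  # character offset where the mention begins
    end: int    # character offset just past the mention


@dataclass
class Signified:
    properties: dict = field(default_factory=dict)  # e.g. {"Type": "Person"}


@dataclass
class Referent:
    label: str  # human-readable name of the entity referred to


@dataclass
class Sign:
    signifier: Signifier
    signified: Signified
    referent: Referent


sentence = "Aristotle was born in Stagira."
start = sentence.index("Aristotle")
sign = Sign(
    signifier=Signifier("Aristotle", start, start + len("Aristotle")),
    signified=Signified({"Born": "384 BC", "Died": "322 BC", "Type": "Person"}),
    referent=Referent("Aristotle of Stagira"),
)
```

Keeping the layers separate means two mentions can share a signifier but differ in referent, or share a referent while expressing different properties.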

Beyond NER

A task related to NER is Entity Disambiguation, also known as Entity Linking.

Knowledge Base (KB) Entries

● Id: Q421280
Name (en): Aristotle
Name (el): Ἀριστοτέλης
Born: 384 BC, Stagira, Chalcidian League
Occupation: Philosopher
Type: Person
...

● Id: Q180455
Name (en): Aristotle Socrates Onassis
Name (el): Αριστοτέλης Ωνάσης
Born: 1906, İzmir, Turkey
Occupation: Shipping magnate
Type: Person
...
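Entity linking can be sketched in two steps: generate candidate KB entries whose names match the mention, then rank the candidates against the mention's context. The ranking below is a simple word-overlap score between the context and each entry's attributes, a stand-in chosen for illustration rather than the method behind the slide. The two entries and their Ids are copied from the slide with attributes abbreviated; `kb_candidates` and `link` are hypothetical helpers.

```python
import re

# The two KB entries from the slide, attributes abbreviated to a few words.
KB = {
    "Q421280": {"name": "Aristotle", "attributes": "384 BC Stagira philosopher"},
    "Q180455": {"name": "Aristotle Socrates Onassis", "attributes": "1906 Turkey shipping magnate"},
}


def tokens(text):
    """Lowercased word tokens, as a set."""
    return set(re.findall(r"\w+", text.lower()))


def kb_candidates(mention):
    """Candidate generation: KB Ids whose name contains every token of the mention."""
    m = tokens(mention)
    return [kb_id for kb_id, entry in KB.items() if m and m <= tokens(entry["name"])]


def link(mention, context):
    """Disambiguation: pick the candidate whose attributes overlap the context most."""
    ctx = tokens(context)
    scores = {c: len(ctx & tokens(KB[c]["attributes"])) for c in kb_candidates(mention)}
    if not scores or max(scores.values()) == 0:
        return None  # no candidate, or no contextual evidence to choose one
    return max(scores, key=scores.get)  # ties go to the first Id in KB order
```

A mention of "Aristotle" alone yields both entries as candidates; only the context ("philosopher" versus "shipping") separates them, which is where real linkers spend most of their modeling effort.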

me·ton·y·my | məˈtɑnəmi | noun (plural metonymies)

the substitution of the name of an attribute or adjunct for that of the thing meant, for example suit for business executive, or the track for horse racing.

Oxford University Press. (2010). New Oxford American Dictionary. Oxford University Press.

syn·ec·do·che | səˈnɛkdəki | noun

a figure of speech in which a part is made to represent the whole or vice versa, as in Cleveland won by six runs (meaning “Cleveland's baseball team”).

Oxford University Press. (2010). New Oxford American Dictionary. Oxford University Press.

Metonymy

How does Wall Street view news of trade talks?
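Here "Wall Street" names a street, but the mention stands for the financial industry and its investors. Rather than forcing one label, an annotation can record both readings. A minimal sketch; the `Mention` fields, the `annotate` helper, and the LOC/ORG labels are hypothetical choices, not taken from a particular annotation guideline.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Mention:
    text: str
    start: int
    end: int
    literal_type: str                    # what the name denotes on its face
    intended_type: Optional[str] = None  # what it stands for, if used metonymically

    @property
    def is_metonymic(self):
        return self.intended_type is not None and self.intended_type != self.literal_type


def annotate(sentence, name, literal_type, intended_type=None):
    """Locate the first occurrence of `name` (ValueError if absent) and label it."""
    start = sentence.index(name)
    return Mention(name, start, start + len(name), literal_type, intended_type)


question = "How does Wall Street view news of trade talks?"
wall_street = annotate(question, "Wall Street", "LOC", "ORG")
```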

Frequency of Metonymy (Swedish Example)

Genre | #Metonyms | #Mentions | %Metonyms | #Documents w/ Metonyms | #Documents | %Documents w/ Metonyms
News | 2,859 | 49,119 | 5.82% | 1,152 | 3,824 | 30.13%
Tweets | 245 | 10,396 | 2.36% | 188 | 4,000 | 4.70%
Total | 3,104 | 59,515 | 5.22% | 1,340 | 7,824 | 17.13%
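The percentage columns follow from the counts: %Metonyms is #Metonyms divided by #Mentions, and %Documents w/ Metonyms is #Documents w/ Metonyms divided by #Documents. A short check that recomputes both from the table's counts and confirms that the Total row is the column-wise sum of the News and Tweets rows; the `ROWS` dictionary and `pct` helper are just illustrative names.

```python
# Counts copied from the table: (metonyms, mentions, docs_with_metonyms, docs).
ROWS = {
    "News": (2859, 49119, 1152, 3824),
    "Tweets": (245, 10396, 188, 4000),
    "Total": (3104, 59515, 1340, 7824),
}


def pct(part, whole):
    """Percentage rounded to two decimals, as reported in the table."""
    return round(100 * part / whole, 2)


for genre, (met, men, dmet, docs) in ROWS.items():
    print(f"{genre}: {pct(met, men):.2f}% of mentions, {pct(dmet, docs):.2f}% of documents")

# The Total row should equal News + Tweets, column by column.
totals = tuple(n + t for n, t in zip(ROWS["News"], ROWS["Tweets"]))
```

Note that the document-level rate (30.13% of news documents) is much higher than the mention-level rate (5.82%): metonyms are individually rare but spread across many documents.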

Conclusion

● Natural language presents many difficulties for extracting mentions of named entities.

● Applying a Saussurean model, we’ve refined the NER annotation task by considering linguistic signs, their significances, and the entities they reference.
