
Page 1

– IN5550 – Neural Methods in Natural Language Processing

Home Exam: Task Overview and Kick-Off

Stephan Oepen, Lilja Øvrelid, & Erik Velldal

University of Oslo

April 21, 2020

Page 2

Home Exam

General Idea

▶ Guiding metaphor: preparing a scientific paper for publication at the Second IN5550 Teaching Workshop on Neural NLP (WNNLP 2020).

Standard Process

(0) Problem Statement

(1) Experimentation

(2) Analysis

(3) Paper Submission

(4) Reviewing

(5) Camera-Ready Manuscript

(6) Presentation

Page 3

For Example: The ACL 2020 Conference

Page 4

WNNLP 2020: Call for Papers and Important Dates

General Constraints
▶ Three specialized tracks: NER, Negation Scope, Sentiment Analysis.
▶ Long papers: up to nine pages, excluding references, in ACL 2020 style.
▶ Submitted papers must be anonymous: peer reviewing is double-blind.
▶ Replicability: submissions must be backed by a code repository (visible to area chairs only).

Schedule

By April 22   Declare choice of track (and team composition)
April 28      Per-track mentoring sessions with Area Chairs
Early May     Individual supervisory meetings (upon request)
May 12        (Strict) submission deadline for scientific papers
May 13–18     Reviewing period: each student reviews two papers
May 20        Area Chairs make and announce acceptance decisions
May 25        Camera-ready manuscripts due, with requested revisions
May 27        Oral presentations and awards at the workshop

Page 5

The Central Authority for All Things WNNLP 2020

https://www.uio.no/studier/emner/matnat/ifi/IN5550/v20/exam.html

Page 6

WNNLP 2020: What Makes a Good Scientific Paper?

Empirical (Experimental)
▶ Motivate architecture choice(s) and hyper-parameters;
▶ systematic exploration of the relevant parameter space;
▶ comparison to a reasonable baseline or previous work.

Replicable (Reproducible)
▶ Everything relevant to run and reproduce the experiments in M$ GitHub.

Analytical (Reflective)
▶ Identify and relate to previous work;
▶ explain choice of baseline or points of comparison;
▶ meaningful, precise discussion of results;
▶ ‘negative’ results can be interesting too;
▶ look at the data: discuss some examples;
▶ error analysis: identify remaining challenges.

Page 7

WNNLP 2020: Programme Committee

General Chair

▶ Stephan Oepen

Area Chairs

▶ Named Entity Recognition: Erik Velldal

▶ Negation Scope: Stephan Oepen

▶ Sentiment Analysis: Lilja Øvrelid & Jeremy Barnes

Peer Reviewers

▶ All students who have submitted a scientific paper

Page 8

Track 1: Named Entity Recognition

▶ NER: The task of identifying and categorizing proper names in text.

▶ Typical categories: persons, organizations, locations, geo-political entities, products, events, etc.

▶ Example from NorNE, the corpus we will be using:

[Den internasjonale domstolen]ORG har sete i [Haag]GPE_LOC .
‘The International Court of Justice has its seat in The Hague.’

Page 9

Class labels

▶ Abstractly a sequence segmentation task,

▶ but in practice solved as a sequence labeling problem,

▶ assigning per-word labels according to some variant of the BIO scheme:

Den    internasjonale  domstolen  har  sete  i  Haag       .
B-ORG  I-ORG           I-ORG      O    O     O  B-GPE_LOC  O
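To make the BIO encoding above concrete, here is a minimal sketch that maps entity spans to per-token labels; the (start, end, type) span tuples with an exclusive end index are an illustrative assumption, not the NorNE distribution format.

def spans_to_bio(tokens, entities):
    # entities: list of (start, end, type) tuples over token positions, end exclusive
    labels = ["O"] * len(tokens)
    for start, end, etype in entities:
        labels[start] = "B-" + etype
        for i in range(start + 1, end):
            labels[i] = "I-" + etype
    return labels

tokens = ["Den", "internasjonale", "domstolen", "har", "sete", "i", "Haag", "."]
print(spans_to_bio(tokens, [(0, 3, "ORG"), (6, 7, "GPE_LOC")]))
# ['B-ORG', 'I-ORG', 'I-ORG', 'O', 'O', 'O', 'B-GPE_LOC', 'O']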

Page 10

NorNE

▶ First publicly available NER dataset for Norwegian; joint effort between LTG, Schibsted, and Språkbanken (the National Library).

▶ Named entity annotations added to NDT for both Bokmål and Nynorsk:
  ▶ ∼300K tokens for each, of which ∼20K form part of a NE.
  ▶ Distributed in the CoNLL-U format using the BIO labeling scheme.

Simplified version:

1  Den             den            DET    name=B-ORG
2  internasjonale  internasjonal  ADJ    name=I-ORG
3  domstolen       domstol        NOUN   name=I-ORG
4  har             ha             VERB   name=O
5  sete            sete           NOUN   name=O
6  i               i              ADP    name=O
7  Haag            Haag           PROPN  name=B-GPE_LOC
8  .               $.             PUNCT  name=O
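A minimal reader for the simplified listing above, assuming one token per line with whitespace-separated columns ending in a name=LABEL field and blank lines between sentences; the real NorNE files are full ten-column CoNLL-U with the name= feature in the MISC column, so this is only a sketch of the idea.

def read_simplified(path):
    # returns a list of (tokens, labels) pairs, one per sentence
    sentences, tokens, labels = [], [], []
    with open(path, encoding="utf-8") as stream:
        for line in stream:
            line = line.strip()
            if not line:
                if tokens:
                    sentences.append((tokens, labels))
                    tokens, labels = [], []
                continue
            columns = line.split()
            tokens.append(columns[1])                     # FORM column
            labels.append(columns[-1].split("=", 1)[1])   # value of the name= field
    if tokens:
        sentences.append((tokens, labels))
    return sentences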

Page 11

NorNE entity types (Bokmål)

Type      Train   Dev   Test   Total

PER        4033    607    560    5200
ORG        2828    400    283    3511
GPE_LOC    2132    258    257    2647
PROD        671    162     71     904
LOC         613    109    103     825
GPE_ORG     388     55     50     493
DRV         519     77     48     644
EVT         131      9      5     145
MISC          8      0      0       8

https://github.com/ltgoslo/norne/

Page 12

Evaluating NER

▶ While NER can be evaluated by P, R, and F1 at the token level,
▶ evaluating at the entity level can be more informative.
▶ Several ways to do this (wording from SemEval 2013 task 9.1 in parentheses):

  ▶ Exact labeled (‘strict’): the gold annotation and the system output are identical; both the predicted boundary and the entity label are correct.
  ▶ Partial labeled (‘type’): correct label and at least a partial boundary match.
  ▶ Exact unlabeled (‘exact’): correct boundary, disregarding the label.
  ▶ Partial unlabeled (‘partial’): at least a partial boundary match, disregarding the label.

▶ https://github.com/davidsbatista/NER-Evaluation
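The linked NER-Evaluation repository implements all four schemes; purely as an illustration of the ‘strict’ (exact labeled) mode, the sketch below extracts entity spans from BIO labels and scores them as sets. In practice the counts would be aggregated over the whole test set rather than per sentence.

def bio_to_spans(labels):
    # collect (start, end, type) spans, end exclusive, from a BIO label sequence
    spans, start, current = [], None, None
    for i, label in enumerate(labels):
        prefix, _, etype = label.partition("-")
        if current is not None and (prefix != "I" or etype != current):
            spans.append((start, i, current))      # close the open span
            start, current = None, None
        if prefix in ("B", "I") and current is None:
            start, current = i, etype              # open a new span
    if current is not None:
        spans.append((start, len(labels), current))
    return spans

def strict_scores(gold_labels, pred_labels):
    # exact-labeled ('strict') precision, recall, and F1 for one sentence
    gold, pred = set(bio_to_spans(gold_labels)), set(bio_to_spans(pred_labels))
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1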

Page 13

NER model

▶ Current go-to model for NER: a BiLSTM with a CRF inference layer,
▶ possibly with a max-pooled character-level CNN feeding into the BiLSTM together with pre-trained word embeddings.

(Image: Jie Yang & Yue Zhang, 2018: NCRF++: An Open-source Neural Sequence Labeling Toolkit)
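A compact PyTorch sketch of that architecture, with a per-token softmax in place of the CRF layer (a CRF, e.g. from the pytorch-crf package, could be added on top of the emission scores); all dimensions and names are illustrative assumptions, not the course pre-code.

import torch
import torch.nn as nn

class CharCnnBiLstmTagger(nn.Module):
    def __init__(self, n_words, n_chars, n_tags,
                 word_dim=100, char_dim=30, char_filters=30, lstm_dim=200):
        super().__init__()
        self.word_emb = nn.Embedding(n_words, word_dim, padding_idx=0)
        self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
        self.char_cnn = nn.Conv1d(char_dim, char_filters, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(word_dim + char_filters, lstm_dim,
                            batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * lstm_dim, n_tags)

    def forward(self, words, chars):
        # words: (batch, seq); chars: (batch, seq, max_word_len)
        batch, seq, wlen = chars.shape
        char_vectors = self.char_emb(chars.view(batch * seq, wlen)).transpose(1, 2)
        char_features = torch.relu(self.char_cnn(char_vectors)).max(dim=2).values
        features = torch.cat([self.word_emb(words),
                              char_features.view(batch, seq, -1)], dim=-1)
        hidden, _ = self.lstm(features)
        return self.out(hidden)   # per-token emission scores over the tag set

For training, the (batch × seq, n_tags) scores and gold tag indices can go into nn.CrossEntropyLoss with padding positions ignored; replacing the final softmax with a CRF changes the loss to a sequence-level log-likelihood.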

Page 14

Suggested reading on neural seq. modeling

▶ Jie Yang, Shuailong Liang, & Yue Zhang, 2018
  Design Challenges and Misconceptions in Neural Sequence Labeling
  (Best Paper Award at COLING 2018)
  https://aclweb.org/anthology/C18-1327

▶ Nils Reimers & Iryna Gurevych, 2017
  Optimal Hyperparameters for Deep LSTM-Networks for Sequence Labeling Tasks
  https://arxiv.org/pdf/1707.06799.pdf

State-of-the-art leaderboards for NER
▶ https://nlpprogress.com/english/named_entity_recognition.html
▶ https://paperswithcode.com/task/named-entity-recognition-ner

Page 15

More information about the dataset

▶ https://github.com/ltgoslo/norne

▶ F. Jørgensen, T. Aasmoe, A. S. Ruud Husevåg, L. Øvrelid, and E. Velldal
  NorNE: Annotating Named Entities for Norwegian
  Proceedings of the 12th Language Resources and Evaluation Conference, Marseille, France, 2020
  https://arxiv.org/pdf/1911.12146.pdf

Page 16

Some suggestions to get started with experimentation

▶ Different label encodings: BIO-1 / BIO-2 / BIOES, etc.
▶ Different label-set granularities (see the sketch after this list):
  ▶ 8 entity types in NorNE by default (MISC can be ignored);
  ▶ could be reduced to 7 by collapsing GPE_LOC and GPE_ORG to GPE, or to 6 by mapping them to LOC and ORG.
▶ Impact of different parts of the architecture:
  ▶ CRF vs. softmax;
  ▶ impact of including a character-level model (e.g. CNN or RNN). Tip: evaluate the effect for OOVs;
  ▶ adding several BiLSTM layers.
▶ Do different evaluation strategies give different relative rankings of different systems?
▶ Compute learning curves.
▶ Mixing Bokmål / Nynorsk? Machine translation?
▶ Impact of embedding pre-training (corpus, dimensionality, framework, etc.)

▶ Possibilities for transfer / multi-task learning?
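As referenced in the list above, a minimal sketch of collapsing the label-set granularity on top of BIO-encoded tags; the two mapping dictionaries spell out the merges named on the slide, everything else is illustrative.

GPE_MERGE = {"GPE_LOC": "GPE", "GPE_ORG": "GPE"}      # 7 entity types
LOC_ORG_MERGE = {"GPE_LOC": "LOC", "GPE_ORG": "ORG"}  # 6 entity types

def collapse_labels(labels, mapping):
    # rewrite BIO labels like 'B-GPE_LOC' according to an entity-type mapping
    collapsed = []
    for label in labels:
        if label == "O":
            collapsed.append(label)
        else:
            prefix, etype = label.split("-", 1)
            collapsed.append(prefix + "-" + mapping.get(etype, etype))
    return collapsed

print(collapse_labels(["B-GPE_LOC", "I-GPE_LOC", "O", "B-PER"], GPE_MERGE))
# ['B-GPE', 'I-GPE', 'O', 'B-PER']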

Page 17

Track 2: Negation Scope

Non-Factuality (and Uncertainty) Very Common in Language

But {this theory would} 〈not〉 {work}.

I think, Watson, {a brandy and soda would do him} 〈no〉 {harm}.

They were all confederates in {the same} 〈un〉{known crime}.

“Found dead 〈without〉 {a mark upon him}.

{We have} 〈never〉 {gone out 〈without〉 {keeping a sharp watch}},and 〈no〉 {one could have escaped our notice}.”

Phorbol activation was positively modulated by Ca2+ influx while {TNF alpha activation was} 〈not〉.

CoNLL 2010, *SEM 2012, and EPE 2017 International Shared Tasks
▶ Bake-off: standardized training and test data, evaluation, schedule;
▶ 20+ participants; LTG systems top performers throughout the years.

Page 18

Small Words Can Make a Large Difference

Page 19

The *SEM 2012 Data (Morante & Daelemans, 2012)
http://www.lrec-conf.org/proceedings/lrec2012/pdf/221_Paper.pdf

[Embedded image: first page of ‘ConanDoyle-neg: Annotation of negation in Conan Doyle stories’ by Roser Morante and Walter Daelemans (CLiPS, University of Antwerp). The paper presents a corpus of Conan Doyle stories annotated with negation cues, their scopes, and the negated events; inter-annotator agreement (F-scores) is higher for cues, less high for scopes, and lower for the negated events. The corpus is publicly available.]

Page 20

Negation Analysis as a Tagging Task

we have never gone out without keeping a sharp watch , and no one could have escaped our notice . "

[Figure: dependency parse of the sentence above, with three separate negation annotations (ann. 1–3) marking cues, scopes, and negated events, flattened into a single per-token label sequence (cue, in-scope, negated-event, and out-of-scope labels).]

▶ Sherlock (Lapponi et al., 2012, 2017) almost state of the art today;

▶ ‘flattens out’ multiple, potentially overlapping negation instances;

▶ post-classification: heuristic reconstruction of separate structures.

▶ To what degree is cue classification a sequence labeling problem?

Page 21

Up-to-Date System Description: Lapponi et al. (2017)
http://epe.nlpl.eu/2017/49.pdf

[Embedded image: first page of ‘EPE 2017: The Sherlock Negation Resolution Downstream Application’ by Emanuele Lapponi, Stephan Oepen, and Lilja Øvrelid (University of Oslo). The paper describes Sherlock, a generalized update to one of the top-performing systems in the *SEM 2012 shared task on Negation Resolution, adapted to work across different segmentation and morpho-syntactic analysis schemes so that the downstream effects of different pre-processing and parsing choices can be studied.]

Page 22

A Simple Neural Perspective: Fancellu et al. (2016)
https://www.aclweb.org/anthology/P16-1047

[Embedded image: first page of ‘Neural Networks For Negation Scope Detection’ by Federico Fancellu, Adam Lopez, and Bonnie Webber (ACL 2016). On the *SEM 2012 shared-task data, even a simple feed-forward network using word-embedding features alone performs on par with earlier, heavily engineered classifiers, and a bi-directional LSTM outperforms all of them; a specially designed synthetic test set shows that performance suffers from genre effects and varies with the type of negation.]

Page 23

Some (Welcome) Simplifications

Separate Sub-Problems in Negation Analysis
▶ Cue detection: find negation indicators (sub-token, single-, or multi-token);
▶ essentially lexical disambiguation; oftentimes local, binary classification.
▶ Scope detection: given one cue, determine the sub-strings in its scope;
▶ structural in principle, but can be approximated as sequence labeling.
▶ Event identification: within the scope, if factual, find its key ‘event’.

Candidate Ways of Dealing with Multiple Negation Instances
▶ Project onto the same sequence of tokens: loses cue–scope correspondence;
▶ need a post-hoc way of reconstructing individual scopes for each cue.
▶ Multiply out: create a copy of the full sentence for each negation instance (see the sketch below);
▶ at the risk of presenting ‘conflicting evidence’, at least for cue detection.
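A minimal sketch of the ‘multiply out’ strategy, assuming negation instances are given as (cue indices, scope indices) pairs over token positions; the per-token labels (CUE / I / O) are an illustrative choice, not the course's prescribed scheme.

def multiply_out(tokens, instances):
    # one labeled copy of the sentence per negation instance
    sequences = []
    for cue_indices, scope_indices in instances:
        labels = ["O"] * len(tokens)
        for i in scope_indices:
            labels[i] = "I"     # in the scope of this cue
        for i in cue_indices:
            labels[i] = "CUE"   # the cue itself
        sequences.append((tokens, labels))
    return sequences

tokens = "we have never gone out without keeping a sharp watch".split()
instances = [([2], [0, 1, 3, 4, 5, 6, 7, 8, 9]),   # 'never' and its scope
             ([5], [6, 7, 8, 9])]                  # 'without' and its scope
for toks, labs in multiply_out(tokens, instances):
    print(list(zip(toks, labs)))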

Page 24

The Architecture of Fancellu et al. (2016)

▶ Only considers negation scope;

▶ multiplies out multiple instances;

▶ ‘gold’ cue information in the input.

▶ Actually, two distinct systems:

  (a) independent classification in the context of five-grams;

  (b) sequence labeling (bi-RNN): binary classification as in-scope.
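A minimal PyTorch sketch in the spirit of variant (b): a bi-directional LSTM scope tagger whose input concatenates a word embedding with an embedding of a binary cue-indicator feature (the ‘gold’ cue information mentioned above); sizes and names are illustrative assumptions.

import torch
import torch.nn as nn

class ScopeTagger(nn.Module):
    def __init__(self, n_words, word_dim=100, cue_dim=10, hidden=100):
        super().__init__()
        self.word_emb = nn.Embedding(n_words, word_dim, padding_idx=0)
        self.cue_emb = nn.Embedding(2, cue_dim)  # 0 = not a cue, 1 = cue token
        self.rnn = nn.LSTM(word_dim + cue_dim, hidden,
                           batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, 2)      # in-scope vs. out-of-scope

    def forward(self, words, cue_mask):
        # words, cue_mask: (batch, seq); cue_mask holds 0/1 cue indicators
        features = torch.cat([self.word_emb(words), self.cue_emb(cue_mask)], dim=-1)
        hidden, _ = self.rnn(features)
        return self.out(hidden)                  # per-token in/out scores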

Page 25

Probably State of the Art: Kurtz et al. (2020)
https://github.uio.no/in5550/2020/blob/master/exam/negation/Kun:Oep:Kuh:20.pdf

[Embedded image: first page of ‘End-to-End Negation Resolution as Graph Parsing’ by Robin Kurtz, Stephan Oepen, and Marco Kuhlmann (Linköping University and University of Oslo). The paper casts negation resolution as a graph parsing problem, representing cues and scopes as bi-lexical graphs predicted by an extension of the Dozat and Manning (2018) parser, and reports state-of-the-art results on the Conan Doyle benchmark.]

Page 26

Negation at WNNLP 2020: Our Starting Package

Data and Support Software
▶ Four Sherlock Holmes stories, annotated with ‘gold’ cues and scopes;
▶ easy-to-read JSON serialization; support software to read and write;
▶ Python interface to the standard *SEM 2012 scorer (common metrics);
▶ PoS tags (and syntactic dependency trees) from various parsers.

Possible Research Avenues
▶ Replicate the basic (biLSTM) architecture of Fancellu et al. (2016);
▶ try out more elaborate labeling schemes (e.g. Lapponi et al., 2017);
▶ investigate the relevance of different PoS tags at different accuracy levels;
▶ determine the contribution of pre-trained contextualized embeddings;
▶ actual structured prediction: maximize over the whole sequence (e.g. CRF);
▶ ...

Page 27

What is sentiment analysis?
▶ Identifying evaluative expressions in text, and

▶ measuring positive/negative polarity.

▶ Different granularities: document-level, sentence-level, phrase-level.

Use cases
▶ News and media monitoring,
▶ analysing public opinion,
▶ market analytics, and more.

Page 28

Targeted Sentiment Analysis (SA)

▶ Fine-grained sentiment analysis at the sub-sentence level:
  ▶ what is the target of sentiment?
  ▶ what is the polarity of sentiment directed at the target?

1. [Denne disken]POS er svært stillegående
   ‘This disk runs very quietly’

Page 29

The Norwegian Review Corpus (NoReC)

Page 30

NoReCfine

▶ Newly released dataset for fine-grained SA of Norwegian
▶ https://github.com/ltgoslo/norec_fine

                    # Examples
           Train   Dev.   Test   Total   Avg. len.
Sents.      6145   1184    930    8259        16.8
Targets     4458    832    709    5999         2.0

Table: Number of sentences and annotated targets across the data splits.

Page 31

NoReCfine

Page 32

Task specifics
▶ Data format: BIO (target + polarity)

# sent_id = 501595-13-04
Munken              B-targ-Positive
Bistro              I-targ-Positive
er                  O
en                  O
hyggelig            O
nabolagsrestaurant  O
for                 O
hverdagslige        O
og                  O
uformelle           O
anledninger         O
.                   O

▶ Baseline system: PyTorch pre-code for a BiLSTM
▶ Evaluation code:

https://github.uio.no/in5550/2020/tree/master/exam/targeted_sa
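A minimal sketch for working with the two-column layout shown above (token and BIO tag, ‘# sent_id’ comment lines, blank lines between sentences); this is an assumed reading of the format for illustration, not the course pre-code or evaluation script.

def read_targeted_sa(path):
    # yield (tokens, tags) pairs, one per sentence
    tokens, tags = [], []
    with open(path, encoding="utf-8") as stream:
        for line in stream:
            line = line.rstrip("\n")
            if line.startswith("#"):
                continue
            if not line.strip():
                if tokens:
                    yield tokens, tags
                    tokens, tags = [], []
                continue
            form, tag = line.split()
            tokens.append(form)
            tags.append(tag)
    if tokens:
        yield tokens, tags

def extract_targets(tokens, tags):
    # collect (target string, polarity) pairs from tags like 'B-targ-Positive'
    targets, span, polarity = [], [], None
    for token, tag in zip(tokens + ["<end>"], tags + ["O"]):
        if (tag.startswith("B-") or tag == "O") and span:
            targets.append((" ".join(span), polarity))
            span, polarity = [], None
        if tag.startswith("B-"):
            span, polarity = [token], tag.split("-")[-1]
        elif tag.startswith("I-") and span:
            span.append(token)
    return targets

# For the example sentence above this yields [('Munken Bistro', 'Positive')].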

Page 33

Possible directions

1. Experiment with alternative label encodings (e.g. BIOUL; see the sketch below)
2. Compare pipeline vs. joint prediction approaches
3. Impact of different architectures:
  ▶ LSTM vs. GRU vs. Transformer
  ▶ include character-level information
  ▶ depth of model (2-layer, 3-layer, etc.)
4. Effect of using different pre-trained word embeddings
5. Effect of using pre-trained models (ELMo, BERT)
6. Hyperparameter tuning
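As referenced in item 1 above, a minimal sketch of converting BIO tags to BIOUL (also known as BIOES): single-token spans become U-, and the last token of a multi-token span becomes L-. It operates on plain lists of tag strings and is only an illustration of the encoding.

def bio_to_bioul(tags):
    bioul = list(tags)
    for i, tag in enumerate(tags):
        nxt = tags[i + 1] if i + 1 < len(tags) else "O"
        continues = nxt.startswith("I-") and nxt[2:] == tag[2:]
        if tag.startswith("B-") and not continues:
            bioul[i] = "U-" + tag[2:]    # single-token span
        elif tag.startswith("I-") and not continues:
            bioul[i] = "L-" + tag[2:]    # last token of a multi-token span
    return bioul

print(bio_to_bioul(["B-targ-Positive", "I-targ-Positive", "O", "B-targ-Negative"]))
# ['B-targ-Positive', 'L-targ-Positive', 'O', 'U-targ-Negative']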
