

Syntopical Graphs for Computational Argumentation Tasks

Joe Barrow* (University of Maryland) [email protected]
Rajiv Jain (Adobe Research) [email protected]
Nedim Lipka (Adobe Research) [email protected]
Franck Dernoncourt (Adobe Research) [email protected]
Vlad I. Morariu (Adobe Research) [email protected]
Varun Manjunatha (Adobe Research) [email protected]
Douglas W. Oard (University of Maryland) [email protected]
Philip Resnik (University of Maryland) [email protected]
Henning Wachsmuth (Paderborn University) [email protected]

* Work done while interning at Adobe Research.

Abstract

Approaches to computational argumentation tasks such as stance detection and aspect detection have largely focused on the text of individual claims, losing out on potentially valuable context from the broader collection of text. We present a general approach to these tasks motivated by syntopical reading, a reading process that emphasizes comparing and contrasting viewpoints in order to improve topic understanding. To capture collection-level context, we introduce the syntopical graph, a data structure for linking claims within a collection. A syntopical graph is a typed multi-graph where nodes represent claims and edges represent different possible pairwise relationships, such as entailment, paraphrase, or support. Experiments applying syntopical graphs to stance detection and aspect detection demonstrate state-of-the-art performance in each domain, significantly outperforming approaches that do not utilize collection-level information.

1 Introduction

Collections of text about the same topic, such as news articles and research reports, often present a variety of viewpoints. Adler and Van Doren (1940) proposed a formalized manual process for understanding a topic based on multiple viewpoints in their book, How to Read a Book, applying dialectics to collection browsing. This process consists of four levels of reading, the highest of which is syntopical reading. Syntopical reading is focused on understanding a core concept by reading a collection of works. It requires finding passages on the core concept that agree or disagree with each other, defining the issues, and analyzing the discussion to gain a better understanding of the core concept.

The goal of the paper at hand is to operationalize the syntopical reading process computationally in order to help individuals make sense of a collection of documents for a given topic.

Viewed through the lens of computational argumentation, these documents state claims or conclusions that can be grouped by the aspects of the topic they discuss as well as by the stance they convey towards the topic (Stede and Schneider, 2018). An individual aiming to form a thorough understanding of the topic needs to get an overview of these viewpoints and their interactions. This may be hard even if adequate tool support for browsing the collection is available (Wachsmuth et al., 2017a; Stab et al., 2018; Chen et al., 2019).

We seek to enable systems that are capable of reconstructing viewpoints within a collection, where a viewpoint is expressed as a triple V = (topic, aspect, stance). We consider the argumentative unit of a claim to be the minimal expression of a viewpoint in natural language, such that a single viewpoint can have many claims expressing it. As an example, consider the following two claims:

“Nuclear energy emits zero CO2.”

“Nuclear can provide a clean baseload, eliminating the need for fracking and coal mining.”

Within a collection these claims express:

V = (Nuclear Energy, env. impact, PRO)

The goal of the systems we envision is thus to identify, group, and summarize the latent viewpoints underlying the claims in a collection, such that a reader can investigate and engage with them.

Figure 1: We introduce the idea of a syntopical graph, a data structure that represents the context of claims. The graph is a typed multi-graph (multiple edges allowed between nodes), where nodes are claims or documents, and edges are pairwise relationships such as entailment, paraphrase, topical similarity, or term similarity. By using this graph as input to graph neural networks or traditional graph algorithms, we can significantly improve on the tasks of aspect and stance detection, which allow us to identify viewpoints in a collection. (The figure depicts three stages: inputs, where a topic and relevant claims are extracted from documents; syntopical graph construction, where pairwise judgements are used as edges in a typed multigraph with claims and documents as nodes; and applications, where the newly created graph is used for stance and aspect detection to reconstruct viewpoints.)

Many existing approaches attempt to identify viewpoints within a collection largely from the text of individual claims only, which we refer to as “content-only approaches.” However, as the latent viewpoints are a global property of a collection, it is necessary to account not only for the text but also its context. For instance, in order to identify the stance of a claim with respect to a topic, it may help to consider the claim’s stance relative to other claims on the topic. Although a few researchers have accounted for connections between claims and other information (details in Section 2), no systematic model of their interactions exists yet.

We therefore introduce a syntopical graph that models pairwise textual relationships between claims in order to enable a better reconstruction of the latent viewpoints in a collection. In line with the idea of Adler and Van Doren (1940), the syntopical graph makes the points of agreement and disagreement within the collection explicit. Technically, it denotes a multi-graph (where a pair of nodes can have many typed edges) that simultaneously represents relationships such as relative stance, relative specificity, or whether a claim paraphrases another. We build syntopical graphs by transferring pretrained pairwise models, requiring no additional training data to be annotated.

We decompose the problem of viewpoint reconstruction into the subtasks of stance detection and aspect detection, and evaluate the benefits of syntopical graphs — which are a collection-level approach — on both tasks. For stance detection, we use the sentential argumentation mining collection (Stab et al., 2018) and the IBM claim stance dataset (Bar-Haim et al., 2017a). For aspect detection we use the argument frames collection (Ajjour et al., 2019). We treat the graph as an input to: (a) a graph neural network architecture for stance detection, and (b) graph algorithms for unsupervised tasks such as aspect clustering. In both settings, our results show that the syntopical graph approach improves significantly over content-only baselines.

The contributions of the work are two-fold:

1. A well-motivated data structure for capturing the latent structure of an argumentative corpus, the syntopical graph.

2. An instantiation of syntopical graphs that yields state-of-the-art results on stance detection and aspect detection.

2 Related Work

First attempts at stance detection used content-oriented features (Somasundaran and Wiebe, 2009). Later approaches, such as those by Ranade et al. (2013) and Hasan and Ng (2013), exploited common patterns in dialogic structure to improve stance detection. More tailored to argumentation, Bar-Haim et al. (2017a) first identified the aspects of a discussed topic in two related claims and the sentiment towards these aspects. From this information, they derived stance based on the contrastiveness of the aspects. Later, Bar-Haim et al. (2017b) modeled the context of a claim to account for cases without sentiment. Our work follows up on and generalizes this idea, systematically incorporating implicit and explicit structure induced by the topics, aspects, claims, and participants in a debate.

In a similar vein, Li et al. (2018) embedded debate posts and authors jointly based on their interactions, in order to classify a post’s stance towards the debate topic. Durmus et al. (2019) encoded related pairs of claims using BERT to predict the stance and specificity of any claim in a complex structure of online debates. However, neither of these exploited the full graph structure resulting from all the relations and interactions in a debate, which is the gap we fill in this paper. Sridhar et al. (2015) model collective information about debate posts, authors, and their agreement and disagreement using probabilistic soft logic. Whereas they are restricted to the structure available in a forum, our approach can in principle be applied to arbitrary collections of text.

We also tackle aspect detection, which may at first seem more content-oriented in nature. Accordingly, previous research such as the works of Misra et al. (2015) and Reimers et al. (2019b) employed word-based features or contextualized word embeddings for topic-specific aspect clustering. Ajjour et al. (2019), whose argument frames dataset we use, instead clustered aspects with Latent Semantic Analysis (LSA) and topic modeling. But, in general, aspects might not be mentioned in a text explicitly. Therefore, we follow these other approaches, treating the task as a clustering problem. Unlike them, however, we do not model only the content and linguistic structure of texts, but we combine them with the debate structure.

Different types of argumentation graphs have been proposed, covering expert-stance information (Toledo-Ronen et al., 2016), basic argument and debate structure (Peldszus and Stede, 2015; Gemechu and Reed, 2019), specific effect relations (Al-Khatib et al., 2020; Kobbe et al., 2020), social media graphs (Aldayel and Magdy, 2019), and knowledge graphs (Zhang et al., 2020). Our main focus is not learning to construct ground-truth graphs, but how to use an approximated graph to derive properties such as stance and aspect. Our work resembles approaches that derive the relevance of arguments (Wachsmuth et al., 2017b) or their centrality and divisiveness in a discussion (Lawrence and Reed, 2017) from respective graphs. Sawhney et al. (2020) used a neural graph attention network to classify speech stance based on a graph with texts, speakers, and topics as nodes. While we also use a relational graph convolutional network for learning, the graph we propose captures implicit claim relations as well as explicit structure.

In addition, text-based graph neural models have been proposed to facilitate classification, such as TextGCN (Yao et al., 2019) as well as the follow-up work BertGCN (Lin et al., 2021). These approaches build a graph over terms (using normalized mutual information for edge weights) as well as sentences and documents (using TF-IDF for edge weights) to improve sentence- or document-level classification. Our work generalizes this approach, focusing on incorporating many edge types with different meanings, such as relative stance or relative specificity. We compare our approach with a BertGCN baseline, and we ablate all considered edge types, in order to show the importance of capturing these different textual relationships.

Ultimately, we seek to facilitate understanding of the main viewpoints in a text collection. Qiu and Jiang (2013) used clustering-based viewpoint discovery to study the impact of the interaction of topics and users in forum discussions. Egan et al. (2016) used multi-document summarization techniques to mine and organize the main points in a debate, and Vilares and He (2017) mined the main topics and their aspects using a Bayesian model. Bar-Haim et al. (2020) introduced the idea of keypoint analysis, grouping arguments found in a collection by the viewpoint they reflect and summarizing each group to a salient keypoint. While our graph-based analysis is likely to be suitable for finding keypoints, we instead focus on reconstructing latent viewpoints by grouping claims, leaving open the option to identify the key claims in future work as it would require manual evaluation.

3 Syntopical Graphs

We now introduce the concept of a syntopical graph. The goal of our syntopical graph is to systematically model the salient interactions of all claims in a collection of documents. Then, properties of claims (say, their stance towards a topic or the aspects they cover) can be assessed based not only on the content of the claim alone, but on the entirety of information available in their context.

To capture this context, we build a graph where documents and claims are nodes.


Figure 2: An example syntopical graph created from a collection of documents on the topic of Nuclear Energy. The nodes are documents and claims (e.g., “Nuclear energy emits zero CO2.”, “Nuclear can provide a clean baseload and eliminate the need for fracking and coal mining.”, “However, uranium mining is hardly a clean process.”), connected by typed edges such as SUPPORT, REFUTE, MORE SPECIFIC, and CONTAINS. There are 0+ weighted and typed edges between any pair of nodes. In downstream applications, we add the representation of the topic to the claim nodes.

Edges between claims are constructed using pairwise scoring functions, such as pretrained natural language inference (NLI) models. Claims may relate to each other in many different ways: they can support or refute each other, they can paraphrase each other, they can entail or contradict each other, they can be topically similar, etc. We hypothesize that being able to account for these relationships helps computational argumentation tasks such as stance detection.

3.1 Graph Components

Intuitively, if it is known that claim (a) refutes claim (b), and claim (b) has a positive stance to the topic, it seems more reasonable to believe that claim (a) has a negative stance. We can represent all of this with a graph if we allow multiple edges between nodes. For instance, claims can have edges that label both relative agreement and relative specificity, as exemplified in the graph in Figure 2. The process of constructing a graph is shown in Figure 1.

Technically, we capture this intuition as a typed multi-graph: typed in that the nodes have different types drawn from {document, claim}, and a multi-graph because multiple edges (of different types) are allowed between nodes. We then formally define a syntopical graph as a labeled multi-graph in terms of a 5-tuple G:

G = (Σ_N, Σ_E, N, E, l_N, l_E),

where Σ_N is the alphabet of node types, Σ_E is the alphabet of edge types, N is the set of nodes, E is the set of multi-edges, l_N : N → Σ_N maps each node to its type, and l_E : E → Σ_E maps each edge to its type. In the following, we show how to construct the graph and what each of its components looks like.

The node types, Σ_N, are used to represent structured metadata in the graph:

Σ_N = {claim, document}

Each node in the graph is mapped to its type with the function l_N. Accordingly, the edge alphabet is

Σ_E = Σ_E:claim ∪ Σ_E:document,

where Σ_E:claim is the set of types of claim-claim edges and Σ_E:document is the set of types of claim-document edges.
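Concretely, a syntopical graph can be held in any multigraph container. The following is a minimal sketch using networkx for illustration; the paper does not prescribe a particular library, and the edge-type names below are assumed labels for the relationships described in Section 3.2:

# A minimal sketch of the syntopical graph as a typed multigraph, using
# networkx for illustration; the edge-type names are assumed labels for the
# relationships described in Section 3.2.
import networkx as nx

NODE_TYPES = {"claim", "document"}                                  # Sigma_N
CLAIM_EDGE_TYPES = {"relative_stance", "relative_specificity", "paraphrase",
                    "nli", "term_similarity", "topic_similarity"}   # Sigma_E:claim
DOCUMENT_EDGE_TYPES = {"contains"}                                  # Sigma_E:document

def new_syntopical_graph():
    # A MultiDiGraph allows several typed, weighted edges between the same pair of nodes.
    return nx.MultiDiGraph()

def add_claim(graph, claim_id, text, topic):
    graph.add_node(claim_id, ntype="claim", text=text, topic=topic)

def add_document(graph, doc_id, text):
    graph.add_node(doc_id, ntype="document", text=text)

def add_typed_edge(graph, u, v, etype, weight):
    assert etype in CLAIM_EDGE_TYPES | DOCUMENT_EDGE_TYPES
    # The edge type doubles as the multi-edge key, so each type appears at most
    # once per ordered node pair.
    graph.add_edge(u, v, key=etype, etype=etype, weight=weight)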

Claim Nodes The central node type in a syntopical graph is a claim node. A claim node represents a topically relevant claim in a collection. By treating a claim as a node embedded in a graph, we can take advantage of rich graph structures to represent the context in which the claim occurs, such as the document the claim appears in or the claim’s relationship with other claims.

Document Nodes In general, two claims from the same source are more likely to represent the same viewpoint than a pair of claims sampled randomly. To capture this intuition, we allow claims from the same source to share information with each other via document nodes, which enables models to pool information about groups of claims and share the information amongst them. Similar information about claims can be aggregated in the metadata node and broadcast out to all claims.

Pairwise Relationships as Multi-Edges There are two classes of edge types:

• claim-claim edges (Σ_E:claim) model the relationship between pairs of claims: do they support each other, is one more specific than the other, etc. Different tasks can make use of this information (e.g., a claim is likely to have a specific stance if other claims that support it have the same stance).

• claim-document edges (Σ_E:document) allow groups of claims to share information with each other through common ancestors (e.g., claims in a document pro nuclear energy are somewhat likely to have a pro stance).

Any pair of nodes can have multiple edges of different types between them; a claim can both contradict and refute another claim, for instance.

Edge Weights An edge can have a real-valued weight associated with it on the range (−1, 1), representing the strength of the connection. The relative stance edge between a claim which strongly refutes another would receive a weight close to −1.

3.2 Graph Construction

For graph edges, we combine four pretrained models and two similarity measures. The pretrained edge types are: relative stance and relative specificity from Durmus et al. (2019), paraphrase edges from Dolan et al. (2004); Morris et al. (2020), and natural language inference edges from Williams et al. (2018); Liu et al. (2019). The edge weights are the confidence scores defined by

weight(u, v, r) = p_pos(u, v) − p_neg(u, v),

where u and v are claims, r is the relation type, p_pos(u, v) is the probability of a positive association between the claims (e.g., “is a paraphrase” or “does entail”), and p_neg(u, v) the probability of a negative one. For similarity-based edges, we use standard TF-IDF for term-based similarity and LDA for topic-based similarity (Blei et al., 2003), using cosine similarity as the edge weight. The document-claim edges have a single type, contains, with an edge weight of 1. We compute each of the pairwise relationships for all pairs of claims that share the same topic, and then filter out edges using a threshold τ on the absolute value of the edge weight. τ is tuned as a hyperparameter on a validation dataset for each task.
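As an illustration of this scoring scheme, the following hedged sketch scores one edge type (NLI) with an off-the-shelf MNLI classifier and applies the threshold τ; the checkpoint name is an assumed stand-in rather than the exact pairwise model used in the paper:

# Sketch of scoring one edge type (NLI) over all ordered claim pairs on a topic
# and filtering by the threshold tau. "roberta-large-mnli" is an assumed
# stand-in for the pretrained pairwise model.
import itertools
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-large-mnli")
nli_model = AutoModelForSequenceClassification.from_pretrained("roberta-large-mnli")
nli_model.eval()
ENTAILMENT, CONTRADICTION = 2, 0   # label indices for this checkpoint

def nli_edge_weight(u_text, v_text):
    inputs = tokenizer(u_text, v_text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = nli_model(**inputs).logits.softmax(dim=-1)[0]
    # weight(u, v, r) = p_pos(u, v) - p_neg(u, v)
    return (probs[ENTAILMENT] - probs[CONTRADICTION]).item()

def nli_edges(claims, tau):
    for (i, u), (j, v) in itertools.permutations(list(enumerate(claims)), 2):
        w = nli_edge_weight(u, v)
        if abs(w) >= tau:          # keep only sufficiently confident edges
            yield i, j, "nli", w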

For node representations, we initialize the claim node representations with the output of a natural language inference model that predicts whether the claim entails the topic. We initialize the document representations with a sentence vectorizer over the text of the document.

4 Viewpoint Reconstruction

A viewpoint can be understood as a judgment of some aspect of a topic that conveys a stance towards the topic. The goal of viewpoint reconstruction is to identify the set of viewpoints in a collection given a topic, starting with the claims. An example of this process is shown on the right in Figure 1. To denote viewpoints, we borrow notation in line with the idea of aspect-based argument mining (Trautmann, 2020), which in turn was inspired by aspect-based sentiment analysis. In particular, we express a viewpoint as a triple V:

V = (topic, aspect, stance)

A claim is an expression of a viewpoint in natural language, and a single viewpoint can be expressed in several ways throughout a collection in many claims. Aspects are facets of the broader argument around the topic. While some actual claims may encode multiple viewpoints simultaneously, henceforth we consider each claim to encode one viewpoint for simplicity. To tackle viewpoint reconstruction computationally, we decompose it into two sub-tasks, stance detection and aspect detection, along with a final grouping of claims with the same aspect and stance.

Stance Detection Stance detection requires assigning a valence label to a claim with respect to a particular topic. Though content-only baselines can work in many cases, there are also cases where the stance of a claim might only make sense in relation to a broader argument. For example, the claim “Nuclear power plants take 5 years to construct” is difficult to assign a stance a priori. However, in the context of other claims such as “Solar farms often take less than 2 years to commission”, it might be viewed as having a negative stance. To exploit this additional contextual information, we use syntopical graphs as input to a graph neural network, in particular a Relational Graph Convolutional Network (R-GCN) (Schlichtkrull et al., 2018).

We treat stance detection as a supervised node classification task. The goal is to output a prediction in the set {PRO, CON} for each claim node relative to a topic. R-GCNs were developed to perform node classification and edge prediction for knowledge bases, which are also typed multi-graphs. As such, the abstractions of the syntopical graph slot neatly into the abstractions of R-GCNs.

The input to an R-GCN is a weighted, typed multigraph with some initial node representation. The network is made up of stacked relational graph convolutional layers; each layer computes a new set of node representations based on each node’s neighborhood. In effect, each layer combines the edge-type-specific representation of all of a node’s neighbors with its own representation. The representations are influenced by the node, and all of its neighbors, attenuated through the edge weight. An R-GCN thus consumes a set of initial claim representations, transforms them through stacks of relational graph convolutional layers, and outputs a final set of node vectors, which are fed into a classifier to predict the claim stance.
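A sketch of this setup, assuming DGL's RelGraphConv layers and two stacked layers for illustration; feeding the edge weights through the layer's per-edge norm argument is one plausible way to attenuate messages, not necessarily the exact implementation used here:

# A sketch of stance detection as node classification over the syntopical
# graph with DGL's RelGraphConv. Passing the edge weights through the layer's
# per-edge "norm" argument is an assumption about one reasonable realization.
import dgl
import torch
import torch.nn as nn
from dgl.nn import RelGraphConv

class StanceRGCN(nn.Module):
    def __init__(self, in_dim, hidden_dim, num_rels, num_bases=4, num_classes=2):
        super().__init__()
        self.conv1 = RelGraphConv(in_dim, hidden_dim, num_rels,
                                  regularizer="basis", num_bases=num_bases,
                                  activation=torch.relu)
        self.conv2 = RelGraphConv(hidden_dim, hidden_dim, num_rels,
                                  regularizer="basis", num_bases=num_bases,
                                  activation=torch.relu)
        self.classifier = nn.Linear(hidden_dim, num_classes)   # PRO / CON

    def forward(self, g, feats, etypes, edge_weight):
        norm = edge_weight.unsqueeze(-1)     # attenuate each message by its edge weight
        h = self.conv1(g, feats, etypes, norm=norm)
        h = self.conv2(g, h, etypes, norm=norm)
        return self.classifier(h)

# Toy usage: 3 claim nodes, 2 typed edges, 7 relations (6 claim-claim + contains).
g = dgl.graph(([0, 1], [1, 2]), num_nodes=3)
feats = torch.randn(3, 1024)                 # initial NLI-based claim representations
etypes = torch.tensor([0, 3])                # e.g., 0 = relative stance, 3 = NLI
weights = torch.tensor([0.9, -0.7])
logits = StanceRGCN(1024, 128, num_rels=7)(g, feats, etypes, weights)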

Aspect Detection Following the work of Ajjour et al. (2019), we treat aspect detection as an unsupervised task. As aspects are an open class, we use a community detection approach, modularity-based community detection (Clauset et al., 2004). The key intuition of modularity-based community detection is that communities are graph partitions that have more edges within communities than across communities. Modularity is a value assigned to a graph partition, which is higher when there are fewer edges across communities than within them; a modularity of 0 represents a random partition, while higher modularities indicate tighter communities. The goal of modularity-based community detection is to maximize modularity by finding dense partitions. This intuition works well for aspects in a syntopical graph — claims that discuss a similar aspect are likely to have salient interactions.

As aspects themselves are independent of stance, the direction of the interactions (e.g., support or refute) does not matter, but their salience does. To capture only the intensity of the interaction between two claims, we apply a transformation that collapses the signed multi-edges of a syntopical graph (denoted SG) into a positive-weighted graph (G):

w_G(u, v) = [ Σ_{t∈Σ_E} δ_SG(u, v, t) · |w_SG(u, v, t)| ] / [ Σ_{t∈Σ_E} δ_SG(u, v, t) ],

where w_G(u, v) is the weight between nodes u and v in the new graph G, δ_SG(u, v, t) = 1 if an edge of type t exists between nodes u and v in the syntopical graph (SG), and w_SG(u, v, t) is the edge weight for type t between nodes u and v in the syntopical graph. This is equivalent to taking the average, across types, of the absolute values of the weights. The newly constructed single-edge graph is then used to identify aspects, which should have more interactions within them than across them.
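A minimal sketch of this collapsing step and the subsequent clustering, using networkx's greedy modularity communities (Clauset et al., 2004) and assuming the MultiDiGraph representation sketched in Section 3:

# Sketch of collapsing the signed multi-graph into a single positive-weighted
# graph and clustering it with greedy modularity maximization; restricting the
# clustering to claim nodes is an assumption.
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def collapse(sg):
    totals, counts = {}, {}
    for u, v, data in sg.edges(data=True):
        if sg.nodes[u].get("ntype") != "claim" or sg.nodes[v].get("ntype") != "claim":
            continue                                   # keep only claim-claim interactions
        key = tuple(sorted((u, v)))
        totals[key] = totals.get(key, 0.0) + abs(data["weight"])
        counts[key] = counts.get(key, 0) + 1
    g = nx.Graph()
    for (u, v), total in totals.items():
        # average of the absolute edge weights across types, as in the equation above
        g.add_edge(u, v, weight=total / counts[(u, v)])
    return g

def detect_aspects(sg):
    # Each returned community is a set of claim ids grouped by (latent) aspect.
    return greedy_modularity_communities(collapse(sg), weight="weight")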

5 Experiments

To evaluate the effectiveness of our approach at reconstructing viewpoints, we consider three datasets across the two subtasks of stance and aspect detection. We hypothesize that syntopical graph approaches will outperform content-only baselines — including the ones used to initialize the claim representations — because they are able to make use of not only the claim content, but also the claim context. We further hypothesize that syntopical graph approaches will outperform graph-based baselines that use only textual similarity edges, because the latter’s claim context is not as rich. For our experiments, we construct a syntopical graph as described in Section 3.

We further evaluate our model by conducting several additional experiments, including removing the use of document nodes or initial claim representations, analyzing the performance of each edge type in isolation and when left out, and analyzing the differences in predictions between the syntopical graph and the content-only baselines.

Stance Detection For the stance detection experiments, we use two datasets: first, the heterogeneous cross-topic argumentation mining dataset (ArgMin) from Stab et al. (2018), and second, the claim-stance dataset (IBMCS) from Bar-Haim et al. (2017a). The ArgMin dataset contains about 25k sentences from 400 documents across eight controversial topics, ranging from abortion to school uniforms. Following Schiller et al. (2020), we filter only the claims, resulting in 11.1k claims. The IBMCS dataset contains 2.4k claims across 55 topics. We use the splits from Schiller et al. (2020), which ensure that the topics in the training and test sets are mutually exclusive. Claims are given a stance label drawn from {PRO, CON}. We evaluate using macro-averaged F1 and accuracy.

We use a syntopical graph for each dataset as the input to a relational graph convolutional network (R-GCN), implemented in DGL (Wang et al., 2019) and PyTorch (Paszke et al., 2019). For document node representations, we use a pretrained sentence transformer and concatenate all of the sentences as input (Reimers et al., 2019a). For the claim node representations, we use a RoBERTa model pretrained on an NLI task (Liu et al., 2019) to encode both the claim and topic; the resulting vectors are fixed throughout training.
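A hedged sketch of how such initial representations could be computed; the checkpoint names ("all-mpnet-base-v2", "roberta-large-mnli") are stand-ins for the pretrained sentence transformer and NLI encoder, not necessarily the ones used in our experiments:

# A sketch of the initial node representations: a sentence transformer over the
# document text for document nodes, and an NLI encoder over (claim, topic)
# pairs for claim nodes. The checkpoint names are assumed stand-ins.
import torch
from sentence_transformers import SentenceTransformer
from transformers import AutoModel, AutoTokenizer

doc_encoder = SentenceTransformer("all-mpnet-base-v2")
nli_tokenizer = AutoTokenizer.from_pretrained("roberta-large-mnli")
nli_encoder = AutoModel.from_pretrained("roberta-large-mnli")
nli_encoder.eval()

def document_representation(sentences):
    # Concatenate all sentences of the document and embed the result once.
    return doc_encoder.encode(" ".join(sentences))

def claim_representation(claim, topic):
    # Encodes the pair in the [CLS] claim [SEP] topic [SEP] format of Appendix B
    # and keeps the first-token hidden state (1024 dims); kept frozen during training.
    inputs = nli_tokenizer(claim, topic, return_tensors="pt", truncation=True)
    with torch.no_grad():
        return nli_encoder(**inputs).last_hidden_state[0, 0]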


Model | IBMCS macro F1 | IBMCS Acc | ArgMin macro F1 | ArgMin Acc
Majority Baseline | 34.06 | 51.66 | 33.83 | 51.14
RoBERTa Large NLI | 52.34 | 52.69 | 60.56 | 60.93
BertGCN (Lin et al., 2021) | 66.16 | 66.26 | 58.51 | 58.73
MT-DNN, 1 Dataset (Schiller et al., 2020)* | 70.66 | 71.16 | 61.65 | 62.40
MT-DNN, 10 Datasets (Schiller et al., 2020)* | 77.72 | 77.87 | 61.38 | 62.11
Syntopical Graph (R-GCN, Structure Only) | 44.32 | 47.82 | 42.59 | 52.71
Syntopical Graph (R-GCN, No Documents) | 83.03 | 83.10 | 67.52 | 68.34
Syntopical Graph (R-GCN) | 83.40 | 83.54 | 67.77 | 68.01

Table 1: Results on the two stance detection datasets. The full syntopical graph, as well as the variant without document nodes, outperforms the content-only baselines by both a significant and substantial margin (p < 10^-7 for ArgMin, and p < 10^-4 for IBMCS). A * on the model means we retrained a previously reported baseline.

Model | b-cubed F1 | b-cubed P | b-cubed R
LDA | 47.01 | 47.19 | 49.82
Clustering (RoBERTa Large MNLI) | 45.69 | 44.76 | 50.15
Syntopical Graph (Modularity) | 55.42 | 66.11 | 53.82

Table 2: Aspect detection results on the argument frames dataset (Ajjour et al., 2019). The syntopical graph outperformed both LDA and clustering of RoBERTa embeddings, recovering latent aspects substantially better than either approach. The syntopical graph approach significantly outperforms LDA (p < 10^-19).

Aspect Detection For clustering-based aspect detection, we use the argument frames dataset from Ajjour et al. (2019). The dataset contains roughly 11k sentences drawn from 465 different topics. Each sentence has a specific aspect (or frame, in the original paper), drawn from a set of over a thousand possible aspects. Following the authors, we evaluate with a clustering metric, b-cubed F1 (Amigó et al., 2009). We transform the graph as described in Section 4 to use as an input to modularity-based community detection, using a τ of 0.6 tuned on held-out topics.
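For reference, b-cubed precision and recall can be computed element-wise as follows; this is a minimal sketch of the metric (Amigó et al., 2009), not the exact evaluation script used here:

# A minimal element-wise sketch of B-cubed precision, recall, and F1 for
# comparing predicted aspect clusters against gold aspect labels.
from collections import Counter

def b_cubed(predicted_clusters, gold_labels):
    # predicted_clusters[i] and gold_labels[i] refer to the same claim.
    n = len(predicted_clusters)
    cluster_sizes = Counter(predicted_clusters)
    label_sizes = Counter(gold_labels)
    # joint[(c, g)]: number of claims placed in cluster c whose gold aspect is g
    joint = Counter(zip(predicted_clusters, gold_labels))
    precision = sum(joint[(c, g)] / cluster_sizes[c]
                    for c, g in zip(predicted_clusters, gold_labels)) / n
    recall = sum(joint[(c, g)] / label_sizes[g]
                 for c, g in zip(predicted_clusters, gold_labels)) / n
    return precision, recall, 2 * precision * recall / (precision + recall)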

6 Results and Analysis

The main results for stance detection are shown in Table 1. The most important finding is that the fusion of signals from content and from structure done by our approach, the syntopical graph (R-GCN), outperforms the existing state of the art (Schiller et al., 2020) for both the IBMCS dataset (83.40 macro F1, +5.68 absolute) and the ArgMin dataset (67.77 macro F1, +6.12 absolute). The content-oriented RoBERTa Large NLI model and the structure-only syntopical graph have significantly reduced performance independently, emphasizing the complementarity of the two signals. Our best network is the one which includes both claim and document nodes, except for the ArgMin dataset.

Aspect detection results are shown in Table 2. Our modularity approach outperforms the state of the art (Ajjour et al., 2019) on the argument frames dataset (55.42 b-cubed F1, +8.41 absolute).

The remainder of this section investigates the robustness of the syntopical graph approach to stance and aspect detection: First, we analyze the contribution of each edge type, running experiments without and with only each edge type. We also examine the accuracy of the edges in our graph when applied out of domain, and analyze the types of claims for which this model improves performance.

Edge Analysis We conducted an ablation study to analyze the usefulness of each considered edge type. To do so, we built graphs containing each edge independently, and graphs dropping each edge independently. Table 3 presents the results.

For the supervised task of stance detection, we use the IBMCS dataset. No single edge performs as well as the combination of edges, the best being relative stance with a macro-F1 score of 80.72. This indicates that our model is capable of taking advantage of the different kinds of relationships represented by the edge types. We see the largest performance drops when we remove relative stance (79.39), relative specificity (79.35), or NLI (78.95) edges respectively, indicating the highest amount of unique information being captured by these edges. In contrast, paraphrase can be removed without loss for stance detection according to the results.


Edge | Pairwise Model | Stance (macro F1), Alone | Stance (macro F1), Without | Aspect (b-cubed F1), Alone | Aspect (b-cubed F1), Without
Relative Stance | RoBERTa Base | 80.72 | 79.39 | 52.22 | 53.52
Relative Specificity | RoBERTa Base | 70.22 | 79.35 | 43.59 | 55.73
Paraphrase | RoBERTa Large | 75.57 | 83.42 | 56.31 | 53.77
NLI | RoBERTa Base | 80.29 | 78.95 | 53.16 | 53.52
Term Similarity | TF-IDF + cosine | 73.62 | 81.83 | 52.40 | 54.74
Topic Similarity | LDA + cosine | 72.67 | 82.54 | 51.11 | 54.92
All Edges | — | 83.40 | — | 55.42 | —

Table 3: Importance of each edge type for both evaluated tasks. We examine each edge type alone and when eliminated from the graph entirely (without). For supervised stance detection, no single edge performs as well as the combination of all edges. For unsupervised aspect detection paraphrase edges provide the best signal.

Edge (all RoBERTa) | Top | Bottom | Random
Stance | 53% | 44% | 52%
Specificity | 82% | 52% | 56%
Paraphrase | 93% | 65% | 74%
NLI | 93% | 50% | 60%

Table 4: Performance of each edge type across domains considering the 100 strongest edges, the 100 weakest edges, and 100 random edges. There is a clear trend of the strongest edges being more accurate and the weakest edges being less accurate, meaning that the edge weight does have some predictive effect about the edge's accuracy.

This is the opposite for aspect detection, which we treat as an unsupervised community detection task; here paraphrase alone outperforms the graph with all edge relationships (b-cubed F1 of 56.31 versus 55.42). The other edges even have a slight negative effect on the overall results (55.42); being unsupervised, our approach here has no way of filtering out uninformative edges.

Edge Domain Transfer One possible confounder of the contribution of each edge type is the out-of-domain performance of the pairwise model used to predict that edge. A poor model would provide little more than random noise, even if the edge type were expected to be helpful. To investigate this possibility, we sampled 100 each of the edges (above τ = 0.6) with the highest weight, the lowest weight, and a random sample. We then annotated each edge as being correctly or incorrectly predicted. Results are shown in Table 4.

There is a clear trend that the edge weight is correlated with edge correctness, meaning that the models retain some level of calibration across domains. As we incorporate the edge weight in the R-GCN, this helps to lessen the effect of the noisier, weaker edges. Another trend is that an edge type's usefulness across tasks is not solely a function of that edge type's accuracy. The type of failure mode is also important. For instance, the relative stance edges have poor surface-level accuracy, but the most common failure was not predicting the wrong relative stance; it was predicting any stance for pairs of claims about different aspects.

Flip Analysis Finally, we analyze "flipped" cases in stance detection in which the baseline predicted stance incorrectly but the model predicted stance correctly, or vice-versa, to understand areas for which this model improves performance. A sample of these is shown in Table 5.

Perhaps the most surprising result is how different the predictions of the syntopical graph-based approach are from those of the content-only MT-DNN baseline. For the IBMCS dataset, there were 1355 claims in the test set, and we flipped 219 (16.2%) correctly relative to the MT-DNN baseline, but also 140 (10.3%) incorrectly compared to that baseline. Thus, we flipped 26.5% of the overall predictions for the 5.68 point improvement in F1. This holds across the ArgMin dataset as well, where we flipped 536 (19.6%) claims correctly and 373 (13.7%) claims incorrectly, out of a total 2726 claims in the test set. Though we show substantial gains overall, it seems that the models capture different signals. We thus believe that future improvements through improved model combination may still be possible.

Example 1 — Topic: wind power should be a primary focus of future energy supply. Claim: predictability of wind plant output remains low. Strongest Neighbor (+): the non-dispatchable nature of wind energy production can raise costs. True: CON | MT-DNN: PRO | Syn. Gr.: CON | Reason: Good Neighbors

Example 2 — Topic: wind power should be a primary focus of future energy supply. Claim: Wind power uses little land. Strongest Neighbor (−): wind power "cannot be relied upon to provide significant levels of power". True: PRO | MT-DNN: PRO | Syn. Gr.: CON | Reason: Bad Neighbors

Example 3 — Topic: build the Keystone XL pipeline. Claim: the pipeline would be "game over for the planet". Strongest Neighbor (−): this is the most technologically advanced and safest pipeline ever proposed. True: CON | MT-DNN: PRO | Syn. Gr.: CON | Reason: Good Neighbors

Table 5: Stance detection examples with the true stance label where the output label of our syntopical graph was different from that of the MT-DNN baseline, along with a potential reason.

7 Conclusion

In this paper, we have introduced a data structure, the syntopical graph, which provides context for claims in collections. We have provided empirical evidence that syntopical graphs can be used as input representations for graph-structured approaches (such as graph neural networks and graph clustering algorithms) to obtain significant improvements over content-only baselines.

We believe there are several opportunities to extend this work in the future. First, we believe the graph construction could be improved by avoiding the inefficient pairwise analysis, expanding the edge types, and utilizing a more robust classifier for the graph. Second, we would relax the constraint that a claim represents a single viewpoint, or the limitation of aspect detection to unsupervised approaches. Finally, we would like to apply our approach to the original problem motivating this work, syntopical reading, to see if this system can aid users in browsing or understanding a collection.

8 Ethics Impact Statement

We anticipate that the syntopical graph explored in this work will have a beneficial impact in real-world systems to aid users in improved comprehension and reduce susceptibility to misinformation. The goal of our work is motivated by syntopical reading, which theorizes that individuals exposed to agreement and disagreement within a collection gain a deeper understanding of the central topics. Our work on syntopical graphs provides an algorithmic foundation to aid readers in understanding the key viewpoints (aspect and stance for a given topic) present in a collection.

Acknowledgments

We would like to thank many others for their invaluable feedback and patient discussions, including Charlotte Ellison, Ani Nenkova, Tong Sun, Han-Chin Shing, and Pedro Rodriguez. This work was generously supported through Adobe Gift Funding, which supports an Adobe Research-University of Maryland collaboration. It was completed while the primary author was interning at Adobe Research.

References

Mortimer J. Adler and Charles Van Doren. 1940. How To Read A Book. Simon and Schuster, New York.

Yamen Ajjour, Milad Alshomary, Henning Wachsmuth, and Benno Stein. 2019. Modeling frames in argumentation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 2922–2932, Hong Kong, China. Association for Computational Linguistics.

Khalid Al-Khatib, Yufang Hou, Henning Wachsmuth, Charles Jochim, Francesca Bonin, and Benno Stein. 2020. End-to-end argumentation knowledge graph construction. In Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, pages 7367–7374. AAAI.

Abeer Aldayel and Walid Magdy. 2019. Your stance is exposed! Analysing possible factors for stance detection on social media. Proceedings of the ACM on Human-Computer Interaction, 3(CSCW):1–20.

Enrique Amigó, Julio Gonzalo, Javier Artiles, and Felisa Verdejo. 2009. A comparison of extrinsic clustering evaluation metrics based on formal constraints. Information Retrieval, 12(4):461–486.

Roy Bar-Haim, Indrajit Bhattacharya, Francesco Dinuzzo, Amrita Saha, and Noam Slonim. 2017a. Stance classification of context-dependent claims. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pages 251–261, Valencia, Spain. Association for Computational Linguistics.


Roy Bar-Haim, Lilach Edelstein, Charles Jochim, and Noam Slonim. 2017b. Improving claim stance classification with lexical knowledge expansion and context utilization. In Proceedings of the 4th Workshop on Argument Mining, pages 32–38, Copenhagen, Denmark. Association for Computational Linguistics.

Roy Bar-Haim, Yoav Kantor, Lilach Eden, Roni Friedman, Dan Lahav, and Noam Slonim. 2020. Quantitative argument summarization and beyond: Cross-domain key point analysis. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 39–49, Online. Association for Computational Linguistics.

David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022.

Sihao Chen, Daniel Khashabi, Chris Callison-Burch, and Dan Roth. 2019. PerspectroScope: A window to the world of diverse perspectives. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 129–134, Florence, Italy. Association for Computational Linguistics.

Aaron Clauset, Mark E. J. Newman, and Cristopher Moore. 2004. Finding community structure in very large networks. Physical Review E, 70(6):066111.

Bill Dolan, Chris Quirk, and Chris Brockett. 2004. Unsupervised construction of large paraphrase corpora: Exploiting massively parallel news sources. In COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics, pages 350–356, Geneva, Switzerland. COLING.

Esin Durmus, Faisal Ladhak, and Claire Cardie. 2019. Determining relative argument specificity and stance for complex argumentative structures. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4630–4641, Florence, Italy. Association for Computational Linguistics.

Charlie Egan, Advaith Siddharthan, and Adam Wyner. 2016. Summarising the points made in online political debates. In Proceedings of the Third Workshop on Argument Mining (ArgMining2016), pages 134–143, Berlin, Germany. Association for Computational Linguistics.

Debela Gemechu and Chris Reed. 2019. Decompositional argument mining: A general purpose approach for argument graph construction. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 516–526, Florence, Italy. Association for Computational Linguistics.

Kazi Saidul Hasan and Vincent Ng. 2013. Stance classification of ideological debates: Data, models, features, and constraints. In Proceedings of the Sixth International Joint Conference on Natural Language Processing, pages 1348–1356, Nagoya, Japan. Asian Federation of Natural Language Processing.

Jonathan Kobbe, Ioana Hulpuș, and Heiner Stuckenschmidt. 2020. Unsupervised stance detection for arguments from consequences. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 50–60.

John Lawrence and Chris Reed. 2017. Using complex argumentative interactions to reconstruct the argumentative structure of large-scale debates. In Proceedings of the 4th Workshop on Argument Mining, pages 108–117, Copenhagen, Denmark. Association for Computational Linguistics.

Chang Li, Aldo Porco, and Dan Goldwasser. 2018. Structured representation learning for online debate stance prediction. In Proceedings of the 27th International Conference on Computational Linguistics, pages 3728–3739, Santa Fe, New Mexico, USA. Association for Computational Linguistics.

Yuxiao Lin, Yuxian Meng, Xiaofei Sun, Qinghong Han, Kun Kuang, Jiwei Li, and Fei Wu. 2021. BertGCN: Transductive text classification by combining GCN and BERT. arXiv preprint arXiv:2105.05727.

Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.

Amita Misra, Pranav Anand, Jean E. Fox Tree, and Marilyn Walker. 2015. Using summarization to discover argument facets in online idealogical dialog. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 430–440, Denver, Colorado. Association for Computational Linguistics.

John X. Morris, Eli Lifland, Jin Yong Yoo, Jake Grigsby, Di Jin, and Yanjun Qi. 2020. TextAttack: A framework for adversarial attacks, data augmentation, and adversarial training in NLP.

Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. PyTorch: An imperative style, high-performance deep learning library. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems 32, pages 8024–8035. Curran Associates, Inc.

Andreas Peldszus and Manfred Stede. 2015. Joint prediction in MST-style discourse parsing for argumentation mining. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 938–948, Lisbon, Portugal. Association for Computational Linguistics.


Minghui Qiu and Jing Jiang. 2013. A latent variable model for viewpoint discovery from threaded forum posts. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1031–1040, Atlanta, Georgia. Association for Computational Linguistics.

Sarvesh Ranade, Rajeev Sangal, and Radhika Mamidi. 2013. Stance classification in online debates by recognizing users' intentions. In Proceedings of the SIGDIAL 2013 Conference, pages 61–69, Metz, France. Association for Computational Linguistics.

Nils Reimers and Iryna Gurevych. 2019a. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics.

Nils Reimers, Benjamin Schiller, Tilman Beck, Johannes Daxenberger, Christian Stab, and Iryna Gurevych. 2019b. Classification and clustering of arguments with contextualized word embeddings. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 567–578, Florence, Italy. Association for Computational Linguistics.

Ramit Sawhney, Arnav Wadhwa, Shivam Agarwal, and Rajiv Ratn Shah. 2020. GPolS: A contextual graph-based language model for analyzing parliamentary debates and political cohesion. In Proceedings of the 28th International Conference on Computational Linguistics, pages 4847–4859, Barcelona, Spain (Online). International Committee on Computational Linguistics.

Benjamin Schiller, Johannes Daxenberger, and Iryna Gurevych. 2020. Stance detection benchmark: How robust is your stance detection? CoRR, abs/2001.01565.

Michael Schlichtkrull, Thomas N. Kipf, Peter Bloem, Rianne van den Berg, Ivan Titov, and Max Welling. 2018. Modeling relational data with graph convolutional networks. In European Semantic Web Conference, pages 593–607. Springer.

Swapna Somasundaran and Janyce Wiebe. 2009. Recognizing stances in online debates. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 226–234, Suntec, Singapore. Association for Computational Linguistics.

Dhanya Sridhar, James Foulds, Bert Huang, Lise Getoor, and Marilyn Walker. 2015. Joint models of disagreement and stance in online debate. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 116–125, Beijing, China. Association for Computational Linguistics.

Christian Stab, Tristan Miller, Benjamin Schiller, Pranav Rai, and Iryna Gurevych. 2018. Cross-topic argument mining from heterogeneous sources. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3664–3674, Brussels, Belgium. Association for Computational Linguistics.

Manfred Stede and Jodi Schneider. 2018. Argumentation Mining. Number 40 in Synthesis Lectures on Human Language Technologies. Morgan & Claypool.

Orith Toledo-Ronen, Roy Bar-Haim, and Noam Slonim. 2016. Expert stance graphs for computational argumentation. In Proceedings of the Third Workshop on Argument Mining (ArgMining2016), pages 119–123, Berlin, Germany. Association for Computational Linguistics.

Dietrich Trautmann. 2020. Aspect-based argument mining. In Proceedings of the 7th Workshop on Argument Mining, pages 41–52, Online. Association for Computational Linguistics.

David Vilares and Yulan He. 2017. Detecting perspectives in political debates. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1573–1582, Copenhagen, Denmark. Association for Computational Linguistics.

Henning Wachsmuth, Martin Potthast, Khalid Al-Khatib, Yamen Ajjour, Jana Puschmann, Jiani Qu, Jonas Dorsch, Viorel Morari, Janek Bevendorff, and Benno Stein. 2017a. Building an argument search engine for the web. In Proceedings of the 4th Workshop on Argument Mining, pages 49–59. Association for Computational Linguistics.

Henning Wachsmuth, Benno Stein, and Yamen Ajjour. 2017b. "PageRank" for argument relevance. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pages 1117–1127. Association for Computational Linguistics.

Minjie Wang, Da Zheng, Zihao Ye, Quan Gan, Mufei Li, Xiang Song, Jinjing Zhou, Chao Ma, Lingfan Yu, Yu Gai, Tianjun Xiao, Tong He, George Karypis, Jinyang Li, and Zheng Zhang. 2019. Deep Graph Library: A graph-centric, highly-performant package for graph neural networks. arXiv preprint arXiv:1909.01315.

Adina Williams, Nikita Nangia, and Samuel Bowman. 2018. A broad-coverage challenge corpus for sentence understanding through inference. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1112–1122. Association for Computational Linguistics.

Liang Yao, Chengsheng Mao, and Yuan Luo. 2019. Graph convolutional networks for text classification. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 7370–7377.

Bowen Zhang, Min Yang, Xutao Li, Yunming Ye, Xiaofei Xu, and Kuai Dai. 2020. Enhancing cross-target stance detection with transferable semantic-emotion knowledge. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 3188–3197.


A Relational Graph Convolutional Networks

The input to an R-GCN is a weighted, typed multigraph with some initial node representation. The network is made up of stacked relational graph convolutional layers; each layer computes a new set of node representations based on each node's neighborhood. In effect, each layer combines the edge-type-specific representation of all of a node's neighbors with its own representation. The propagation equation is defined per Schlichtkrull et al. (2018):

h_u^(l+1) = σ( Σ_{r∈Σ_E} Σ_{v∈N_u^r} (1 / |N_u^r|) W_r^(l) h_v^(l) w_{u,v,r} + W_0^(l) h_u^(l) ),

where u and v are nodes in the graph, N_u^r is the neighborhood of node u for edge type r, 1 / |N_u^r| is the normalization term, W_r is the per-relationship transformation, w_{u,v,r} is the edge weight between nodes u and v of edge type r, and W_0 is the self-loop weight.
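The following is a minimal dense PyTorch sketch of this propagation rule, written for readability rather than efficiency (the experiments use DGL's optimized R-GCN layers):

# A minimal dense PyTorch sketch of the R-GCN propagation rule above,
# written for readability rather than efficiency.
import torch
import torch.nn as nn

class RGCNLayer(nn.Module):
    def __init__(self, in_dim, out_dim, num_rels):
        super().__init__()
        self.w_rel = nn.Parameter(torch.randn(num_rels, in_dim, out_dim) * 0.01)  # W_r
        self.w_self = nn.Linear(in_dim, out_dim, bias=False)                      # W_0

    def forward(self, h, edges):
        # h: (num_nodes, in_dim); edges: list of (u, v, rel_id, weight) tuples.
        num_nodes, num_rels = h.shape[0], self.w_rel.shape[0]
        out_dim = self.w_rel.shape[2]
        # |N_u^r|: number of neighbors of u under relation r, used for normalization.
        degree = torch.zeros(num_nodes, num_rels)
        for u, _, r, _ in edges:
            degree[u, r] += 1
        messages = [torch.zeros(out_dim) for _ in range(num_nodes)]
        for u, v, r, w in edges:
            # (1 / |N_u^r|) * W_r h_v * w_{u,v,r}
            messages[u] = messages[u] + (h[v] @ self.w_rel[r]) * (w / degree[u, r])
        # sigma( aggregated messages + W_0 h_u )
        return torch.relu(torch.stack(messages) + self.w_self(h))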

B Claim Node Representations

For the claim node representations, we format the input to the RoBERTa Large NLI model as:

[CLS] claim [SEP] topic [SEP]

We use the output representations (1024 dims per claim node) as the node representations for the graph.

C Hyperparameter Tuning

To tune hyperparameters, we used Optuna (https://optuna.org) and the Tree of Parzen Estimators optimizer. We tuned the IBMCS dataset with 100 samples on a 1080Ti, training 10 epochs for each sample. For the ArgMin dataset, we tuned for 3 samples on an Nvidia Quadro RTX 6000, fixing all parameters to the best IBMCS configuration, except for the number of layers. We selected each based on the lowest validation loss.
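A sketch of such a sweep with Optuna's TPE sampler over the ranges in Table 6; train_and_validate is a hypothetical routine that trains the R-GCN with the sampled parameters and returns the validation loss:

# Sketch of the hyperparameter sweep with Optuna's TPE sampler over the ranges
# in Table 6; train_and_validate is a hypothetical training routine.
import optuna

def objective(trial):
    params = {
        "tau": trial.suggest_float("tau", 0.5, 1.0),
        "learning_rate": trial.suggest_float("learning_rate", 1e-6, 1e-2, log=True),
        "lr_decay": trial.suggest_float("lr_decay", 0.6, 1.0),
        "hidden_layers": trial.suggest_int("hidden_layers", 1, 3),
        "hidden_units": trial.suggest_int("hidden_units", 50, 200),
        "num_bases": trial.suggest_int("num_bases", 1, 6),
        "dropout": trial.suggest_float("dropout", 0.0, 0.5),
    }
    return train_and_validate(params)          # hypothetical; returns validation loss

study = optuna.create_study(direction="minimize",
                            sampler=optuna.samplers.TPESampler())
study.optimize(objective, n_trials=100)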

D Selected Models

For both datasets, we tune the R-GCN on the validation set, ending up with the following parameter settings: 3 graph convolutional layers for ArgMin and 2 for IBMCS; 128 hidden dimensions per layer; a learning rate of 0.00856 and decay (γ) of 0.797; dropout of 0.005; τ of 0.6; batch size of 10; and 4 bases for edge relations.


Parameter | Type | Low | High
Threshold (τ) | float | 0.5 | 1.0
Learning Rate | float (log) | 10^-6 | 10^-2
LR Decay (γ) | float | 0.6 | 1.0
Hidden Layers | int | 1 | 3
Hidden Units | int | 50 | 200
Number of Bases | int | 1 | 6
Dropout | float | 0 | 0.5

Table 6: The range of hyperparameters we sweep over when training the relational graph convolutional network.

We trained each model for 10 epochs. The IBMCS model took roughly 20 minutes to train, and the ArgMin model took roughly 3 and a half hours to train. We ran each model 5 times to account for random variations, and selected the run with the lowest validation score.

The IBMCS model has roughly 248k tunable parameters and the ArgMin model has roughly 330k tunable parameters.

The BertGCN baseline used the RoBERTaGCN configuration from Lin et al. (2021). Per the original paper, we first trained a RoBERTa model on the task for 50 epochs using a batch size of 64 and a learning rate of 0.00001, then trained the RoBERTaGCN model for 60 epochs using a batch size of 8, a GCN learning rate of 0.001, and a RoBERTa learning rate of 0.00001.