
A Multidimensional Lexicon for Interpersonal Stancetaking

Umashanthi Pavalanathan, Georgia Institute of Technology, Atlanta, GA ([email protected])

Jim Fitzpatrick, University of Pittsburgh, Pittsburgh, PA ([email protected])

Scott F. Kiesling, University of Pittsburgh, Pittsburgh, PA ([email protected])

Jacob Eisenstein, Georgia Institute of Technology, Atlanta, GA ([email protected])

Abstract

The sociolinguistic construct of stancetaking describes the activities through which discourse participants create and signal relationships to their interlocutors, to the topic of discussion, and to the talk itself. Stancetaking underlies a wide range of interactional phenomena, relating to formality, politeness, affect, and subjectivity. We present a computational approach to stancetaking, in which we build a theoretically-motivated lexicon of stance markers, and then use multidimensional analysis to identify a set of underlying stance dimensions. We validate these dimensions intrinsically and extrinsically, showing that they are internally coherent, match pre-registered hypotheses, and correlate with social phenomena.

1 Introduction

What does it mean to be welcoming or standoffish, light-hearted or cynical? Such interactional styles are performed primarily with language, yet little is known about how linguistic resources are arrayed to create these social impressions. The sociolinguistic concept of interpersonal stancetaking attempts to answer this question, by providing a conceptual framework that accounts for a range of interpersonal phenomena, subsuming formality, politeness, and subjectivity (Du Bois, 2007).1 This framework has been applied almost exclusively through qualitative methods, using close readings of individual texts or dialogs to uncover how language is used to position individuals with respect to their interlocutors and readers.

1 Stancetaking is distinct from the notion of stance, which corresponds to a position in a debate (Walker et al., 2012). Similarly, Freeman et al. (2014) correlate phonetic features with the strength of such argumentative stances.

We attempt the first large-scale operationalization of stancetaking through computational methods. Du Bois (2007) formalizes stancetaking as a multi-dimensional construct, reflecting the relationship of discourse participants to (a) the audience or interlocutor; (b) the topic of discourse; (c) the talk or text itself. However, the multi-dimensional nature of stancetaking poses problems for traditional computational approaches, in which labeled data is obtained by relying on annotator intuitions about scalar concepts such as politeness (Danescu-Niculescu-Mizil et al., 2013) and formality (Pavlick and Tetreault, 2016).

Instead, our approach is based on a theoretically-guided application of unsupervised learning, in the form of factor analysis, applied to lexical features. Stancetaking is characterized in large part by an array of linguistic features, ranging from discourse markers such as actually to backchannels such as yep (Kiesling, 2009). We therefore first compile a lexicon of stance markers, combining prior lexicons from Biber and Finegan (1989) and the Switchboard Dialogue Act Corpus (Jurafsky et al., 1998). We then extend this lexicon to the social media domain using word embeddings. Finally, we apply multi-dimensional analysis of co-occurrence patterns to identify a small set of stance dimensions.

To measure the internal coherence (construct validity) of the stance dimensions, we use a word intrusion task (Chang et al., 2009) and a set of pre-registered hypotheses. To measure the utility of the stance dimensions, we perform a series of extrinsic evaluations. A predictive evaluation shows that the membership of online communities is determined in part by the interactional stances that predominate in those communities. Furthermore, the induced stance dimensions are shown to align with annotations of politeness and formality.

Contributions We operationalize the sociolinguistic concept of stancetaking as a multi-dimensional framework, making it possible to measure at scale. Specifically,

• we contribute a lexicon of stance markers, based on prior work and adapted to the genre of online interpersonal discourse;

• we group stance markers into latent dimensions;

• we show that these stance dimensions are internally coherent;

• we demonstrate that the stance dimensions predict and correlate with social phenomena.2

2 Related Work

From a theoretical perspective, we build on prior work on interactional meaning in language. Methodologically, our paper relates to prior work on lexicon-based analysis and contrastive studies of social media communities.

2.1 Linguistic Variation and Social Meaning

In computational sociolinguistics (Nguyen et al., 2016), language variation has been studied primarily in connection with macro-scale social variables, such as age (Argamon et al., 2007; Nguyen et al., 2013), gender (Burger et al., 2011; Bamman et al., 2014), race (Eisenstein et al., 2011; Blodgett et al., 2016), and geography (Eisenstein et al., 2010). This parallels what Eckert (2012) has called the “first wave” of language variation studies in sociolinguistics, which also focused on macro-scale variables.

More recently, sociolinguists have dedicated increased attention to situational and stylistic variation, and the interactional meaning that such variation can convey (Eckert and Rickford, 2001). This linguistic research can be aligned with computational efforts to quantify phenomena such as subjectivity (Riloff and Wiebe, 2003), sentiment (Wiebe et al., 2005), politeness (Danescu-Niculescu-Mizil et al., 2013), formality (Pavlick and Tetreault, 2016), and power dynamics (Prabhakaran et al., 2012). While linguistic research on interactional meaning has focused largely on qualitative methodologies such as discourse analysis (e.g., Bucholtz and Hall, 2005), these computational efforts have made use of crowdsourced annotations to build large datasets of, for example, polite and impolite text. These annotation efforts draw on the annotators’ intuitions about the meaning of these sociolinguistic constructs.

2 Lexicons and stance dimensions are available at https://github.com/umashanthi-research/multidimensional-stance-lexicon

Interpersonal stancetaking represents an attempt to unify concepts such as sentiment, politeness, formality, and subjectivity under a single theoretical framework (Jaffe, 2009; Kiesling, 2009). The key idea, as articulated by Du Bois (2007), is that stancetaking captures the speaker’s relationship to (a) the topic of discussion, (b) the interlocutor or audience, and (c) the talk (or writing) itself. Various configurations of these three legs of the “stance triangle” can account for a range of phenomena. For example, epistemic stance relates to the speaker’s certainty about what is being expressed, while affective stance indicates the speaker’s emotional position with respect to the content (Ochs, 1993).

The framework of stancetaking has been widely adopted in linguistics, particularly in the discourse analytic tradition, which involves close reading of individual texts or conversations (Karkkainen, 2006; Keisanen, 2007; Precht, 2003; White, 2003). But despite its strong theoretical foundation, we are aware of no prior efforts to operationalize stancetaking at scale. Since annotators may not have strong intuitions about stance — in the way that they do about formality and politeness — we cannot rely on the annotation methodologies employed in prior work. We take a different approach, performing a multidimensional analysis of the distribution of likely stance markers.

2.2 Lexicon-based Analysis

Our operationalization of stancetaking is based on the induction of lexicons of stance markers. The lexicon-based methodology is related to earlier work from social psychology, such as the General Inquirer (Stone, 1966) and LIWC (Tausczik and Pennebaker, 2010). In LIWC, the basic categories were identified first, based on psychological constructs (e.g., positive emotion, cognitive processes, drive to power) and syntactic groupings of words and phrases (e.g., pronouns, prepositions, quantifiers). The lexicon designers then manually constructed lexicons for each category, augmenting their intuitions by using distributional statistics to suggest words that may have been missed (Pennebaker et al., 2015). In contrast, we follow the approach of Biber (1991), using multidimensional analysis to identify latent groupings of markers based on co-occurrence statistics. We then use crowdsourcing and extrinsic comparisons to validate the coherence of these dimensions.

2.3 Multicommunity Studies

Social media platforms such as Reddit, Stack Exchange, and Wikia can be considered multicommunity environments, in that they host multiple subcommunities with distinct social and linguistic properties. Such subcommunities can be contrasted in terms of topics (Adamic et al., 2008; Hessel et al., 2014) and social networks (Backstrom et al., 2006). Our work focuses on Reddit, emphasizing community-wide differences in norms for interpersonal interaction. In the same vein, Tan and Lee (2015) attempt to characterize stylistic differences across subreddits by focusing on very common words and parts-of-speech; Tran and Ostendorf (2016) use language models and topic models to measure similarity across threads within a subreddit. One distinction of our approach is that the use of multidimensional analysis gives us interpretable dimensions of variation. This makes it possible to identify the specific interpersonal features that vary across communities.

3 Data

Reddit, one of the internet’s largest social media platforms, is a collection of subreddits organized around various topics of interest. As of January 2017, there were more than one million subreddits and nearly 250 million users, discussing topics ranging from politics (r/politics) to horror stories (r/nosleep).3 Although Reddit was originally designed for sharing hyperlinks, it also provides the ability to post original textual content, submit comments, and vote on content quality (Gilbert, 2013). Reddit’s conversation-like threads are therefore well suited for the study of interpersonal social and linguistic phenomena.

3 http://redditmetrics.com/

Subreddits      126,789
Authors       6,401,699
Threads      52,888,024
Comments    531,804,658

Table 1: Dataset size.

For example, the following are two comments from the subreddit r/malefashionadvice, posted in response to a picture posted by a user asking for fashion advice.

U1: “I think the beard looks pretty good. Definitely not the goatee. Clean shaven is always the safe option.”

U2: “Definitely the beard. But keep it trimmed.”

The phrases in bold face are markers of stance, indicating an evaluative stance. The following example is a part of a thread in the subreddit r/photoshopbattles, where users discuss an edited image posted by the original poster (OP). Here, the phrases in bold face are markers of stance indicating an involved and interactional stance.

U3: “Ha ha awesome!”
U4: “are those..... furries?”
OP: “yes, sir. They are!”
U4: “Oh cool. That makes sense!”

We used an archive of 530 million comments posted on Reddit in 2014, retrieved from the public archive of Reddit comments.4 This dataset consists of each post’s textual content, along with metadata that identifies the subreddit, thread, author, and post creation time. More statistics about the full dataset are shown in Table 1.

4 Stance Lexicon

Interpersonal stancetaking can be characterized in part by an array of linguistic features such as hedges (e.g., might, kind of), discourse markers (e.g., actually, I mean), and backchannels (e.g., yep, um). Our analysis focuses on these markers, which we collect into a lexicon.

4.1 Seed lexicon

We began with a seed lexicon of stance markers from Biber and Finegan (1989), who compiled an extensive list by surveying dictionaries, previous studies on stance, and texts in several genres of English. This list includes certainty adverbs (e.g., actually, of course, in fact), affect markers (e.g., amazing, thankful, sadly), and hedges (e.g., kind of, maybe, something like), among other adverbial, adjectival, verbal, and modal markers of stance. In total, this list consists of 448 stance markers.

4 https://archive.org/details/2015_reddit_comments_corpus

The Biber and Finegan (1989) lexicon is primarily based on written genres from the pre-social media era. Our dataset — like much of the recent work in this domain — consists of online discussions, which differ significantly from printed texts (Eisenstein, 2013). One difference is that online discussions contain a number of dialog act markers that are characteristic of spoken language, such as oh yeah, nah, and wow. We accounted for this by adding 74 dialog act markers from the Switchboard Dialog Act Corpus (Jurafsky et al., 1998). The final seed lexicon consists of 517 unique markers from these two sources. Note that the seed lexicon also includes markers that contain multiple tokens (e.g., kind of, I know).

4.2 Lexicon expansion

Online discussions differ not only from written texts, but also from spoken discussions, due to their use of non-standard vocabulary and spellings. To measure stance accurately, these genre differences must be accounted for. We therefore expanded the seed lexicon using automated techniques based on distributional statistics. This is similar to prior work on the expansion of sentiment lexicons (Hatzivassiloglou and McKeown, 1997; Hamilton et al., 2016).

Our lexicon expansion approach used word embeddings to find words that are distributionally similar to those in the seed set. We trained word embeddings on a corpus of 25 million Reddit comments, with a vocabulary of the 100K most frequent words on Reddit, using the skip-gram models of both WORD2VEC (Mikolov et al., 2013) and WANG2VEC (Ling et al., 2015) with default parameters. The WANG2VEC method augments WORD2VEC by accounting for word order information. We found the similarity judgments obtained from WANG2VEC to be qualitatively more meaningful, so we used these embeddings to construct the expanded lexicon.5

5 We used the following default parameters: 100 dimensions, a window size of five, a negative sampling size of ten, five epoch iterations, and a sub-sampling rate of 10−4.

Seed term       Expanded terms

(Example seeds from Biber and Finegan (1989))
significantly   considerably, substantially, dramatically
certainly       surely, frankly, definitely
incredibly      extremely, unbelievably, exceptionally

(Example seeds from Jurafsky et al. (1998))
nope            nah, yup, nevermind
great           fantastic, terrific, excellent

Table 2: Stance lexicon: seed and expanded terms.

To perform lexicon expansion, we constructed a dictionary of candidate terms, consisting of all unigrams that occur with a frequency rate of at least 10−7 in the Reddit comment corpus. Then, for each single-token marker in the seed lexicon, we identified all terms from the candidate set whose embedding has a cosine similarity of at least 0.75 with respect to the seed marker.6 Table 2 shows examples of seed markers and related terms extracted from the word embeddings. Through this procedure, we identified 228 additional markers based on similarity to items in the seed list from Biber and Finegan (1989), and 112 additional markers based on the seed list of dialog acts. In total, our stance lexicon contains 812 unique markers.
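The expansion step amounts to a nearest-neighbor search in embedding space. The following sketch is illustrative only: the toy three-dimensional vectors and tiny candidate list stand in for the 100-dimensional WANG2VEC embeddings and the full Reddit candidate dictionary, while the 0.75 cosine threshold matches the one reported above.

```python
import math

def cosine(u, v):
    # Cosine similarity between two dense vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def expand_lexicon(seeds, embeddings, candidates, threshold=0.75):
    """Return candidate terms whose embedding is close to any single-token seed."""
    expanded = set()
    for seed in seeds:
        if seed not in embeddings:  # multi-token seed markers are skipped
            continue
        for term in candidates:
            if term in seeds or term not in embeddings:
                continue
            if cosine(embeddings[seed], embeddings[term]) >= threshold:
                expanded.add(term)
    return expanded

# Toy embeddings: "nah" and "yup" sit near the seed "nope"; "table" does not.
emb = {
    "nope":  [1.0, 0.1, 0.0],
    "nah":   [0.9, 0.2, 0.1],
    "yup":   [0.8, 0.3, 0.0],
    "table": [0.0, 1.0, 0.9],
}
print(expand_lexicon({"nope"}, emb, ["nah", "yup", "table"]))
```

With these toy vectors the expansion recovers "nah" and "yup" but not "table", mirroring the dialog-act expansions in Table 2.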

5 Linguistic Dimensions of Stancetaking

To summarize the main axes of variation across the lexicon of stance markers, we apply a multidimensional analysis (Biber, 1992) to the distributional statistics of stance markers across subreddit communities. Each dimension of variation can then be viewed as a spectrum, characterized by the stance markers and subreddits that are associated with the positive and negative extremes. Multidimensional analysis is based on singular value decomposition, which has been applied successfully to a wide range of problems in natural language processing and information retrieval (e.g., Landauer et al., 1998). While Bayesian topic models are an appealing alternative, singular value decomposition is fast and deterministic, with a minimal number of tuning parameters.

6 We tried different thresholds on the similarity value and the corpus frequency, and the reported values were chosen based on the quality of the resulting related terms. This was done prior to any of the validations or extrinsic analyses described later in the paper.


5.1 Extracting Stance Dimensions

Our analysis is based on the co-occurrence of stance markers and subreddits. This is motivated by our interest in comparisons of the interactional styles of online communities within Reddit, and by the premise that these distributional differences reflect socially meaningful communicative norms. A pilot study applied the same technique to the co-occurrence of stance markers and individual authors, and the resulting dimensions appeared to be less stylistically coherent.

Singular value decomposition is often used in combination with a transformation of the co-occurrence counts by pointwise mutual information (Bullinaria and Levy, 2007). This transformation ensures that each cell in the matrix indicates how much more likely a stance marker is to co-occur with a given subreddit than would happen by chance under an independence assumption. Because negative PMI values tend to be unreliable, we use positive PMI (PPMI), which involves replacing all negative PMI values with zeros (Niwa and Nitta, 1994). Therefore, we obtain stance dimensions by applying singular value decomposition to the matrix constructed as follows:

X_{m,s} = ( log [ Pr(marker = m, subreddit = s) / ( Pr(marker = m) Pr(subreddit = s) ) ] )_+ ,

where (·)_+ denotes replacing negative values with zero.

Truncated singular value decomposition performs the approximate factorization X ≈ UΣV^T, where each row of the matrix U is a k-dimensional description of each stance marker, and each row of V is a k-dimensional description of each subreddit. We included the 7,589 subreddits that received at least 1,000 comments in 2014.
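As a concrete sketch of this construction (with NumPy, and a toy marker-by-subreddit count matrix standing in for the 812 × 7,589 matrix used in the paper):

```python
import numpy as np

def ppmi(counts):
    """Positive pointwise mutual information of a marker-by-subreddit count matrix."""
    total = counts.sum()
    joint = counts / total                   # Pr(marker, subreddit)
    p_m = joint.sum(axis=1, keepdims=True)   # Pr(marker)
    p_s = joint.sum(axis=0, keepdims=True)   # Pr(subreddit)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(joint / (p_m * p_s))
    pmi[~np.isfinite(pmi)] = 0.0             # zero counts contribute nothing
    return np.maximum(pmi, 0.0)              # clip negatives: PPMI

# Toy counts: rows are stance markers, columns are subreddits.
counts = np.array([[8.0, 1.0, 1.0],
                   [1.0, 8.0, 1.0],
                   [1.0, 1.0, 8.0],
                   [3.0, 3.0, 4.0]])
X = ppmi(counts)
U, S, Vt = np.linalg.svd(X, full_matrices=False)
k = 2                                        # keep the top-k stance dimensions
markers_k = U[:, :k]                         # k-dim description of each marker
subreddits_k = Vt[:k, :].T                   # k-dim description of each subreddit
print(markers_k.shape, subreddits_k.shape)   # (4, 2) (3, 2)
```

In the full analysis, each column of the truncated factors corresponds to one stance dimension, and the loadings in markers_k and subreddits_k give the rankings reported in Table 3.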

5.2 Results: Stance Dimensions

From the SVD analysis, we extracted the six principal latent dimensions that explain the most variation in our dataset.7 The decision to include only the first six dimensions was based on the strength of the singular values corresponding to the dimensions. Table 3 shows the top five stance markers for each extreme of the six dimensions. The stance dimensions convey a range of concepts, such as involved versus informational language, narrative versus dialogue-oriented writing, standard versus non-standard variation, and positive versus negative affect. Figure 1 shows the distribution of subreddits along two of these dimensions.

7 Similar to factor analysis, the top few dimensions of SVD explain the most variation, and tend to be the most interpretable. A scree plot (Cattell, 1966) showed that the amount of variation explained dropped after the top six dimensions, and qualitative interpretation showed that the remaining dimensions were less interpretable.

[Figure 1: scatter plot of popular subreddits along dimension two (horizontal axis) and dimension three (vertical axis).]

Figure 1: Mapping of subreddits in dimension two and dimension three, highlighting especially popular subreddits. Picture-oriented subreddits r/gonewild and r/aww map high on dimension two and low on dimension three, indicating an involved and informal style of discourse. Subreddits dedicated to knowledge-sharing discussions, such as r/askscience and r/space, map low on dimension two and high on dimension three, indicating an informational and formal style.

6 Construct Validity

Evaluating model output against gold-standard annotations is appropriate when there is some notion of a correct answer. As stancetaking is a multidimensional concept, we have taken an unsupervised approach. Therefore, we use evaluation techniques based on the notion of validity, which is the extent to which the operationalization of a construct truly captures the intended quantity or concept. Validation techniques for unsupervised content analysis are widely found in the social science literature (Weber, 1990; Quinn et al., 2010) and have also been recently used in the NLP and machine learning communities (e.g., Chang et al., 2009; Murphy et al., 2012; Sim et al., 2013).

We used several methods to validate the stance dimensions extracted from the corpus of Reddit comments. This section describes intrinsic evaluations, which test whether the extracted stance dimensions are linguistically coherent and meaningful, thereby testing the construct or content validity of the proposed stance dimensions (Quinn et al., 2010). Extrinsic evaluations are presented in section 7.

Dimension | Stance markers | Subreddits
Dim-1 (-) | beautifully, pleased, thanks, spectacular, delightful | philosophy, history, science
Dim-1 (+) | just, even, all, no, so | pcmasterrace, leagueoflegends, gaming
Dim-2 (-) | suggests that, demonstrates, conclude, demonstrated, demonstrate | philosophy, science, askscience
Dim-2 (+) | lovely, awww, hehe, aww, haha | gonewild, nsfw, aww
Dim-3 (-) | funnier, hilarious, disturbing, creepy, funny | cringe, creepy, cringepics
Dim-3 (+) | thanks, ideally, calculate, estimate, calculation | askscience, personalfinance, space
Dim-4 (-) | phenomenal, bummed, enjoyed, fantastic, disappointing | movies, television, books
Dim-4 (+) | hello, thx, hehe, aww, hi | philosophy, 4chan, atheism
Dim-5 (-) | lovely, stunning, wonderful, delightful, beautifully | gonewild, aww, tattoos
Dim-5 (+) | nvm, cmon, smh, lmao, disappointing | nfl, soccer, cringe
Dim-6 (-) | stunning, fantastic, incredible, amazing, spectacular | philosophy, gonewild, askscience
Dim-6 (+) | anxious, stressed, exhausted, overwhelmed, relieved | relationships, sex, nosleep

Table 3: For each end of the six dimensions extracted by our method, we show the five markers and three subreddits (among the 100 most popular subreddits) with the highest loadings.

6.1 Word Intrusion Task

A word intrusion task is used to measure the coherence and interpretability of a group of words. Human raters are presented with a list of terms, all but one of which are selected from a target concept; their task is to identify the intruder. If the target concept is internally coherent, human raters should be able to perform this task accurately; if not, their selections should be random. Word intrusion tasks have previously been used to validate the interpretability of topic models (Chang et al., 2009) and vector space models (Murphy et al., 2012).

We deployed a word intrusion task on Amazon Mechanical Turk (AMT), in which we presented the top four stance markers from one end of a dimension, along with an intruder marker selected from the top four markers of the opposite end of that dimension. In this way, we created four word intrusion tasks for each end of each dimension. The main reason for including only the top four words in each dimension is the expense of conducting crowd-sourced evaluations. In the most relevant prior work, Chang et al. (2009) used the top five words from each topic in their evaluation of topic models.
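The task construction just described can be sketched as follows; the marker lists are toy examples, and in a real deployment the five choices would be shuffled before being shown to raters.

```python
def build_intrusion_tasks(neg_end, pos_end, n_top=4):
    """Build the word-intrusion tasks for one stance dimension.

    Each task pairs the top-n markers from one end with a single
    intruder drawn from the top-n markers of the opposite end,
    yielding n tasks per end (here, four per end, eight in total).
    """
    tasks = []
    for own, other in ((neg_end, pos_end), (pos_end, neg_end)):
        for intruder in other[:n_top]:
            tasks.append({"choices": own[:n_top] + [intruder],
                          "intruder": intruder})
    return tasks

# Toy ends for one dimension (cf. Dim-2 in Table 3).
neg = ["suggests that", "demonstrates", "conclude", "demonstrated"]
pos = ["lovely", "awww", "hehe", "haha"]
tasks = build_intrusion_tasks(neg, pos)
print(len(tasks))  # 8
```

Each resulting task has five choices, exactly one of which comes from the opposite end of the dimension.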

Worker selection We required that the AMT workers (“turkers”) had completed a minimum of 1,000 HITs and had at least a 95% approval rate. Furthermore, because our task is based on analysis of English language texts, we required the turkers to be native speakers of English living in one of the majority English-speaking countries. As a further requirement, we required the turkers to obtain a qualification, which involves an English comprehension test similar to the questions in standardized English language tests. These requirements are based on best practices identified by Callison-Burch and Dredze (2010).

Task specification Each AMT human intelligence task (HIT) consists of twelve word intrusion tasks, one for each end of the six dimensions. We provided minimal instructions regarding the task, and did not provide any examples, to avoid introducing bias.8 As a further quality control, each HIT included three questions which asked the turkers to pick the best synonym for a given word from a list of five answers, where one answer was clearly correct; turkers who gave incorrect answers were to be excluded, but this situation did not arise in practice. Altogether, each HIT consisted of 15 questions and paid US$1.50. Five different turkers performed each HIT.

Results We measured the interrater reliability using Krippendorff’s α (Krippendorff, 2007) and the model precision metric of Chang et al. (2009). Results on both metrics were encouraging. We obtained a value of α = 0.73, on a scale where α = 0 indicates chance agreement and α = 1 indicates perfect agreement. The model precision was 0.82; chance precision is 0.20. To offer a sense of typical values for this metric, Chang et al. (2009) report model precisions in the range 0.7–0.83 in their analysis of topic models. Overall, these results indicate that the multi-dimensional analysis has succeeded at identifying dimensions that reflect natural groupings of stance markers.

8 The prompt for the word intrusion task was: “Select the intruder word/phrase: you will be given a list of five English words/phrases and asked to pick the word/phrase that is least similar to the other four words/phrases when used in online discussion forums.”

6.2 Pre-registered Hypotheses

Content validity was also assessed using a set of pre-registered hypotheses. The practice of pre-registering hypotheses before an analysis and then testing their correctness is widely used in the social sciences; it was adopted by Sim et al. (2013) to evaluate the induction of political ideological models from text. Before performing the multidimensional analysis, we used our prior linguistic knowledge to identify two groups of hypotheses that are expected to hold with respect to the latent stancetaking dimensions:

• Hypothesis I: Stance markers that are synonyms should not appear on the opposite ends of a stance dimension.

• Hypothesis II: If at least one stance marker from a predefined stance feature group (defined below) appears on one end of a stance dimension, then other markers from the same feature group will tend not to appear at the opposite end of the same dimension.

6.2.1 Synonym PairsFor each marker in our stance lexicon, we ex-tracted synonyms from Wordnet, focusing onmarkers that appear in only one Wordnet synset,and not including pairs in which one term wasan inflection of the other.9 Our final list contains73 synonym pairs (e.g., eventually/finally, grate-ful/thankful, yea/yeah). Of these pairs, there were59 cases in which both terms appeared in either thetop or bottom 200 positions of a stance dimension.In 51 of these cases (86%), the two terms appearedon the same side of the dimension. The chance ratewould be 50%, so this supports Hypothesis I and

9 It is possible that inflections are semantically similar, because by definition they are changes in the form of a word to mark distinctions such as tense, person, or number. However, different inflections of a single word form might be used to mark different stances (e.g., some stances might be associated with the past while others might be associated with the present or future).

                        Number of synonym pairs
Stance Dimension    On same end    On opposite ends

DIMENSION 1              6                 3
DIMENSION 2             12                 2
DIMENSION 3              2                 1
DIMENSION 4             11                 0
DIMENSION 5             10                 2
DIMENSION 6             10                 0

Total                 51/59              8/59

Table 4: Results for the pre-registered hypothesis that stance dimensions will not split synonym pairs.

further validates the stance dimensions. More details of the results are shown in Table 4. Note that synonym pairs may differ in aspects such as formality (e.g., said/informed, want/desire), which is one of the main dimensions of stancetaking. Therefore, perfect support for Hypothesis I is not expected.
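The Hypothesis I check can be sketched as follows. The loadings and synonym pairs below are illustrative values, not the paper's actual factor loadings; the logic, counting pairs whose members both land in the extreme ranks of a dimension and then checking whether they share an end, follows the description above.

```python
# Sketch of the synonym-pair test: of the pairs whose members both fall
# in the top or bottom `top_k` ranks of a dimension, how many land on
# the same end? Loadings here are made up for illustration.

def same_side_rate(loadings, pairs, top_k):
    ranked = sorted(loadings, key=loadings.get)  # ascending by loading
    bottom, top = set(ranked[:top_k]), set(ranked[-top_k:])
    eligible = same = 0
    for a, b in pairs:
        ends = [("top" if t in top else "bottom")
                for t in (a, b) if t in top or t in bottom]
        if len(ends) == 2:          # both terms are in an extreme rank
            eligible += 1
            same += ends[0] == ends[1]
    return same, eligible

loadings = {"eventually": 0.9, "finally": 0.8, "yea": -0.7,
            "yeah": -0.9, "meh": 0.1, "ok": 0.0}
pairs = [("eventually", "finally"), ("yea", "yeah"), ("finally", "yeah")]
print(same_side_rate(loadings, pairs, top_k=2))  # (2, 3)
```

Two of the three eligible pairs fall on the same end; the paper reports the analogous ratio 51/59 with top_k = 200.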

6.2.2 Stance Feature Groups

Biber and Finegan (1989) group stance markers into twelve “feature groups”, such as certainty adverbs, doubt adverbs, affect expressions, and hedges. Ideally, the stance dimensions should preserve these groupings. To test this, for each of the seven feature groups with at least ten stance markers in the lexicon, we counted the number of terms appearing among the top 200 positions in both ends (high/low) of each dimension. Under the null hypothesis, the stance dimensions are random with respect to the feature groups, so we would expect roughly an equal number of markers on both ends. As shown in Table 5, for five of the seven feature groups, it is possible to reject the null hypothesis at p < .007, which is the significance threshold at α = 0.05 after correcting for multiple comparisons using the Bonferroni correction. This indicates that the stance dimensions are aligned with predefined stance feature groups.
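A one-degree-of-freedom chi-squared test against an even split can be computed directly. This is a sketch of the test logic under the assumption that each feature group yields a two-cell high/low count; the 30/6 split below is hypothetical, while the 0.05/7 threshold matches the Bonferroni correction described above.

```python
import math

# Sketch of the feature-group test: under the null, a group's markers
# split evenly between the two ends of a dimension. The observed split
# below is hypothetical.

def chi2_even_split(high, low):
    """Chi-squared statistic and p-value (1 d.o.f.) for an observed
    high/low count against an expected 50/50 split."""
    n = high + low
    stat = (high - low) ** 2 / n
    p = math.erfc(math.sqrt(stat / 2))  # survival function of chi2(1)
    return stat, p

threshold = 0.05 / 7  # Bonferroni-corrected threshold, ~.007 as in the paper

stat, p = chi2_even_split(30, 6)      # hypothetical 30-vs-6 split
print(stat, p < threshold)            # 16.0 True
```

For one degree of freedom, the chi-squared survival function reduces to erfc(sqrt(x/2)), which keeps the sketch dependency-free.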

7 Extrinsic Evaluations

The evaluations in the previous section test internal validity; we now describe evaluations testing whether the stance dimensions are relevant to external social and interactional phenomena.

7.1 Predicting Cross-posting

Online communities can be considered as communities of practice (Eckert and McConnell-Ginet, 1992), where members come together to engage in shared linguistic practices. These practices


Feature group      #Stance markers      χ²       p-value    Reject null?

Certainty adv.           38            16.94     4.6e−03        ✓
Doubt adv.               23            13.21     2.2e−02        ×
Certainty verbs          36            48.99     2.2e−09        ✓
Doubt verbs              55            30.45     1.2e−05        ✓
Certainty adj.           28            29.73     1.7e−05        ✓
Doubt adj.               12            14.80     1.1e−02        ×
Affect exp.             227            97.17     2.1e−19        ✓

Table 5: Results for the pre-registered hypothesis that stance dimensions will align with the stance feature groups of Biber and Finegan (1989).

evolve simultaneously with membership, coalescing into shared norms. The memberships of multiple subreddits on the same topic (e.g., r/science and r/askscience) often do not overlap considerably. Therefore we hypothesize that users of Reddit have preferred interactional styles, and that participation in subreddit communities is governed not only by topic interest, but also by these interactional preferences. The proposed stancetaking dimensions provide a simple measure of interactional style, allowing us to test whether it is predictive of community membership decisions.

Classification task We design a classification task, in which the goal is to determine whether a pair of subreddits is high-crossover or low-crossover. In high-crossover subreddit pairs, individuals are especially likely to participate in both. For the purpose of this evaluation, individuals are considered to participate in a subreddit if they contribute posts or comments. We compute the pointwise mutual information (PMI) with respect to cross-participation among the 100 most popular subreddits. For each subreddit s, we identify the five highest and lowest PMI pairs ⟨s, t⟩, and add these to the high-crossover and low-crossover sets, respectively. Example pairs are shown in Table 6. After eliminating redundant pairs, we identify 437 unique high-crossover pairs and 465 unique low-crossover pairs. All evaluations are based on multiple random training/test splits over this dataset.
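The PMI computation over cross-participation can be sketched as follows. The record format (a flat list of user–subreddit participation events) and the toy data are assumptions for illustration; the paper computes the same quantity over the 100 most popular subreddits.

```python
import math
from collections import defaultdict

# Sketch of PMI scoring for subreddit pairs, from hypothetical
# (user, subreddit) participation records.

def pmi_pairs(records):
    users = defaultdict(set)
    for user, sub in records:
        users[user].add(sub)
    sub_count = defaultdict(int)   # number of users per subreddit
    pair_count = defaultdict(int)  # number of users per subreddit pair
    for subs in users.values():
        for s in subs:
            sub_count[s] += 1
        for s in subs:
            for t in subs:
                if s < t:
                    pair_count[(s, t)] += 1
    n = len(users)
    return {(s, t): math.log((c / n) / ((sub_count[s] / n) * (sub_count[t] / n)))
            for (s, t), c in pair_count.items()}

records = [("u1", "r/a"), ("u1", "r/b"), ("u2", "r/a"),
           ("u2", "r/b"), ("u3", "r/c")]
scores = pmi_pairs(records)
# r/a and r/b share both of their users, so their PMI is positive.
print(scores[("r/a", "r/b")] > 0)  # True
```

A pair whose participants overlap more than chance would predict gets positive PMI; the five highest and lowest scores per subreddit would then seed the high- and low-crossover sets.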

Classification approaches A simple classification approach is to predict that subreddits with similar text will have high crossover. We measure similarity using TF-IDF weighted cosine similarity, using two possible lexicons: the 8,000 most frequent words on Reddit (BOW), and the stance lexicon (STANCE MARKERS). The similarity threshold between high-crossover and low-

Cross-Community Participation

High-Scoring Pairs                   Low-Scoring Pairs

r/blog, r/announcements              r/gonewild, r/leagueoflegends
r/pokemon, r/wheredidthesodago       r/soccer, r/nosleep
r/politics, r/technology             r/programming, r/gonewild
r/LifeProTips, r/dataisbeautiful     r/nfl, r/leagueoflegends
r/Unexpected, r/JusticePorn          r/Minecraft, r/personalfinance

Table 6: Examples of subreddit pairs that have large and small amounts of overlap among contributing members.

                  Cosine      SVD

BOW               66.13%     77.48%
STANCE MARKERS    64.31%     84.93%

Table 7: Accuracy for prediction of subreddit cross-participation.

crossover pairs was estimated on the training data. We also tested the relevance of multi-dimensional analysis, by applying SVD to both lexicons. For each pair of subreddits, we computed a feature set of the absolute difference across the top six latent dimensions, and applied a logistic regression classifier. Regularization was tuned by internal cross-validation.
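The SVD feature construction can be sketched with a toy marker-count matrix. The counts, the row normalization, and the choice of two latent dimensions are all illustrative assumptions (the paper uses six dimensions and feeds the absolute differences to a logistic regression classifier, which is omitted here).

```python
import numpy as np

# Sketch of the SVD feature construction: project each subreddit's
# stance-marker counts onto latent dimensions, then represent a pair of
# subreddits by the absolute difference of their projections.

counts = np.array([          # rows: subreddits, cols: stance markers
    [5., 1., 0., 2.],
    [4., 2., 1., 2.],
    [0., 6., 5., 0.],
])
# L2-normalize rows so highly active subreddits do not dominate
# (an assumption; the paper does not specify the normalization).
X = counts / np.linalg.norm(counts, axis=1, keepdims=True)
U, S, Vt = np.linalg.svd(X, full_matrices=False)
proj = U[:, :2] * S[:2]      # coordinates on the top two latent dimensions

def pair_features(i, j):
    return np.abs(proj[i] - proj[j])

# Subreddits 0 and 1 have similar marker profiles, so their pair feature
# vector is smaller than that of the dissimilar pair (0, 2).
assert pair_features(0, 1).sum() < pair_features(0, 2).sum()
```

A classifier trained on these per-pair difference vectors would then predict high versus low crossover, as in the Results below.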

Results Table 7 shows average accuracies for these models. The stance-based SVD features are considerably more accurate than the BOW-based SVD features, indicating that interactional style does indeed predict cross-posting behavior.10 Both are considerably more accurate than the bag-of-words models based on cosine similarity.

7.2 Politeness and Formality

The utility of the induced stance dimensions depends on their correlation with social phenomena of interest. Prior work has used crowdsourcing to annotate texts for politeness and formality. We now evaluate the stancetaking properties of these annotated texts.

Data We used the politeness corpus of Wikipedia edit requests from Danescu-Niculescu-Mizil et al. (2013), which includes the textual content of the edit requests, along with scalar annotations of politeness. Following the original

10 We use BOW+SVD as the most comparable content-based alternative to our stancetaking dimensions. While there may be more accurate discriminative approaches, our goal is a direct comparison of stance and content-based features, not an exhaustive comparison of classification approaches.


authors, we compare the text for the messages ranked in the first and fourth quartiles of politeness scores. For formality, we used the corpus from Pavlick and Tetreault (2016), focusing on the blogs domain, which is most similar to our domain of Reddit. Each sentence in this corpus was annotated for formality levels from −3 to +3. We considered only the sentences with mean formality score greater than +1 (more formal) and less than −1 (less formal).

Stance dimensions For each document in the above datasets, we compute the stance properties as follows: for each dimension, we compute the total frequency of the hundred most positive terms and the hundred most negative terms, and then take the difference. Instances containing no terms from either list are excluded. We focus on stance dimensions two and five (summarized in Table 3), because they appeared to be most relevant to politeness and formality. Dimension two contrasts informational and argumentative language against emotional and non-standard language. Dimension five contrasts positive and formal language against non-standard and somewhat negative language.
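The per-document stance score described above can be sketched directly. The tiny term sets below are illustrative stand-ins for the hundred-term lists, and raw token counts stand in for whatever frequency weighting the paper applies.

```python
# Sketch of the per-document stance score: total frequency of a
# dimension's top terms minus total frequency of its bottom terms.
# Term lists here are tiny illustrative stand-ins.

def stance_score(tokens, pos_terms, neg_terms):
    pos = sum(1 for t in tokens if t in pos_terms)
    neg = sum(1 for t in tokens if t in neg_terms)
    if pos + neg == 0:
        return None  # document excluded, as described above
    return pos - neg

doc = "awww that is lovely , hehe".split()
pos_terms = {"suggests", "demonstrates"}   # informational end of dim. 2
neg_terms = {"lovely", "awww", "hehe"}     # emotional/non-standard end
print(stance_score(doc, pos_terms, neg_terms))  # -3
```

A strongly negative score places this document at the emotional, non-standard end of the dimension, consistent with an informal but polite register.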

Results A kernel density plot of the resulting differences is shown in Figure 2. The effect sizes of the resulting differences are quantified using Cohen's d statistic (Cohen, 1988). Effect sizes for all differences are between 0.3 and 0.4, indicating small-to-medium effects, except for the evaluation of formality on dimension five, where the effect size is close to zero. The relatively modest effect sizes are unsurprising, given the short length of the texts. However, these differences lend insight into the relationship between formality and politeness, which may seem to be closely related concepts. On dimension two, it is possible to be polite while using non-standard language such as hehe and awww, so long as the sentiment expressed is positive; however, these markers are not consistent with formality. On dimension five, we see that positive sentiment terms such as lovely and stunning are consistent with politeness, but not with formality. Indeed, the distribution of dimension five indicates that both ends of this dimension are consistent only with informal texts.
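Cohen's d, used above to quantify the group differences, is the difference in group means divided by the pooled standard deviation. The sample scores below are made up for illustration; only the formula follows Cohen (1988).

```python
import math

# Cohen's d for the difference between two groups of stance scores,
# using the pooled standard deviation. Sample values are hypothetical.

def cohens_d(xs, ys):
    nx, ny = len(xs), len(ys)
    mx, my = sum(xs) / nx, sum(ys) / ny
    vx = sum((x - mx) ** 2 for x in xs) / (nx - 1)   # sample variances
    vy = sum((y - my) ** 2 for y in ys) / (ny - 1)
    pooled = math.sqrt(((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2))
    return (mx - my) / pooled

polite = [0.2, 0.5, 0.3, 0.4]
impolite = [0.0, 0.1, 0.2, 0.1]
print(cohens_d(polite, impolite) > 0)  # True: polite texts score higher
```

Values of d around 0.3–0.4, as reported above, correspond to small-to-medium separations between the two groups' distributions.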

Overall, these results indicate that interactional phenomena such as politeness and formality are reflected in our stance dimensions, which are induced in an unsupervised manner. Future work

[Figure 2: kernel density plots of the stance-score differences over the range −0.4 to 0.4. The dimension 2 panels (suggests that, demonstrates vs. lovely, awww, hehe) and the dimension 5 panels (lovely, stunning, wonderful vs. nvm, cmon, smh) are each plotted for polite vs. not polite and formal vs. not formal.]

Figure 2: Kernel density distributions for stance dimensions 2 and 5, plotted with respect to annotations of politeness and formality.

may consider the utility of these stance dimensions to predict these social phenomena, particularly in cross-domain settings where lexical classifiers may overfit.

8 Conclusion

Stancetaking provides a general perspective on the various linguistic phenomena that structure social interactions. We have identified a set of several hundred stance markers, building on previously-identified lexicons by using word embeddings to perform lexicon expansion. We then used multi-dimensional analysis to group these markers into stance dimensions, which we show to be internally coherent and extrinsically useful. Our hope is that these stance dimensions will be valuable as a convenient building block for future research on interactional meaning.

Acknowledgments Thanks to the anonymous reviewers for their useful and constructive feedback on our submission. This research was supported by Air Force Office of Scientific Research award FA9550-14-1-0379, by National Institutes of Health award R01-GM112697, and by National Science Foundation awards 1452443 and 1111142. We thank Tyler Schnoebelen for helpful discussions; C.J. Hutto, Tanushree Mitra, and Sandeep Soni for assistance with Mechanical Turk experiments; and Ian Stewart for assistance with creating word embeddings. We also thank the Mechanical Turk workers for performing the word intrusion task, and for feedback on a pilot task.

References

Lada A. Adamic, Jun Zhang, Eytan Bakshy, and Mark S. Ackerman. 2008. Knowledge sharing and Yahoo Answers: Everyone knows something. In Proceedings of the Conference on World-Wide Web (WWW). pages 665–674.

Shlomo Argamon, Moshe Koppel, James W. Pennebaker, and Jonathan Schler. 2007. Mining the blogosphere: Age, gender and the varieties of self-expression. First Monday 12(9).

Lars Backstrom, Dan Huttenlocher, Jon Kleinberg, and Xiangyang Lan. 2006. Group formation in large social networks: Membership, growth, and evolution. In Proceedings of Knowledge Discovery and Data Mining (KDD). pages 44–54.

David Bamman, Jacob Eisenstein, and Tyler Schnoebelen. 2014. Gender identity and lexical variation in social media. Journal of Sociolinguistics 18(2):135–160.

Douglas Biber. 1991. Variation across speech and writing. Cambridge University Press.

Douglas Biber. 1992. The multi-dimensional approach to linguistic analyses of genre variation: An overview of methodology and findings. Computers and the Humanities 26(5-6):331–345.

Douglas Biber and Edward Finegan. 1989. Styles of stance in English: Lexical and grammatical marking of evidentiality and affect. Text 9(1):93–124.

Su Lin Blodgett, Lisa Green, and Brendan O'Connor. 2016. Demographic dialectal variation in social media: A case study of African-American English. In Proceedings of Empirical Methods for Natural Language Processing (EMNLP). pages 1119–1130.

M. Bucholtz and K. Hall. 2005. Identity and interaction: A sociocultural linguistic approach. Discourse Studies 7(4-5):585–614.

John A. Bullinaria and Joseph P. Levy. 2007. Extracting semantic representations from word co-occurrence statistics: A computational study. Behavior Research Methods 39(3):510–526.

John D. Burger, John Henderson, George Kim, and Guido Zarrella. 2011. Discriminating gender on Twitter. In Proceedings of Empirical Methods for Natural Language Processing (EMNLP). pages 1301–1309.

Chris Callison-Burch and Mark Dredze. 2010. Creating speech and language data with Amazon's Mechanical Turk. In Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk. Association for Computational Linguistics, pages 1–12.

Raymond B. Cattell. 1966. The scree test for the number of factors. Multivariate Behavioral Research 1(2):245–276.

Jonathan Chang, Sean Gerrish, Chong Wang, Jordan L. Boyd-Graber, and David M. Blei. 2009. Reading tea leaves: How humans interpret topic models. In Neural Information Processing Systems (NIPS). Vancouver, pages 288–296.

Jacob Cohen. 1988. Statistical power analysis for the behavioral sciences. Lawrence Erlbaum Associates, Hillsdale, NJ.

Cristian Danescu-Niculescu-Mizil, Moritz Sudhof, Dan Jurafsky, Jure Leskovec, and Christopher Potts. 2013. A computational approach to politeness with application to social factors. In Proceedings of the Association for Computational Linguistics (ACL). Sofia, Bulgaria, pages 250–259.

John W. Du Bois. 2007. The stance triangle. In Robert Englebretson, editor, Stancetaking in Discourse, John Benjamins Publishing Company, Amsterdam/Philadelphia, pages 139–182.

Penelope Eckert. 2012. Three waves of variation study: The emergence of meaning in the study of sociolinguistic variation. Annual Review of Anthropology 41:87–100.

Penelope Eckert and Sally McConnell-Ginet. 1992. Think practically and look locally: Language and gender as community-based practice. Annual Review of Anthropology 21:461–490.

Penelope Eckert and John R. Rickford. 2001. Style and Sociolinguistic Variation. Cambridge University Press.

Jacob Eisenstein. 2013. What to do about bad language on the internet. In Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL). pages 359–369.

Jacob Eisenstein, Amr Ahmed, and Eric P. Xing. 2011. Sparse additive generative models of text. In Proceedings of the International Conference on Machine Learning (ICML). pages 1041–1048.

Jacob Eisenstein, Brendan O'Connor, Noah A. Smith, and Eric P. Xing. 2010. A latent variable model for geographic lexical variation. In Proceedings of Empirical Methods for Natural Language Processing (EMNLP). pages 1277–1287.

Valerie Freeman, Richard Wright, Gina-Anne Levow, Yi Luan, Julian Chan, Trang Tran, Victoria Zayats, Maria Antoniak, and Mari Ostendorf. 2014. Phonetic correlates of stance-taking. The Journal of the Acoustical Society of America 136(4):2175–2175.

Eric Gilbert. 2013. Widespread underprovision on Reddit. In Proceedings of Computer-Supported Cooperative Work (CSCW). pages 803–808.

William L. Hamilton, Kevin Clark, Jure Leskovec, and Dan Jurafsky. 2016. Inducing domain-specific sentiment lexicons from unlabeled corpora. In Proceedings of Empirical Methods for Natural Language Processing (EMNLP). pages 595–605.

Vasileios Hatzivassiloglou and Kathleen R. McKeown. 1997. Predicting the semantic orientation of adjectives. In Proceedings of the Association for Computational Linguistics (ACL). Madrid, Spain, pages 174–181.


Jack Hessel, Chenhao Tan, and Lillian Lee. 2014. Science, askscience, and badscience: On the coexistence of highly related communities. In Proceedings of the International Conference on Web and Social Media (ICWSM). AAAI Publications, Menlo Park, California, pages 171–180.

Alexandra Jaffe. 2009. Stance: Sociolinguistic Perspectives. Oxford University Press.

Daniel Jurafsky, Elizabeth Shriberg, Barbara Fox, and Traci Curl. 1998. Lexical, prosodic, and syntactic cues for dialog acts. In Proceedings of the ACL/COLING-98 Workshop on Discourse Relations and Discourse Markers. pages 114–120.

Elise Karkkainen. 2006. Stance taking in conversation: From subjectivity to intersubjectivity. Text & Talk: An Interdisciplinary Journal of Language, Discourse & Communication Studies 26(6):699–731.

Tiina Keisanen. 2007. Stancetaking as an interactional activity: Challenging the prior speaker. In Stancetaking in Discourse: Subjectivity, Evaluation, Interaction, pages 253–281.

Scott Fabius Kiesling. 2009. Style as stance. In Stance: Sociolinguistic Perspectives, pages 171–194.

Klaus Krippendorff. 2007. Computing Krippendorff's alpha reliability. Departmental Papers (ASC), page 43.

Thomas Landauer, Peter W. Foltz, and Darrell Laham. 1998. Introduction to latent semantic analysis. Discourse Processes 25:259–284.

Wang Ling, Chris Dyer, Alan W. Black, and Isabel Trancoso. 2015. Two/too simple adaptations of word2vec for syntax problems. In Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL). Denver, CO, pages 1299–1304.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems. pages 3111–3119.

Brian Murphy, Partha Pratim Talukdar, and Tom Mitchell. 2012. Learning effective and interpretable semantic models using non-negative sparse embedding. In Proceedings of the International Conference on Computational Linguistics (COLING). Mumbai, India, pages 1933–1949.

Dong Nguyen, A. Seza Dogruoz, Carolyn P. Rose, and Franciska de Jong. 2016. Computational sociolinguistics: A survey. Computational Linguistics 42(3):537–593.

Dong Nguyen, Rilana Gravel, Dolf Trieschnigg, and Theo Meder. 2013. "How old do you think I am?" A study of language and age in Twitter. In Proceedings of the International Conference on Web and Social Media (ICWSM). pages 439–448.

Yoshiki Niwa and Yoshihiko Nitta. 1994. Co-occurrence vectors from corpora vs. distance vectors from dictionaries. In Proceedings of the International Conference on Computational Linguistics (COLING). Kyoto, Japan, pages 304–309.

Elinor Ochs. 1993. Constructing social identity: A language socialization perspective. Research on Language and Social Interaction 26(3):287–306.

Ellie Pavlick and Joel Tetreault. 2016. An empirical analysis of formality in online communication. Transactions of the Association for Computational Linguistics (TACL) 4:61–74.

James W. Pennebaker, Ryan L. Boyd, Kayla Jordan, and Kate Blackburn. 2015. The development and psychometric properties of LIWC2015. Technical report.

Vinodkumar Prabhakaran, Owen Rambow, and Mona Diab. 2012. Predicting overt display of power in written dialogs. In Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL). pages 518–522.

Kristen Precht. 2003. Stance moods in spoken English: Evidentiality and affect in British and American conversation. Text: Interdisciplinary Journal for the Study of Discourse 23(2):239–258.

Kevin M. Quinn, Burt L. Monroe, Michael Colaresi, Michael H. Crespin, and Dragomir R. Radev. 2010. How to analyze political attention with minimal assumptions and costs. American Journal of Political Science 54(1):209–228.

Ellen Riloff and Janyce Wiebe. 2003. Learning extraction patterns for subjective expressions. In Proceedings of Empirical Methods for Natural Language Processing (EMNLP). pages 105–112.

Yanchuan Sim, Brice Acree, Justin H. Gross, and Noah A. Smith. 2013. Measuring ideological proportions in political speeches. In Proceedings of Empirical Methods for Natural Language Processing (EMNLP).

Philip J. Stone. 1966. The General Inquirer: A Computer Approach to Content Analysis. The MIT Press.

Chenhao Tan and Lillian Lee. 2015. All who wander: On the prevalence and characteristics of multi-community engagement. In Proceedings of the Conference on World-Wide Web (WWW). pages 1056–1066.

Yla R. Tausczik and James W. Pennebaker. 2010. The psychological meaning of words: LIWC and computerized text analysis methods. Journal of Language and Social Psychology 29(1):24–54.

Trang Tran and Mari Ostendorf. 2016. Characterizing the language of online communities and its relation to community reception. In Proceedings of Empirical Methods for Natural Language Processing (EMNLP).

Marilyn A. Walker, Pranav Anand, Robert Abbott, and Ricky Grant. 2012. Stance classification using dialogic properties of persuasion. In Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL). pages 592–596.

Robert Philip Weber. 1990. Basic content analysis, volume 49. Sage.

Peter R. R. White. 2003. Beyond modality and hedging: A dialogic view of the language of intersubjective stance. Text: Interdisciplinary Journal for the Study of Discourse 23(2):259–284.

Janyce Wiebe, Theresa Wilson, and Claire Cardie. 2005. Annotating expressions of opinions and emotions in language. Language Resources and Evaluation 39(2):165–210.