View
222
Download
0
Category
Preview:
Citation preview
8/8/2019 Language and Political Ideology
1/30
Nisnevich 1
Language and Political Ideology:
A Comparison of the Word Choice of Liberal and Conservative American Columnists
Alex Nisnevich
Ling 55AC
Professor Houser
Fall 2010
8/8/2019 Language and Political Ideology
2/30
8/8/2019 Language and Political Ideology
3/30
Nisnevich 3
Introduction
The topic of this study is the connection between language and thought, and
particularly how language is used and chosen, even subconsciously, both as a political tool
and a marker of political identity. To test the relationship between political persuasion and
language, word frequencies of a wide range of opinion pieces by American liberal and
conservative writers are analyzed to determine if there is a statistically significant
difference in the choice of words by liberals and conservatives. This data is then used to
build a computer algorithm that tries to determine the political persuasion of documents
based on their word frequencies, and the remainder of the study consists of tests run with
this algorithm.
If an effective algorithm can be constructed that accurately determines the political
persuasion of a document based on its word use, then this is a very significant result for
linguistics, because it conclusively demonstrates that American liberals and American
conservatives use language in a way that can be modeled and predicted. In this way, a
strong connection between language and political beliefs, and on a broader level between
language and thought, can be shown.
My working hypothesis is that there is a statistically significant correlation between
political identification and word choice and that it is theoretically possible for a computer
model to predict political identification based on a word-frequency analysis of a document
with a reasonable degree of accuracy. However, I still believe that personal ideology does
not entirely determine language, and so I dontexpect the algorithm I develop to be any
more than 80-90% accurate in pinpointing the ideology of well-known writers.
8/8/2019 Language and Political Ideology
4/30
Nisnevich 4
Literature Review
The connection between language use and political ideology is loosely related to the
theory of linguistic relativity, which states that people of different linguistic communities
think differently because their language offers them different ways of expressing the world
around them (Kramsch 11). While the linguistic relativity hypothesis, or Sapir-Whorf
hypothesis, only makes a statement about different languages as they relate to different
world-views, it is conceivable that the same language can be used to encode different world-
views for different ideological groups, and this is related to linguistic relativity in that both
stem from the fact that signs, despite generally being arbitrary in form, are non-arbitrary in
their use.
The relationship between language and politics in particular was first explored by
George Orwell, in his aptly-titled 1946 essay Politics and the English Language. Orwell
alludes to the linguistic relativity hypothesis by writing, If thought corrupts language,
language can also corrupt thought (Orwell 7), and theorizes that political writing is driven
by what he described as meaningless words, that is, words, such as fascism,
democracy, freedom, patriotic, and justice, that have many different meanings for
different people but are used in an empty and dishonest way (Orwell 4). Orwells essay
suggests that such meaningless words tend to be used in political writing of all
persuasions, due to their variable meaning. Several years later, Orwell went on to brilliantly
describe how language can affect the thoughts of people in a repressive regime in his novel
Nineteen Eighty-Four.
Noam Chomsky frequently discussed the connection between language and politics,
and his view was that political language consisted of mere terms of propaganda such as
the free world and the national interest (Chomsky 472). Like Orwell, Chomsky believes
8/8/2019 Language and Political Ideology
5/30
Nisnevich 5
that language is often abused to enforce ideological goals (Chomsky 472). In Chomskys
view, the primary persuasive power of language was in the connotations of words. As an
example of the significance of connotations, he describes the time when Nicaragua was
painted negatively by the American media due to its plan to purchase MiG fighter planes,
when Nicaragua would have been perfectly happy to buy French Mirages if they had been
allowed to. In the eyes of the media and the public, the MiG had heavily Soviet (and thus
very negative) connotations, while the Mirage, despite being essentially identical, had no
negative connotations, and Chomsky attests that the MiG purchase was heavily played up by
the hawkish media to provoke public opinion against Nicaragua over a triviality (Chomsky
610-611).
Orwell and Chomsky both discussed how political language can influence and
persuade, but George Lakoff took the discussion in a different direction by theorizing that
political opinions could be correlated to different internal metaphors about government
that people subscribe to. In his view, conservatives tend to envision government under a
strict father model where citizens are disciplined into being responsible, moral adults and
then left alone (Lakoff 65), while liberals tend to envision government under a nurturant
parent model where essentially good citizens are kept away from corrupting influences
(Lakoff 108). In Lakoffs view, it is metaphor a form of language that largely drives
political opinions, not the other way around. Originally I had hoped to be able to empirically
test Lakoffs theories, but since they concern feelings on a deep conceptual level, it is
unlikely that a correlation could be made to word frequencies in documents.
More recently, with the rise of interactivity over the Internet, the connection
between language and politics has become a focus of some websites that track the
popularity of different words among different political groups. For instance, a CNN site
8/8/2019 Language and Political Ideology
6/30
Nisnevich 6
launched this year allows users to submit a brief statement of their beliefs, and then
measures overall word frequency clouds, as well as frequency clouds for Democrats and for
Republicans (see Figures 1-3). It is important to note, however, that the results of this
survey will likely differ significantly from the results of my study, due to the vastly different
surveyed population (namely, the general public as opposed to political columnists).
Figure 1. CNN's most popular words (independent of ideology)
Figure 2. CNN's most popular words (Democrats) Figure 3. CNN's most common words (Republicans)
Source for Fig. 1-3: http://www.cnn.com/interactive/2010/10/politics/ireport.elex.project/?hpt=C1
8/8/2019 Language and Political Ideology
7/30
Nisnevich 7
Component I: Frequency Analysis of Liberal and Conservative Articles
Methodology
I chose to break up this study into two components. In the first part, I conducted a
frequency analysis of 50 liberal and 50 conservative editorials (10 editorials by 10 writers
each) to determine if certain words were used significantly more by liberal or conservative
columnists. In the second part, I attempted to build an algorithm to accurately determine an
articles political ideology using the frequencies of certain words and the dataset produced
earlier (see Component II).
The first step of frequency analysis was the selection of authors and articles. To
select five liberal and five conservative authors, I looked for columnists who satisfied the
following seven criteria:
1. Living American print columnist2. Self-described liberal/conservative3. Considered by readers and the general public to be liberal/conservative4. Preferably not too close to the center5. Regular or semi-regular columns, with the last one written no more than a
few months ago
6. Columns are broadly about politics, and are not solely focused on anyparticular policy area, such as foreign policy or economics.
7. Reasonably well-knownThe reason for such a stringent selection process was to eliminate as many sources
of bias as possible. I eliminated from the running writers who were self-described centrists,
libertarians, or populists, or whose public perception did not match their self-described
8/8/2019 Language and Political Ideology
8/30
Nisnevich 8
ideology, in order to be able to classify the writers into two distinct categories liberal and
conservative with no overlap. Furthermore, I only looked for active writers and avoided
writers who only focused on a single policy area, so that the articles would all have a
relatively similar general focus and time period. This selection process gave me the
following writers:
Liberal
Jonathan Alter (Newsweek) Maureen Dowd (New York Times) Paul Krugman (New York Times) Robert Reich (aggregated on
RealClearPolitics)
Frank Rich (New York Times)
Conservative
Pat Buchanan (syndicated) David Horowitz (FrontPage Magazine) Charles Krauthammer (Washington
Post)
Michelle Malkin (syndicated) George Will (Washington Post)
Once I had my ten writers, selection of articles to examine proved not to be difficult.
With the exception of David Horowitz, all of these writers had their own entries on the news
aggregator site RealClearPolitics.com, so I was able to see all of their most recent articles. In
David Horowitzs case, I simply looked at the latest articles that he wrote for FrontPage
Magazine. In general, I tried to use the ten most recent articles by each writer, though in a
few cases I had to skip over some articles in the event that they clearly had nothing to do
with government or politics. Including such articles would only have confused the data,
since different fields use different jargon. (For a full list of URLs of articles used, see
Appendix II.)
To count word frequencies within each document, I wrote a PHP script (see
count.php in Appendix I for source code). The script used the following rules:
8/8/2019 Language and Political Ideology
9/30
Nisnevich 9
Remove all punctuation characters before counting Ignore articles, copulas, prepositions, conjunctions, connectors, and
pronouns.
Count words twice if they are in the article title or subtitle (on the groundsthat these are words that the writer must have deemed especially
important)
Ignore words that capitalized >50% of the time in an article (to throw outproper nouns)
The decision to throw out proper nouns was a difficult one to make, but ultimately I
decided to do so on the grounds that counting proper nouns would simply make it too easy
to find differences in language use. For instance, only a liberal article would likely mention
F.D.R. or Kennedy, and only a conservative article would likely talk about Communists. Even
barring such extreme examples, proper nouns would tend to point more clearly than other
words toward a particular ideology, and so I made the decision to ignore words that are
primarily capitalized in an article.
The script integrated into a database and kept running totals of word frequencies by
author and by ideology, so I was able to simply run the script on 100 articles and receive the
cumulative word frequency per author and per ideology. Total word counts were also
returned, which were necessary for the next step.
After I obtained the results for all 100 articles, I calculated the proportional usage of
each word by each writer by dividing the word frequency by the total word count, and
performed the same calculation for the running liberal totals and conservative totals.
Finally, for each word I subtracted its conservative proportional usage from its liberal
proportional usage, obtaining a number I called the bias. A positive bias thus meant that a
8/8/2019 Language and Political Ideology
10/30
Nisnevich 10
word was used more often by liberal writers, and a negative bias meant that a word was
used more often by conservative writers. Sorting the table by bias enabled me to finally see
the words that are used significantly more often by liberal columnists and by conservative
columnists.
As one final step, I selected 15 liberal key words and 15 conservative key words
from the top and bottom of the table, respectively. I took the most biased words that
satisfied two criteria:
Related in at least some way to politics or government (that is, not a wordlike not or did)
For a liberal word, at least 3 of the 5 liberal columnists use it more oftenthan the average conservative columnist, and vice versa. (This was a
necessary criteria to avoid words that are used frequently but only by one or
two columnists and thus are poor reprentatives of words used by columnists
in general.)
Results and Analysis
Tables 1-3 show the 50 most frequently used words, the 30 most liberally biased
words, and the 30 most conservatively biased words, respectively. Liberal key words are
written in bold blue, and conservative key words are written in bold red. (Note that all
values given are proportional, so, for instance, if a word with a liberal-avg value of 0.001,
it is 1 out of every 1,000 words written by liberal columnists on average.)
Table 1. Most frequently used words (regardless of ideology)
word liberal-avg conservative-avg liberal bias
not 0.004418 0.005734 -0.001317
have 0.004097 0.004716 -0.000619
8/8/2019 Language and Political Ideology
11/30
Nisnevich 11
has 0.003329 0.003179 0.000150
will 0.003223 0.002909 0.000314
would 0.002390 0.002576 -0.000186
had 0.002369 0.002389 -0.000020
government 0.001601 0.001724 -0.000124
years 0.001451 0.001828 -0.000377
president 0.002241 0.000997 0.001244
political 0.001387 0.001621 -0.000233
do 0.001366 0.001558 -0.000192
war 0.000918 0.001828 -0.000911
percent 0.001686 0.000935 0.000751
new 0.001409 0.001163 0.000245
can 0.001707 0.000831 0.000876
should 0.001409 0.001060 0.000349
tax 0.001793 0.000644 0.001149
time 0.001409 0.001018 0.000390people 0.001216 0.001163 0.000053
just 0.001387 0.000935 0.000452
said 0.001280 0.000935 0.000346
economy 0.001537 0.000603 0.000934
get 0.001430 0.000623 0.000807
could 0.001067 0.000977 0.000091
own 0.001195 0.000831 0.000364
public 0.001110 0.000852 0.000258
last 0.001323 0.000623 0.000700
did 0.000726 0.001184 -0.000459there 0.000619 0.001288 -0.000669
country 0.000704 0.001143 -0.000438
election 0.000854 0.000977 -0.000123
two 0.000854 0.000977 -0.000123
economic 0.001366 0.000395 0.000971
back 0.001195 0.000540 0.000655
world 0.000683 0.001039 -0.000356
money 0.001003 0.000686 0.000317
academic 0.000000 0.001662 -0.001662
year 0.001088 0.000561 0.000527
policy 0.001024 0.000603 0.000422
health 0.000960 0.000623 0.000337
why 0.000896 0.000686 0.000211
right 0.001024 0.000540 0.000484
way 0.000854 0.000561 0.000293
top 0.001195 0.000208 0.000987
cuts 0.001174 0.000208 0.000966
8/8/2019 Language and Political Ideology
12/30
Nisnevich 12
might 0.001067 0.000312 0.000755
don't 0.000982 0.000395 0.000587
may 0.000811 0.000540 0.000271
same 0.000768 0.000582 0.000187
Table 2. Most liberally biased words
word
liberal-
total
conservative-
total liberal bias
president 0.002241 0.000997 0.001244
tax 0.001793 0.000644 0.001149
top 0.001195 0.000208 0.000987
economic 0.001366 0.000395 0.000971
cuts 0.001174 0.000208 0.000966
big 0.001152 0.000187 0.000965
economy 0.001537 0.000603 0.000934can 0.001707 0.000831 0.000876
get 0.001430 0.000623 0.000807
can't 0.001024 0.000229 0.000796
might 0.001067 0.000312 0.000755
percent 0.001686 0.000935 0.000751
last 0.001323 0.000623 0.000700
plan 0.000790 0.000104 0.000686
won't 0.000726 0.000042 0.000684
back 0.001195 0.000540 0.000655
jobs 0.000896 0.000249 0.000647know 0.000875 0.000229 0.000646
debt 0.000960 0.000353 0.000607
don't 0.000982 0.000395 0.000587
spending 0.000918 0.000332 0.000585
deficit 0.000726 0.000145 0.000580
year 0.001088 0.000561 0.000527
cut 0.000726 0.000208 0.000518
unemployment 0.000704 0.000187 0.000517
costs 0.000598 0.000083 0.000514
business 0.000683 0.000187 0.000496income 0.000662 0.000166 0.000495
didn't 0.000619 0.000125 0.000494
right 0.001024 0.000540 0.000484
Table 3. Most conservatively biased words
word
liberal-
total
conservative-
total liberal bias
academic 0.000000 0.001662 -0.001662
not 0.004418 0.005734 -0.001317
students 0.000000 0.001184 -0.001184
freedom 0.000000 0.001039 -0.001039
war 0.000918 0.001828 -0.000911
left 0.000171 0.001039 -0.000868
liberal 0.000171 0.000977 -0.000806university 0.000000 0.000769 -0.000769
there 0.000619 0.001288 -0.000669
faculty 0.000000 0.000665 -0.000665
have 0.004097 0.004716 -0.000619
hearings 0.000021 0.000623 -0.000602
state 0.000277 0.000873 -0.000595
treaty 0.000043 0.000603 -0.000560
radical 0.000149 0.000706 -0.000557
women 0.000192 0.000748 -0.000556
nuclear 0.000021 0.000561 -0.000540says 0.000299 0.000810 -0.000512
such 0.000405 0.000914 -0.000509
security 0.000043 0.000519 -0.000477
did 0.000726 0.001184 -0.000459
professors 0.000000 0.000457 -0.000457
social 0.000299 0.000748 -0.000449
country 0.000704 0.001143 -0.000438
unions 0.000000 0.000436 -0.000436
left-wing 0.000021 0.000436 -0.000415
states 0.000107 0.000519 -0.000413members 0.000149 0.000561 -0.000412
course 0.000384 0.000790 -0.000405
illegal 0.000000 0.000395 -0.000395
8/8/2019 Language and Political Ideology
13/30
Nisnevich 13
Table 4 displays the 15 key liberal words and 15key conservative words that
were obtained from the above data:
Table 4. Key liberal and conservative words
# Liberal Conservative
1 president students
2 tax freedom
3 economic war
4 cuts left
5 economy liberal
6 plan state
7 jobs treaty
8 spending radical9 deficit women
10 cut nuclear
11 unemployment security
12 costs social
13 business country
14 income unions
15 financial states
The most liberally biased word, or the word that liberal columnists used the most in
comparison to conservative columnists, is president, which may stem from the fact that
liberal articles tended to refer to Barack Obama as President Obama, while conservative
articles generally did not (this trend would likely be the opposite when a Republican
president is in power). Other than this, the liberally biased words tended to relate to the
economy (tax, economic, cuts, etc), while the conservatively biased words tended to
relate to foreign policy (war, treaty, nuclear), to liberalism (left, liberal), and to
nationalism (state, country, states). Comparing these results with those of the CNN
iReport (see page 5) shows that both datasets put government as the most popular
politics-related word, but other than that there is very little agreement, which, as mentioned
8/8/2019 Language and Political Ideology
14/30
Nisnevich 14
before, could be ascribed to the two different populations studied. The fact that Paul
Krugman is an economist as well as columnist may have contributed to the high usage of
economic terms by the liberal columnists, though it couldnt have been the only factor, since
almost all of the liberal columnists (with the occasional exceptions of Alter and/or Dowd)
showed relatively high frequencies of use for the economic terms.
Applying the theories of Orwell and Chomsky, there is certainly a prevalence of what
Orwell described as meaningless words in the dataset in particular, the word freedom,
the second most conservatively biased word is one that both Orwell and Chomsky pointed
out has no inherent meaning due to the many conflicting connotations that it could have. To
a lesser extent, this could be said for a great many number of words on these lists: as
Chomsky would argue, these frequency lists show that the principal goal of political
language, even in private newspapers such as the New York Times and the Washington Post,
is to persuade, and persuasion is accomplished with the aid of imprecise language that
avoids the need for concrete details.
In this part of the project I demonstrated that there are significant differences in
word choice between liberal and conservative columnists, but I had not yet determined how
predictable these differences were. This is the topic that I addressed in the second part of
the project.
Component II: Prediction of Ideology from Word Frequencies
Methodology
Suppose that an article is presented that is written by either a liberal columnist or a
conservative columnist, but no other information is given aside from the text of the article
8/8/2019 Language and Political Ideology
15/30
Nisnevich 15
itself. My goal in this part of the project was to write an algorithm that tried to predict
whether a given article was liberal or conservative, based on the frequency of certain
words. I made use of the key liberal and key conservative words that I found in the first part
of this project and decided that the algorithm would test each of the 30 key words,
determining the likelihood of the article being liberal or conservative based on the
frequency of each key word and then adding the results together.
More precisely, the script that I wrote (see guess.php in Appendix I for source code)
functioned as follows:
I. For each of the 30 key words,1. Find the frequency of the key word in the given article.2. Determine how many of the 10 writers tested in Component 1 used
this word at least as often on average as in the given article.
3.a. If there are some writers who used the word this often, find the
percentage of them who are liberal. This is the percentage
chance that the article has a liberal bias in terms of word
frequency, based on the data from just this word.
b. If there are no writers who used the word this often, then thisarticle is either very liberal (if its a liberal key word) or very
conservative (if its a conservative key word). Thus, give it either
a 100% chance of being liberal or a 0% chance, appropriately,
based on the data from just this word.
4. Subtract 50% from this percentage to obtain the bias number. Ifthe bias is positive, the article is likely to be liberal based on this
8/8/2019 Language and Political Ideology
16/30
Nisnevich 16
word (+0.5 bias equates to 100% chance of being liberal). If the bias
is negative, the article is likely to be conservative based on this word
( -0.5 bias equates to 100% chance of being conservative).
II. Finally, take the 30 bias numbers that are calculated (one for each keyword), and add them together to obtain the overall bias number for the
article. In theory, this bias could range from -7.5 to +7.5, and in practice it
has ranged between -4.45 (almost certainly conservative) to +6.5 (almost
certainly liberal).
From here, my work on the project consisted of testing this algorithm, the results of
which appear in the next section.
Results and Analysis
To test the results of the algorithm, I ran it on each of the 100 articles I had looked at
previously. Table 5 below shows the resulting bias score for each of the ten articles (in
order according to the list in Appendix I) by each of the ten writers, as well as an average
score for each writer. Figure 4 below is a bar graph of the resulting average bias scores for
each writer.
Table 5. Bias scores for each of the tested articles, and average scores per writer
Alter Dowd Krugman Reich Rich Buchanan Horowitz Krauthammer Malkin W
+1.00 +0.33 +3.67 +6.50 +1.32 +1.00 -3.50 -1.00 -0.83 -0.2
+1.92 +1.00 +2.50 +3.50 +2.76 -2.33 -1.25 -0.50 +1.50 -1.0
+1.00 +1.42 +2.92 +4.50 +3.50 -3.50 -2.25 -1.08 -2.25 +0.7+0.00 +0.00 +2.00 +4.00 +1.31 +1.17 -3.00 -0.25 +1.00 +1.5
+2.50 +2.50 +3.50 +1.00 +0.06 -0.83 -4.23 -0.50 +0.75 +2.0
+3.60 -1.50 +2.50 +1.83 +2.67 +1.67 -3.00 +0.25 -1.00 +0.0
-1.50 -0.33 +2.67 +2.50 +3.10 -1.83 -4.22 -1.00 +0.75 +0.0
-1.00 -0.50 +2.17 +2.50 +1.55 +1.00 -2.00 +1.00 -2.25 +0.0
+2.75 +0.67 +3.33 +3.50 +3.50 -1.00 -2.78 +0.00 -2.50 -1.5
8/8/2019 Language and Political Ideology
17/30
Nisnevich 17
Figure 4. Average bias scores for each columnist
As can be seen, the results are somewhat promising but not as clear-cut as Id hoped
they would be. While all but one liberal columnist has a score above +1.00, only one
conservative columnist Horowitz managed a score below -1.00, with the rest floating
close to the +0.00 line. Most troubling, the algorithm gave George Will a liberal bias score
thats statistically indistinguishable from that of Maureen Dowd, despite their nearly
diametrically opposite views. Looking at the table shows that the bias scores given jump
all over the place even for articles written by the same author, apparently with minor
fluctuations in word choice from article to article having huge repercussions. All told, 6
liberal articles were miscategorized as conservative and 17 conservative articles were
miscategorized as liberal, for a total of 23 mistakes or a 77% success rate, which is about
what I expected the algorithm to be able to achieve.
-4.00 -3.00 -2.00 -1.00 +0.00 +1.00 +2.00 +3.00 +4.00
Alter
Dowd
Krugman
Reich
Rich
Buchanan
Horowitz
Krauthammer
Malkin
Will
+1.00 +0.25 +2.42 +1.25 -2.33 +0.08 -4.45 -0.58 -2.25 +1.0
AVG +1.13 +0.38 +2.77 +3.11 +1.74 -0.46 -3.07 -0.37 -0.71 +0.2
8/8/2019 Language and Political Ideology
18/30
Nisnevich 18
Why isnt the algorithm able to give more accurate results? I believe that the issue is
that, while the differences in word choice between liberal and conservative columnists can
be demonstrated (see Component I), they cannot necessarily be predicted in advance:
individual authors will differ from each other in different and possibly unexpected ways,
and word frequency count is simply not precise enough a measurement to be able to
achieve a near-perfect accuracy.
The fact that my algorithm could achieve 77% accuracy with such a simple
algorithm, however, does seem to provide further evidence for the interrelation between
political ideology and language. Nevertheless, this accuracy rating is somewhat suspect,
since the algorithm was applied on the very same articles that were used to determine how
it operates, which could introduce some bias. Further testing is needed to ascertain how
useful this algorithm or its variations could be on an unpredictable sample set.
Conclusion
Ultimately, both of my hypotheses came out as expected: I demonstrated in
Component I a statistically significant correlation between political identification and word
choice, but only managed to achieve 77% accuracy in predicting political persuasion based
on word choice. As mentioned above, this suggests that the differences in word choice
between liberal and conservative columnists, while present, are somewhat unpredictable.
Furthermore, I believe that personal ideology does not entirely determine language use,
though it is a large contributor, so it would make sense that perfect accuracy is impossible
in this context. Finally, there is no such thing as a simple duality of political beliefs, and
there is huge variety in what one can believe in even if one identifies as liberal or as
conservative. In light of this, the results can be interpreted to mean that, while the 5
8/8/2019 Language and Political Ideology
19/30
Nisnevich 19
liberal authors can be shown to write significantly differently from the 5 conservative
authors, theres nothing close complete agreement in word choice among the liberal
authors and among the conservative authors. As the cultural linguistic concept of agency
makes clear, membership in a group contributes to identity but does not establish identity
(Bucholtz 422).
If I had to do this paper again, I think the one thing that I would definitely try to do
differently would be to include more authors, since only comparing five liberal columnists
and five conservative columnists involves such a small sample size that unexpected bias can
easily be introduced as a result. However, significantly increasing the sample size would
come at a cost, as it would also make the project much more tedious and time-consuming. I
am also interested in what would have happened if I hadnt removed proper nouns from the
comparison: I still think that it was the right thing for my project to ignore proper nouns,
but the results would certainly be different and notable in their own way if I had chosen to
keep them in.
Accepting the necessary failings of any attempt to definitively classify writing by
political persuasion, there are still many useful avenues of research for this topic. In
particular, if the linguistic bias detection algorithm is made a little more robust, there are
many sources of data that it could examine and questions that it could consider: Do
columnists who self-identify as nonpartisan still have a significant conscious or
subconscious linguistic bias? Are presidential speeches written to appeal more to partisans
or to centrists, judging by the bias in their word choice? How do political blog articles
compare to editorials in terms of bias? These and many other questions could be
investigated with such an algorithm, and could lead to interesting results.
8/8/2019 Language and Political Ideology
20/30
Nisnevich 20
Works Cited
Buckoltz, Mary. "Language, Gender, and Sexuality." Language in the USA: Themes for the
Twenty-first Century. By Edward Finegan and John R. Rickford. Cambridge:
Cambridge UP, 2004. Print.
Chomsky, Noam, and Carlos Peregrn Otero. Language and Politics. Oakland, CA: AK, 2004.
Print.
"IReport Election Project." CNN.com. Cable News Network, 27 Oct. 2010. Web. 2 Dec. 2010.
Kramsch, Claire J. Language and Culture. Oxford, OX: Oxford UP, 1998. Print.
Lakoff, George. Moral Politics: How Liberals and Conservatives Think. Chicago: Univ. of
Chicago, 2001. Print.
Orwell, George. "Politics and the English Language." Horizons 1946. Web.
8/8/2019 Language and Political Ideology
21/30
Nisnevich 21
Appendix I: Source Code
count.php
8/8/2019 Language and Political Ideology
22/30
Nisnevich 22
{// Increment total word count$wordCount++;
// Ignore words in ignore listif (!in_array ($word, $ignoreList)) {
// If the word is uppercase, make lowercase and also add touppercase frequency table
if (ctype_upper(substr($word, 0, 1))) {$word = strtolower($word);array_key_exists( $word, $uppercase ) ? $uppercase[
$word ]++ : $uppercase[ $word ] = 1;}
// For each word found in the frequency table, incrementits value by one
array_key_exists( $word, $freqData ) ? $freqData[ $word ]++: $freqData[ $word ] = 1;
}
}
// Insert a "Total" entry for total word count$freqData['#TOTAL#'] = $wordCount;
// Now, we must insert results into the db
// First, add db column for this authoradd_column_if_not_exist ("research_ling_polwordfreq", $author);
// If a word is uppercase more than 50% of the time, ignore it// Otherwise, add it to two columns: one for the author and one for theideologyforeach ( $freqData as $word => $freq) {
if (!array_key_exists ($word, $uppercase) || $freq > (2 *$uppercase[$word])) {
$query = "INSERT INTO `research_ling_polwordfreq` (`word`,`$author`, `$ideology-total`) VALUES('$word', $freq, $freq) ONDUPLICATE KEY UPDATE `$author` = `$author` + $freq, `$ideology-total` =`$ideology-total` + $freq";
$result = mysql_query ($query);echo "$query ... $result
";
}}echo "DONE";
// Function for adding a column only if it doesn't already exist
function add_column_if_not_exist($db, $column, $column_attr = "INT NOTNULL" ){
$exists = false;$columns = mysql_query("show columns from $db");while($c = mysql_fetch_assoc($columns)){
if($c['Field'] == $column){$exists = true;break;
}}
8/8/2019 Language and Political Ideology
23/30
Nisnevich 23
if(!$exists){$query = "ALTER TABLE `$db` ADD `$column`
$column_attr";$result = mysql_query($query);echo "$query ...
$result
";
}}
?>
guess.php
8/8/2019 Language and Political Ideology
24/30
Nisnevich 24
array(0.000128, 0.000124, 0.002339, 0.003072, 0.001303),array(0.000850, 0.000111, 0.000537, 0.000283, 0.000541));
$bias += test_word ("cuts", 1, $freqData, $wordCount,array(0.000639, 0.000248, 0.001969, 0.001469, 0.001368),array(0.000728, 0.000000, 0.000000, 0.000425, 0.000135));
$bias += test_word ("economy", 1, $freqData, $wordCount,array(0.000895, 0.000248, 0.003200, 0.004408, 0.000261),array(0.001214, 0.000111, 0.000939, 0.000425, 0.000946));
$bias += test_word ("plan", 1, $freqData, $wordCount,array(0.001278, 0.000743, 0.001723, 0.000401, 0.000261),array(0.000121, 0.000000, 0.000000, 0.000566, 0.000000));
$bias += test_word ("jobs", 1, $freqData, $wordCount,array(0.000511, 0.000248, 0.000615, 0.001870, 0.001107),array(0.000243, 0.000111, 0.000000, 0.000425, 0.000676));
$bias += test_word ("spending", 1, $freqData, $wordCount,array(0.000383, 0.000619, 0.001600, 0.001603, 0.000651),array(0.000486, 0.000000, 0.000134, 0.000283, 0.001216));
$bias += test_word ("deficit", 1, $freqData, $wordCount,array(0.000128, 0.000124, 0.001723, 0.001870, 0.000261),array(0.000607, 0.000000, 0.000000, 0.000000, 0.000270));
$bias += test_word ("cut", 1, $freqData, $wordCount,array(0.000383, 0.000000, 0.000369, 0.002271, 0.000717),array(0.000607, 0.000000, 0.000134, 0.000425, 0.000135));
$bias += test_word ("unemployment", 1, $freqData, $wordCount,array(0.000383, 0.000000, 0.001600, 0.001202, 0.000521),array(0.000243, 0.000000, 0.000537, 0.000142, 0.000270));
$bias += test_word ("costs", 1, $freqData, $wordCount,array(0.001534, 0.000000, 0.000615, 0.001336, 0.000065),array(0.000000, 0.000000, 0.000000, 0.000425, 0.000135));
$bias += test_word ("business", 1, $freqData, $wordCount,array(0.000383, 0.000124, 0.000123, 0.002004, 0.000782),array(0.000121, 0.000111, 0.000134, 0.000708, 0.000000));
$bias += test_word ("income", 1, $freqData, $wordCount,array(0.000000, 0.000000, 0.000492, 0.002004, 0.000782),array(0.000121, 0.000000, 0.000134, 0.000849, 0.000000));
$bias += test_word ("financial", 1, $freqData, $wordCount,array(0.000767, 0.000248, 0.000739, 0.000267, 0.001172),array(0.000243, 0.000056, 0.000671, 0.000425, 0.000135));
$bias += test_word ("states", 0, $freqData, $wordCount,array(0.000256, 0.000000, 0.000000, 0.000267, 0.000065),array(0.000364, 0.000111, 0.000402, 0.001840, 0.000541));
$bias += test_word ("unions", 0, $freqData, $wordCount,array(0.000000, 0.000000, 0.000000, 0.000000, 0.000000),array(0.000000, 0.000612, 0.000000, 0.000849, 0.000541));
$bias += test_word ("country", 0, $freqData, $wordCount,array(0.000895, 0.000867, 0.000492, 0.000000, 0.000977),
array(0.002064, 0.001113, 0.001342, 0.000566, 0.000541));$bias += test_word ("social", 0, $freqData, $wordCount,
array(0.000000, 0.000619, 0.000123, 0.000000, 0.000521),array(0.000728, 0.000779, 0.000402, 0.000708, 0.001081));
$bias += test_word ("security", 0, $freqData, $wordCount,array(0.000000, 0.000000, 0.000123, 0.000134, 0.000000),array(0.000607, 0.000167, 0.000805, 0.001274, 0.000270));
$bias += test_word ("nuclear", 0, $freqData, $wordCount,array(0.000000, 0.000000, 0.000000, 0.000000, 0.000065),array(0.001700, 0.000056, 0.000939, 0.000000, 0.000676));
8/8/2019 Language and Political Ideology
25/30
Nisnevich 25
$bias += test_word ("women", 0, $freqData, $wordCount,array(0.000000, 0.000867, 0.000123, 0.000134, 0.000000),array(0.000121, 0.001502, 0.000268, 0.000566, 0.000270));
$bias += test_word ("radical", 0, $freqData, $wordCount,array(0.000000, 0.000248, 0.000123, 0.000000, 0.000261),array(0.000121, 0.001613, 0.000268, 0.000283, 0.000000));
$bias += test_word ("treaty", 0, $freqData, $wordCount,array(0.000000, 0.000000, 0.000246, 0.000000, 0.000000),array(0.001821, 0.000000, 0.000805, 0.000000, 0.001081));
$bias += test_word ("state", 0, $freqData, $wordCount,array(0.000256, 0.000372, 0.000246, 0.000534, 0.000130),array(0.000850, 0.000668, 0.000671, 0.001274, 0.001216));
$bias += test_word ("liberal", 0, $freqData, $wordCount,array(0.000256, 0.000124, 0.000000, 0.000134, 0.000261),array(0.000607, 0.001335, 0.000537, 0.001415, 0.000541));
$bias += test_word ("left", 0, $freqData, $wordCount,array(0.000128, 0.000248, 0.000246, 0.000000, 0.000195),array(0.000607, 0.002059, 0.000671, 0.000425, 0.000000));
$bias += test_word ("war", 0, $freqData, $wordCount,array(0.000895, 0.000743, 0.000000, 0.000000, 0.001954),
array(0.004128, 0.001892, 0.001744, 0.000708, 0.000270));$bias += test_word ("freedom", 0, $freqData, $wordCount,
array(0.000000, 0.000000, 0.000000, 0.000000, 0.000000),array(0.000243, 0.002615, 0.000000, 0.000142, 0.000000));
$bias += test_word ("students", 0, $freqData, $wordCount,array(0.000000, 0.000000, 0.000000, 0.000000, 0.000000),array(0.000121, 0.002782, 0.000000, 0.000849, 0.000000));
// compares a key word frequency with that of the 5 liberal and 5conservative writers,// and returns a corresponding bias score (higher = more liberal, lower= more conservative)function test_word ($word, $is_lib_keyword, $freqData, $wordCount,$lib_freqs, $con_freqs) {
if (array_key_exists($word, $freqData)) {$freq = $freqData[$word] / $wordCount;
} else {$freq = 0;
}
$libs = 0;$cons = 0;foreach ($lib_freqs as $lib_f) {
if ($lib_f >= $freq) {$libs++;
}
}foreach ($con_freqs as $con_f) {
if ($con_f >= $freq) {$cons++;
}}
if ($libs == 0 && $cons == 0) {// either "more liberal" or "more conservative" than any of
the columnists,
8/8/2019 Language and Political Ideology
26/30
Nisnevich 26
// so we'll give it either a +0.5 or a -0.5 (maximum for asingle test)
return ($is_lib_keyword - .5);} else {
// What is the chance that an article using the given wordthis many times is liberal?
// Find the proportion of writers with this word freq orhigher that are liberal.
// (Then subtract 50% to get a score between +0.5 and -0.5)return ($libs / ($libs + $cons) - .5);
}}
echo ("On a scale of -7.5 (most conservative) to +7.5 (most liberal),this text has $bias bias.");
?>
8/8/2019 Language and Political Ideology
27/30
8/8/2019 Language and Political Ideology
28/30
Nisnevich 28
o http://www.sfgate.com/cgi-bin/article.cgi?f=/c/a/2010/11/07/IN4J1G5VII.DTL
o http://online.wsj.com/article/SB10001424052702304173704575578200086257706.html
o http://www.salon.com/news/politics/2010_elections/index.html?story=/news/feature/2010/10/25/why_democrats_move_to_the_center
o http://www.huffingtonpost.com/robert-reich/the-secret-bigmoney-takeo_b_754938.html
o http://www.sfgate.com/cgi-bin/article.cgi?f=/c/a/2010/10/03/INC41FL1DM.DTL
o http://www.huffingtonpost.com/robert-reich/republican-economics-as-s_b_739654.html
o http://www.sfgate.com/cgi-bin/article.cgi?f=/c/a/2010/09/26/INTI1FHHQQ.DTL
o http://www.salon.com/news/feature/2010/09/21/stimulus_not_enough/index.html
Frank Richo http://www.nytimes.com/2010/11/28/opinion/28rich.htmlo http://www.nytimes.com/2010/11/14/opinion/14rich.htmlo http://www.nytimes.com/2010/11/07/opinion/07rich.html?_r=1o http://www.nytimes.com/2010/10/24/opinion/24rich.html?_r=1&ref=opi
nion
o http://www.nytimes.com/2010/10/10/opinion/10rich.html?_r=1&ref=opinion
o http://www.nytimes.com/2010/10/03/opinion/03rich.html?_r=1o http://www.nytimes.com/2010/09/12/opinion/12rich.html?_r=1&ref=opi
niono http://www.nytimes.com/2010/08/29/opinion/29rich.html?_r=1&ref=opi
nion
o
http://www.nytimes.com/2010/08/08/opinion/08rich.html?_r=1&ref=opiniono http://www.nytimes.com/2010/08/01/opinion/01rich.html?_r=1
Conservative Columnists
Pat Buchanano http://www.realclearpolitics.com/articles/2010/11/30/european_union_ri
p_108087.html
o http://www.realclearpolitics.com/articles/2010/11/26/why_are_we_still_in_korea_108069.html
o http://www.realclearpolitics.com/articles/2010/11/23/is_gop_risking_a_new_cold_war_108035.html
o http://www.realclearpolitics.com/articles/2010/11/19/who_fed_the_tiger_108001.html
o http://www.realclearpolitics.com/articles/2010/11/16/tea_partys_winning_hand_107963.html
8/8/2019 Language and Political Ideology
29/30
Nisnevich 29
o http://www.realclearpolitics.com/articles/2010/11/12/the_fed_trashes_the_dollar_107928.html
o http://www.realclearpolitics.com/articles/2010/11/09/the_murderers_of_christianity_107884.html
o http://www.realclearpolitics.com/articles/2010/11/05/has_history_passed_obama_by_107847.html
o http://www.realclearpolitics.com/articles/2010/11/02/broders_brainstorm_107802.html
o http://www.realclearpolitics.com/articles/2010/10/29/we_are_in_uncharted_waters.html
David Horowitzo http://archive.frontpagemag.com/readArticle.aspx?ARTID=36385o http://archive.frontpagemag.com/readArticle.aspx?ARTID=36267o http://archive.frontpagemag.com/readArticle.aspx?ARTID=36236o http://archive.frontpagemag.com/readArticle.aspx?ARTID=36189o http://archive.frontpagemag.com/readArticle.aspx?ARTID=35156o http://archive.frontpagemag.com/readArticle.aspx?ARTID=35117o http://archive.frontpagemag.com/readArticle.aspx?ARTID=34689o http://archive.frontpagemag.com/readArticle.aspx?ARTID=34836o http://archive.frontpagemag.com/readArticle.aspx?ARTID=34790o http://archive.frontpagemag.com/readArticle.aspx?ARTID=34348
Charles Krauthammero http://www.washingtonpost.com/wp-
dyn/content/article/2010/11/25/AR2010112502232.htmlo http://www.washingtonpost.com/wp-
dyn/content/article/2010/11/18/AR2010111804494.html
o http://www.nationalreview.com/articles/253121/why-obama-right-about-india-charles-krauthammer
o http://www.washingtonpost.com/wp-dyn/content/article/2010/11/04/AR2010110406581.htmlo http://www.washingtonpost.com/wp-dyn/content/article/2010/10/28/AR2010102806270.html
o http://www.washingtonpost.com/wp-dyn/content/article/2010/10/21/AR2010102104856.html?hpid=opinions
box1
o http://www.washingtonpost.com/wp-dyn/content/article/2010/10/14/AR2010101405234.html
o http://articles.ocregister.com/2010-10-07/opinion/24649228_1_debt-problem-national-debt-democrats
o http://www.nationalreview.com/articles/248433/why-he-sending-them-charles-krauthammer
o http://www.washingtonpost.com/wp-dyn/content/article/2010/09/23/AR2010092304746.html Michelle Malkin
o http://www.realclearpolitics.com/articles/2010/12/01/the_littlest_victims_of_obamacare_108102.html
o http://www.realclearpolitics.com/articles/2010/11/24/giving_thanks_for_american_ingenuity_108048.html
o http://www.realclearpolitics.com/articles/2010/11/19/ray_lahood_obamas_power-mad_cell_phone_czar_108007.html
8/8/2019 Language and Political Ideology
30/30
Nisnevich 30
o http://www.realclearpolitics.com/articles/2010/11/19/dude_wheres_my_obamacare_waiver_107978.html
o http://www.realclearpolitics.com/articles/2010/11/12/throw_carol_browner_under_the_bus_107934.html
o http://www.realclearpolitics.com/articles/2010/11/10/no_illegal_alien_pilot_left_behind_107899.html
o http://www.realclearpolitics.com/articles/2010/11/05/voters_speak_no_to_soak-the-rich_schemes_107848.html
o http://www.realclearpolitics.com/articles/2010/10/29/standing_tall_the_rise_and_resilience_of_conservative_women_107768.html
o http://www.realclearpolitics.com/articles/2010/10/27/the_lefts_voter_fraud_whitewash_107740.html
o http://www.realclearpolitics.com/articles/2010/10/22/free_the_taxpayers_defund_state-sponsored_media_107676.html
George Willo http://www.washingtonpost.com/wp-
dyn/content/article/2010/12/01/AR2010120104728.html
o http://www.washingtonpost.com/wp-dyn/content/article/2010/11/26/AR2010112603490.html
o http://www.washingtonpost.com/wp-dyn/content/article/2010/11/24/AR2010112405841.html
o http://www.newsweek.com/2010/11/20/will-a-senator-looks-back-to-the-future.html
o http://www.washingtonpost.com/wp-dyn/content/article/2010/11/17/AR2010111705316.html
o http://www.pittsburghlive.com/x/pittsburghtrib/opinion/s_709095.htmlo http://www.washingtonpost.com/wp-
dyn/content/article/2010/11/12/AR2010111204494.htmlo http://www.washingtonpost.com/wp-
dyn/content/article/2010/11/10/AR2010111005499.htmlo http://www.washingtonpost.com/wp-dyn/content/article/2010/11/03/AR2010110303844.html
o http://www.pittsburghlive.com/x/pittsburghtrib/opinion/s_706364.html
Recommended