Measuring Polarization in High-Dimensional Data

Measuring Polarization in High-Dimensional Data: Method and Application to Congressional Speech

Matthew Gentzkow, Stanford and NBER
Jesse M. Shapiro, Brown and NBER
Matt Taddy, Microsoft and Chicago Booth

Tax Relief

DEATH TAX

freedom fighters

illegal alien

terrorists

ESTATE TAX

Tax Breaks

undocumented worker

Wealthiest

1 percent

living wage

fair labor

capitalist

African American

Pro choice

equality

Tax Freedom

War on Terror

Pro life

Big Government

entrepreneurs

Right to life

Washington takeover

Welfare Queens

Origins

Dr. Frank I. Luntz – The Language of Healthcare 2009

THE LANGUAGE OF HEALTHCARE 2009

THE 10 RULES FOR STOPPING THE

“WASHINGTON TAKEOVER” OF HEALTHCARE

(1) Humanize your approach. Abandon and exile ALL references to the “healthcare system.” From now on, healthcare is about people. Before you speak, think of the three components of tone that matter most: Individualize. Personalize. Humanize.

(2) Acknowledge the “crisis” or suffer the consequences. If you say there is no healthcare crisis, you give your listener permission to ignore everything else you say. It is a credibility killer for most Americans. A better approach is to define the crisis in your terms. “If you’re one of the millions who can’t afford healthcare, it is a crisis.” Better yet, “If some bureaucrat puts himself between you and your doctor, denying you exactly what you need, that’s a crisis.” And the best: “If you have to wait weeks for tests and months for treatment, that’s a healthcare crisis.”

(3) “Time” is the government healthcare killer. As Mick Jagger once sang, “Time is on Your Side.” Nothing else turns people against the government takeover of healthcare than the realistic expectation that it will result in delayed and potentially even denied treatment, procedures and/or medications. “Waiting to buy a car or even a house won’t kill you. But waiting for the healthcare you need – could. Delayed care is denied care.”

(4) The arguments against the Democrats’ healthcare plan must center around “politicians,” “bureaucrats,” and “Washington” … not the free market, tax incentives, or competition. Stop talking economic theory and start personalizing the impact of a government takeover of healthcare. They don’t want to hear that you’re opposed to government healthcare because it’s too expensive (any help from the government to lower costs will be embraced) or because it’s anti-competitive (they don’t know about or care about current limits to competition). But they are deathly afraid that a government takeover will lower their quality of care – so they are extremely receptive to the anti-Washington approach. It’s not an economic issue. It’s a bureaucratic issue.

(5) The healthcare denial horror stories from Canada & Co. do resonate, but you have to humanize them. You’ll notice we recommend the phrase “government takeover” rather than “government run” or “government controlled.” It’s because too many politicians say “we don’t want a government run healthcare system like Canada or Great Britain” without explaining those consequences. There is a better approach. “In countries with government run healthcare, politicians make YOUR healthcare decisions. THEY decide if you’ll get the procedure you need, or if you are disqualified because the treatment is too expensive or because you are too old. We can’t have that in America.”

Example: Social Security

• Luntz (2006):

  • “Never say ‘privatization / private accounts.’ Instead say ‘personalization / personal accounts.’ Two-thirds of America want to personalize security while only one third would privatize it. Why? [Personalization] suggests ownership and control... while [privatization] suggests a profit motive and winners and losers.”

Example: Social Security

• 2005 Congress

  Phrase                Rep   Dem
  “personal account”    184    48
  “private account”       5   542

• Media coverage, 6/23/05

  • “House GOP offers plan for Social Security; Bush’s private accounts would be scaled back” (Washington Post)

  • “GOP backs use of Social Security surplus; Finds funding for personal accounts” (Washington Times)

Is partisan speech a new phenomenon?

This Paper

• Goal: Measure trends in partisanship of political speech

• Data: US Congressional Record, 1873-2009

• Challenge: Speech is high-dimensional choice data

  • Potential for severe finite-sample bias

  • Computation can be difficult

• Solution: Structural estimation with machine-learning methods

• Approach exportable to other contexts (e.g. web browsing, residential segregation)

Literature

• Polarization in Congress

  • E.g., Poole & Rosenthal (1984, 1997); McCarty et al. (2006)

• Polarization more broadly

  • E.g., Fiorina et al. (2006); Fiorina & Abrams (2006); Abramowitz & Saunders (2008)

• Congressional speech

  • E.g., Grimmer (2010, 2013); Quinn et al. (2010)

  • Jensen et al. (2012)

Data

Data

• US Congressional Record, 1873-2009

• Use automated script to identify speaker and tag with metadata

• Use rules of thumb to remove procedural phrases

  • “I yield the remainder of my time...”

• Convert to counts of two-word phrases, after stemming and removing stopwords

  • “war on terrorism” and “war on terror” become “war terror”
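The phrase-count step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' script: the stopword list, the suffix rules in `toy_stem`, and the function names are all assumptions made up here, chosen only so that the slide's "war terror" example comes out as described.

```python
import re

# Toy stopword list and suffix rules: illustrative assumptions only.
STOPWORDS = {"on", "the", "of", "a", "an", "and", "to", "in", "i", "my"}
SUFFIXES = ("ism", "ist", "ing", "ed", "s")

def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def bigram_counts(text):
    """Count adjacent pairs of stemmed, non-stopword tokens."""
    tokens = [toy_stem(w) for w in re.findall(r"[a-z]+", text.lower())
              if w not in STOPWORDS]
    counts = {}
    for first, second in zip(tokens, tokens[1:]):
        phrase = f"{first} {second}"
        counts[phrase] = counts.get(phrase, 0) + 1
    return counts

print(bigram_counts("war on terrorism"))
print(bigram_counts("war on terror"))
```

Both calls map to the single phrase "war terror". A real pipeline would use an established stemmer and a fuller stopword list.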

Trends in Verbosity

[Figure: total utterances per speaker (vertical axis) by year (horizontal axis, 1880 to 2000).]

Model

Statistical Model

• Vector of phrase counts $c_{it}$ for member $i$ in session $t$

• Party affiliation $P(i) \in \{R, D\}$

• Speaker characteristics $x_{it}$

• Verbosity $m_{it} = \sum_j c_{ijt}$

• Assume throughout that

$$c_{it} \sim \mathrm{MN}\left(m_{it},\ q_t^{P(i)}(x_{it})\right)$$
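To make the multinomial assumption concrete, the sketch below draws phrase-count vectors for two hypothetical speakers. The probabilities in `q`, the verbosities, and the function name `draw_phrase_counts` are made up for illustration, and the dependence on speaker characteristics $x_{it}$ is ignored.

```python
import random

def draw_phrase_counts(m, q, rng):
    """Draw one multinomial count vector: m utterances with choice probabilities q."""
    counts = [0] * len(q)
    for j in rng.choices(range(len(q)), weights=q, k=m):
        counts[j] += 1
    return counts

rng = random.Random(0)
q = {"R": [0.5, 0.3, 0.2], "D": [0.2, 0.3, 0.5]}  # made-up phrase probabilities
speakers = [("R", 40), ("D", 25)]                  # (party P(i), verbosity m_it)
for party, m in speakers:
    c = draw_phrase_counts(m, q[party], rng)
    print(party, c, "total:", sum(c))
```

Each count vector sums to the speaker's verbosity, matching the definition $m_{it} = \sum_j c_{ijt}$.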

Question

• How different are choices of R and D at each $t$?

  • Translation: how different are $q_t^R(\cdot)$ and $q_t^D(\cdot)$?

• Approach: measure partisanship by diagnosticity

  • How much can I learn about your party from what you say?

Posteriors

• Posterior belief of an observer with a neutral prior after hearing phrase $j$

$$\rho_{jt}(x) = \frac{q_{jt}^R(x)}{q_{jt}^R(x) + q_{jt}^D(x)}$$

• Posterior that the observer expects to assign to the speaker’s true party

$$\pi_t(x) = \frac{1}{2}\, q_t^R(x)' \cdot \rho_t(x) + \frac{1}{2}\, q_t^D(x)' \cdot \left(1 - \rho_t(x)\right)$$

Measure of Partisanship

$$\pi_t = \frac{1}{N_t} \sum_i \pi_t(x_{it})$$

• Between $\tfrac{1}{2}$ (speech uninformative) and 1 (speech fully revealing)

• Close cousin of isolation (White 1986, Cutler et al. 1999)
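A small numeric sketch may help fix the definitions of the posterior and of partisanship. It assumes a single value of the covariates $x$ (so the average over speakers is trivial) and made-up probability vectors; the function names are illustrative only.

```python
def posterior(q_r, q_d):
    """rho_j: posterior that the speaker is Republican after phrase j (neutral prior)."""
    return [r / (r + d) for r, d in zip(q_r, q_d)]

def partisanship(q_r, q_d):
    """pi: posterior the observer expects to assign to the speaker's true party."""
    rho = posterior(q_r, q_d)
    rep = sum(r * p for r, p in zip(q_r, rho))
    dem = sum(d * (1 - p) for d, p in zip(q_d, rho))
    return 0.5 * rep + 0.5 * dem

same = partisanship([0.5, 0.5], [0.5, 0.5])      # identical speech
disjoint = partisanship([1.0, 0.0], [0.0, 1.0])  # no shared phrases
mixed = partisanship([0.8, 0.2], [0.2, 0.8])
print(same, disjoint, mixed)
```

Identical distributions give the lower bound of 1/2, disjoint vocabularies give the upper bound of 1, and the mixed case falls in between.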

Estimation

Plug-In Estimator

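• Empirical analogues

$$\hat{q}_{jt}^P = \frac{\sum_{i \in P} c_{ijt}}{\sum_{i \in P} m_{it}}$$

$$\hat{\rho}_{jt} = \frac{\hat{q}_{jt}^R}{\hat{q}_{jt}^R + \hat{q}_{jt}^D}$$

$$\hat{\pi}_t^{PLUGIN} = \frac{1}{2} \left(\hat{q}_t^R\right)' \hat{\rho}_t + \frac{1}{2} \left(\hat{q}_t^D\right)' \left(1 - \hat{\rho}_t\right)$$

• This is the MLE when $x_{it}$ is constant

• Consistent as quantity of speech grows large holding size of vocabulary fixed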
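The plug-in formulas translate directly into code. The sketch below assumes each party's data arrive as a list of per-speaker count vectors over a common vocabulary; the counts and the function name are made up for illustration.

```python
def plug_in_partisanship(counts_r, counts_d):
    """Plug-in estimate from per-speaker phrase-count vectors for each party."""
    n_phrases = len(counts_r[0])
    tot_r = [sum(c[j] for c in counts_r) for j in range(n_phrases)]
    tot_d = [sum(c[j] for c in counts_d) for j in range(n_phrases)]
    q_r = [t / sum(tot_r) for t in tot_r]   # q-hat for Republicans
    q_d = [t / sum(tot_d) for t in tot_d]   # q-hat for Democrats
    pi = 0.0
    for r, d in zip(q_r, q_d):
        if r + d == 0:        # phrase never spoken: contributes nothing
            continue
        rho = r / (r + d)     # rho-hat for this phrase
        pi += 0.5 * r * rho + 0.5 * d * (1 - rho)
    return pi

republicans = [[8, 1, 1], [6, 3, 1]]   # made-up counts, one row per speaker
democrats = [[1, 2, 7], [2, 2, 6]]
estimate = plug_in_partisanship(republicans, democrats)
print(round(estimate, 4))
```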

Maximum Likelihood Estimator

[Figure: average partisanship (vertical axis, 0.54 to 0.64) by year (1870 to 2010); series: real, random.]

Bias

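$$E\left[\left(\hat{q}_t^R\right)' \hat{\rho}_t - \left(q_t^R\right)' \rho_t\right] = \left(q_t^R\right)' E\left(\hat{\rho}_t - \rho_t\right) + \mathrm{Cov}\left[\left(\hat{q}_t^R - q_t^R\right)',\ \left(\hat{\rho}_t - \rho_t\right)\right]$$

• $\hat{q}_t^P$ is unbiased for $q_t^P$

• First term non-zero because $\hat{\rho}_t$ is a non-linear function of $\hat{q}_t^P$

• Second term non-zero because $\hat{\rho}_t$ is an increasing function of $\hat{q}_t^R$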
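The size of this bias can be seen in a small simulation. The sketch below is an illustration under made-up settings, not a replication: both parties draw phrases from the same uniform distribution, so true partisanship is exactly 1/2, and any excess in the plug-in estimate is finite-sample bias. The vocabulary sizes, word counts, and function names are assumptions.

```python
import random

def plug_in(tot_r, tot_d):
    """Plug-in partisanship from party-level phrase totals."""
    m_r, m_d = sum(tot_r), sum(tot_d)
    pi = 0.0
    for c_r, c_d in zip(tot_r, tot_d):
        r, d = c_r / m_r, c_d / m_d
        if r + d > 0:
            rho = r / (r + d)
            pi += 0.5 * r * rho + 0.5 * d * (1 - rho)
    return pi

def simulate(n_phrases, n_words, n_reps, seed=0):
    """Average plug-in estimate when both parties share one uniform q."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_reps):
        tot_r, tot_d = [0] * n_phrases, [0] * n_phrases
        for tot in (tot_r, tot_d):
            for j in rng.choices(range(n_phrases), k=n_words):
                tot[j] += 1
        total += plug_in(tot_r, tot_d)
    return total / n_reps

small_vocab = simulate(n_phrases=10, n_words=2000, n_reps=50)
large_vocab = simulate(n_phrases=2000, n_words=2000, n_reps=50)
print(small_vocab, large_vocab)
```

With few phrases relative to the amount of speech the estimate sits close to 1/2; when the vocabulary is as large as the number of words spoken, it lands well above 1/2 even though the parties speak identically.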

Jensen et al. (2012)

[Figure: standardized polarization (vertical axis, −1 to 3) by year (1870 to 2010); series: random, real.]

Restrict to Commonly Occurring Phrases?

[Figure: average partisanship by year (1870 to 2010), series random and real, in ten panels. Left column: top 90 percent, top 50 percent, top 10 percent, and top 1 percent of phrases. Right column: phrases spoken more than 5, 20, 100, and 500 times.]

Leave-Out Estimator

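• Define $\hat{\rho}_{-i,t}$, which leaves out $i$

• Define

$$\hat{\pi}_t^{LOE} = \frac{1}{2} \frac{1}{|R_t|} \sum_{i \in R_t} \hat{q}_{i,t}' \cdot \hat{\rho}_{-i,t} + \frac{1}{2} \frac{1}{|D_t|} \sum_{i \in D_t} \hat{q}_{i,t}' \cdot \left(1 - \hat{\rho}_{-i,t}\right)$$

• Enforces independence of $\hat{q}$ and $\hat{\rho}$

• Still biased because of non-linear $\hat{\rho}$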
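A direct reading of the leave-out formula can be sketched as follows. The sketch assumes each party has at least two speakers with positive verbosity, and it sets the posterior to 1/2 for a phrase that nobody other than speaker $i$ used; that tie-breaking rule, the data, and the function names are assumptions made here, not taken from the slides.

```python
def leave_out_partisanship(counts_r, counts_d):
    """Leave-out estimate from per-speaker phrase-count vectors for each party."""
    n_phrases = len(counts_r[0])
    tot = {"R": [sum(c[j] for c in counts_r) for j in range(n_phrases)],
           "D": [sum(c[j] for c in counts_d) for j in range(n_phrases)]}

    def party_average(own, other, counts):
        score = 0.0
        q_other = [t / sum(tot[other]) for t in tot[other]]
        for c in counts:
            m_i = sum(c)
            # Own-party totals and frequencies with speaker i removed.
            own_tot = [t - x for t, x in zip(tot[own], c)]
            q_own = [t / sum(own_tot) for t in own_tot]
            for x, a, b in zip(c, q_own, q_other):
                # Assumption: posterior is 1/2 when nobody else used the phrase.
                p_own = a / (a + b) if a + b > 0 else 0.5
                score += (x / m_i) * p_own
        return score / len(counts)

    return (0.5 * party_average("R", "D", counts_r)
            + 0.5 * party_average("D", "R", counts_d))

republicans = [[8, 1, 1], [6, 3, 1]]   # made-up counts, one row per speaker
democrats = [[1, 2, 7], [2, 2, 6]]
estimate = leave_out_partisanship(republicans, democrats)
print(round(estimate, 3))
```

For a Democratic speaker, `p_own` equals $1 - \hat{\rho}_{-i,t}$, because the leave-out frequencies sit in the Democratic term of the denominator.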

Leave-Out Estimator

[Figure: average partisanship (vertical axis, 0.500 to 0.525) by year (1870 to 2010); series: random, real.]

• Controlling bias

  • Add lasso type penalty to likelihood

  • Shrinks $\hat{\rho}_{jt}$ toward $\tfrac{1}{2}$

• Making computation feasible

  • Approximate likelihood with Poisson

  • Allows distributed computing (Taddy 2015)

• Controlling for confounds ($x_{it}$)

  • Geography, chamber, gender, indicator for being in majority party
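To see why a lasso type penalty pulls the posterior toward 1/2, consider a stylized stand-in: soft-threshold the log ratio of the two parties' phrase probabilities and map it back to a posterior. This is not the estimator used in the paper, which penalizes a likelihood; soft-thresholding is the closed-form lasso solution only in much simpler problems and is used here purely to show the direction of the shrinkage. The penalty values and function names are assumptions.

```python
import math

def soft_threshold(z, lam):
    """Lasso-style shrinkage: move z toward zero by lam, stopping at zero."""
    return math.copysign(max(abs(z) - lam, 0.0), z)

def shrunk_posterior(q_r, q_d, lam):
    """Posterior rho after soft-thresholding the log ratio log(q_r / q_d)."""
    phi = soft_threshold(math.log(q_r / q_d), lam)
    return 1.0 / (1.0 + math.exp(-phi))

raw = shrunk_posterior(0.02, 0.01, lam=0.0)     # no penalty: q_r / (q_r + q_d)
mild = shrunk_posterior(0.02, 0.01, lam=0.5)
strong = shrunk_posterior(0.02, 0.01, lam=1.0)  # penalty exceeds the log ratio
print(raw, mild, strong)
```

With no penalty the posterior equals the plug-in ratio; as the penalty grows the posterior moves toward 1/2 and reaches it once the penalty exceeds the size of the log ratio.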

Main Results

Baseline Specification

[Figure: average partisanship (vertical axis, 0.500 to 0.525) by year (1870 to 2010); series: real, random.]

Magnitude

[Figure: expected posterior (vertical axis, 0.5 to 1.0) by number of phrases (horizontal axis, 0 to 100), with a marker at one minute of speech; series: 1873-1874, 1989-1990, 2007-2008.]

Comparison: Roll Call Votes

[Figure: by year (1870 to 2010), average partisanship of speech (axis 0.500 to 0.525) and distance between parties in roll-call voting (axis 0.50 to 0.85).]

Comparison: Roll Call Votes

[Figure: partisanship (vertical axis, .44 to .56) against NOMINATE score (horizontal axis, −1.0 to 1.0); groups: Democrat, Republican.]

Unpacking Partisanship

Most Partisan Phrases

• Define the partisanship of phrase $j$ in session $t$ to be the effect on $\pi_t$ of removing phrase $j$ from the vocabulary (redistributing probability mass to other phrases proportionally)

  • Let $\tilde{q}_{kt}^P$ equal $q_{kt}^P / \left(1 - q_{jt}^P\right)$ if $k \neq j$ and 0 otherwise

  • Recompute $\pi_t$ replacing $q_t^P$ with $\tilde{q}_t^P$ and holding $\rho_t$ constant
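The phrase-removal calculation can be sketched as below. The sign convention (original $\pi_t$ minus the recomputed value, so that a positive number means the phrase adds to partisanship) is an assumption made here, as are the probability vectors and function names; the sketch also requires $q_{jt}^P < 1$ for both parties.

```python
def partisanship(q_r, q_d, rho):
    """pi for given phrase probabilities and a fixed posterior vector rho."""
    rep = sum(r * p for r, p in zip(q_r, rho))
    dem = sum(d * (1 - p) for d, p in zip(q_d, rho))
    return 0.5 * rep + 0.5 * dem

def phrase_partisanship(q_r, q_d, j):
    """Change in pi from dropping phrase j, holding rho at its original value."""
    rho = [r / (r + d) for r, d in zip(q_r, q_d)]

    def drop(q):
        # Remove phrase j and rescale the remaining mass proportionally.
        return [0.0 if k == j else q_k / (1 - q[j]) for k, q_k in enumerate(q)]

    return partisanship(q_r, q_d, rho) - partisanship(drop(q_r), drop(q_d), rho)

q_r = [0.6, 0.3, 0.1]   # made-up probabilities
q_d = [0.1, 0.3, 0.6]
lopsided = phrase_partisanship(q_r, q_d, j=0)  # used mostly by one party
neutral = phrase_partisanship(q_r, q_d, j=1)   # used equally by both
print(lopsided, neutral)
```

Removing the lopsided phrase lowers $\pi_t$, so its score is positive; removing the evenly used phrase raises $\pi_t$, so its score is negative under this convention.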

60th Congress (1907-08)

  Most Republican     Most Democratic
  infantri war        section corner
  indian war          ship subsidi
  mount volunt        republ panama
  feet thenc          level canal
  postal save         powder trust
  spain pay           print paper
  war pay             lock canal
  first regiment      bureau corpor
  soil survey         senatori term
  nation forest       remove wreck

• 1908 Rep platform: Calls for “generous provision” for veterans of Spanish-American and Indian wars

• 1908 Dem platform: “Free the Government from the grip of those who have made it a business asset of the favor-seeking corporations.”

• William Cox (D-IN): “the entire United States is now being held up by a great hydra-headed monster, known in ordinary parlance as a ‘powder trust’.”

80th Congress (1947-48)

  Most Republican     Most Democratic
  steam plant         admir denfeld
  coast guard         public busi
  stop communism      labor standard
  depart agricultur   intern labor
  lend leas           tax refund
  zone germani        concili service
  british loan        standard act
  approv compact      soil conserv
  unit kingdom        school lunch
  union shop          cent hour

• Aftermath of WWII

• 1948 Dem platform: Advocates amending Fair Labor Standards Act to raise the federal minimum wage to 75 cents per hour; also advocates school lunch program

100th Congress (1987-88)

  Most Republican     Most Democratic
  freedom fighter     star war
  doubl breast        contra aid
  abort industri      nuclear weapon
  demand second       contra war
  heifer tax          support contra
  reserv object       nuclear wast
  incom ballist       agent orang
  communist govern    central american
  withdraw reserv     nicaraguan govern
  abort demand        hatian peopl

• Debate over support for Contra rebels fighting Sandinista government in Nicaragua; Iran-Contra affair

• Debate over Reagan’s “Star Wars” missile defense initiative & nuclear weapons policy

104th Congress (1995-96)

  Most Republican     Most Democratic
  medic save          tax break
  partialbirth abort  nurs home
  big govern          comp time
  feder debt          break wealthi
  tax increas         break wealthiest
  tax relief          communiti polic
  term limit          million children
  nation debt         assault weapon
  tax freedom         deficit reduct
  item veto           head start

• Debate over taxes and fiscal policy; Republicans using language from Luntz memos and Contract with America

Distribution of Phrase-Level Partisanship

[Figure: posterior (vertical axis, 0.0 to 1.0) by year (1976 to 2008); quantiles shown: 0.001, 0.01, 0.05, 0.1, 0.9, 0.95, 0.99, 0.999.]

Neologisms

[Figure: average partisanship (vertical axis, 0.50 to 0.65) by year (1870 to 2010); series: baseline, pre-1980 vocabulary, post-1980 vocabulary.]

Topic Decomposition

• Are trends in partisanship driven by

  • Divergence in which topics Dems/Reps emphasize?

  • Divergence in how the parties talk about a given topic?

Topics

  alcohol       environment    mail
  budget        federalism     minorities
  business      foreign        money
  crime         government     religion
  defense       health         tax
  economy       immigration    trade
  education     justice
  elections     labor

[Figure: average partisanship (vertical axis, 0.500 to 0.530) by year (1870 to 2010); series: overall, within, between.]

[Figure: one panel per topic showing average partisanship (axis 0.50 to 0.60) and phrase frequency by year (1870 to 2010). Panels: alcohol, defense, minorities, budget, crime, government, health, immigration, tax.]

Individual Tax Phrases

[Figure: posterior probability that speaker is Republican (vertical axis, 0 to 1) by year (1870 to 2006) for the phrases death tax, tax break, tax spend, tax loophol, tax relief, close tax, tax freedom, share tax.]

Explanations

Political Innovation

• Contract with America (1994)

  • Republicans take control of Congress for first time since 1952

  • Frank Luntz: novel polling techniques, memos to Republican candidates

  • In the aftermath, Democrats launch an effort to improve their own choice of language

You believe language can change a paradigm? “I don’t believe it – I know it. I’ve seen it with my own eyes...I watched in 1994 when the group of Republicans got together and said: ‘We’re going to do this completely differently than it’s ever been done before.’...Every politician and every political party issues a platform, but only these people signed a contract.” - Luntz (2004)

“Republican framing superiority had played a major role in their takeover of Congress in 1994. I and others had hoped that... a widespread understanding of how framing worked would allow Democrats to reverse the trend.” - Lakoff (2014)

Phrases from CWA

[Figure: average partisanship (axis 0.50 to 0.54) and frequency of Contract with America phrases by year (1870 to 2010).]

Broader Context

• Party discipline in speech

  • Democratic Message Board (1989-1991)

  • Republican Theme Team (1991-1993): “develop ideas and phrases to be used by all Republicans”

• Changing media environment

  • 1979: C-SPAN (House of Representatives)

  • 1983: C-SPAN2 (Senate)

“When asked whether he would be the Republican leader without C-SPAN, Gingrich... [replied] ‘No’... C-SPAN provided a group of media-savvy House conservatives in the mid-1980s with a method of... winning a prime-time audience.” (Frantzich & Sullivan 1996)

Summary

[Figure: average partisanship (vertical axis, 0.500 to 0.525) by year (1976 to 2008), annotated with C-SPAN, C-SPAN2, and Contract with America, and with presidential terms: Ford, Carter, Reagan, Bush, Clinton, Bush.]

Conclusion

Does Language Matter?

• Partisan language in Congress diffuses to broader public

  • Gentzkow & Shapiro 2010; Martin & Yurukoglu 2016; Greenstein & Zhu 2012

• Issue framing affects public opinion

  • Lathrop 2003; Graetz and Shapiro 2006; Druckman et al. 2013

• Language affects group identity

  • Kinzler et al. 2007; Clots-Figueras and Masella 2013

• “Human beings do not live in the objective world alone, nor alone in the world of social activity as ordinarily understood, but are very much at the mercy of the particular language which has become the medium of expression.” (Sapir 1954)

• “When we successfully reframe public discourse, we change the way the public sees the world. We change what counts as common sense.... Thinking differently requires speaking differently.” (Lakoff 2014)
