Measuring the Happiness of Large-Scale Written Expression: Songs, Blogs, and Presidents
Authors: Peter Sheridan Dodds and Christopher M. Danforth
Department of Mathematics and Statistics, University of Vermont, Burlington, VT, USA




Introduction

• Our desires: what do we want in life?

• Indices of well-being: Bhutan’s National Happiness Index.

• Plato; Bentham and John Stuart Mill’s hedonistic calculus, which codifies collective happiness maximization as the determinant of moral actions.

• Jefferson in the US Declaration of Independence: “Life, Liberty and the pursuit of Happiness”.


Measurement of Happiness

• Goal: devise a transparent, population-level hedonometer to remotely sense and quantify emotional levels, post hoc or in real time.

• Technique: for large-scale texts, use human evaluations of the emotional content of individual words within a given text to generate an overall score for that text.

• Use the Affective Norms for English Words (ANEW) study.


ANEW Study

• Participants graded their reactions to a set of 1034 words with respect to three standard semantic differentials:
– Good – Bad (psychological valence)
– Active – Passive (arousal)
– Strong – Weak (dominance)

• Ratings were given on a 1–9 point scale with half-integer increments.

• The focus here is on the psychological valence scale.


Psychological Valence Scale

• Termed the happy–unhappy scale.

• At one extreme of this scale you are:
– Happy
– Pleased
– Satisfied
– Contented
– Hopeful


Psychological Valence Scale

• At the other end of the scale you feel completely:
– Unhappy
– Annoyed
– Unsatisfied
– Melancholic
– Despairing
– Bored

• The average psychological valence scores for the ANEW study words are used as a measure of the average happiness experienced by a reader.


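Estimate overall valence of text ($v_{\text{text}}$)

• Determine the frequency $f_i$ with which the $i$-th word from the ANEW study appears in the text.

• Compute a weighted average of the valence of the ANEW study words as

$$v_{\text{text}} = \frac{\sum_i v_i f_i}{\sum_i f_i}$$

• where $v_i$ is the ANEW study’s recorded average valence for word $i$, and both sums run over the ANEW study words.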


Examples to calculate valence of text

• Pangram: “The quick brown fox jumps over the lazy dog.”

• Three words in the pangram (underlined on the original slide) appear in the ANEW study word list, with average valences 6.64, 4.38, and 7.57, respectively.

• Overall valence score: v_text = 1/3 × (1 × 6.64 + 1 × 4.38 + 1 × 7.57) ≈ 6.20
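
The frequency-weighted average can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `text_valence`, the tokenizing regular expression, and the assignment of the three slide scores to the words "quick", "lazy", and "dog" are all assumptions made for the example.

```python
import re
from collections import Counter


def text_valence(text, valence):
    """Frequency-weighted mean valence of the scored words found in `text`.

    `valence` maps a word to its average valence score.
    Returns None when no scored word occurs in the text.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w in valence)  # f_i for each scored word
    total = sum(counts.values())
    if total == 0:
        return None
    return sum(valence[w] * f for w, f in counts.items()) / total


# Assumed mapping of the slide's three scores to words (illustrative only).
toy_valence = {"quick": 6.64, "lazy": 4.38, "dog": 7.57}
pangram = "The quick brown fox jumps over the lazy dog."
print(f"{text_valence(pangram, toy_valence):.2f}")  # 6.20
```

Words without a score are simply ignored, so the result depends only on the scored words and how often each one occurs.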


Evaluation of valence for a song, album and artist.


Why use Large-Scale Texts?

• The method does not account for the meaning of words in combination.

• Sophisticated natural language parsing algorithms cannot be trusted for small-scale texts.

• An individual’s expression is too variable and must be viewed over long time scales.

• We need to collect and rapidly analyze large corpora for statistical assessment.


Description of Large-Scale Texts Studied

– Song lyrics and song titles
– Blog sentences written in the first person and containing the word “feel”
– State of the Union addresses


Data Sources

• Song lyrics from 1960 to 2007 were downloaded from http://www.hotlyrics.net and tagged with release year and genre using the Compact Disc Database available at http://www.freedb.org

• http://www.freedb.org: database of song titles and genre classifications.

• From August 2005, first-person sentences using the word “feel” (or a conjugated form) were extracted from 2.3 million blogs (9,113,772 sentences) along with demographic data, and made available through http://www.wefeelfine.org

• State of the Union messages from the American Presidency Project at http://www.presidency.ucsb.edu

• http://www.natcorp.ox.ac.uk: British National Corpus


Inferences from Plots

• The plots show normalized abundances of ANEW study words in the various corpora as a function of their average valence.

• The inset for each plot shows the same distribution, normalized by the underlying frequency distribution of ANEW words.

• The insets reveal that song lyrics are weighted towards high-valence words; the mode bin is 8–9.

• Blogs contain more low-valence words, resulting in a bimodal distribution; the mode bin is again 8–9.

• State of the Union addresses favor high-valence words in the 7–8 bin and show less negativity than blogs.
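
One possible reading of these plots can be sketched in Python: the main panel is the share of scored-word occurrences falling in each unit-width valence bin, and the inset divides each bin by the number of scored words it contains before renormalizing. Both the interpretation of the inset and every name and number below (`valence_bin_abundance`, the toy scores and counts) are assumptions for illustration, not values from the study.

```python
def valence_bin_abundance(freqs, valence, lo=1, hi=9):
    """Return (abundance, per_word) over unit-width valence bins 1-2, ..., 8-9.

    abundance[b]: share of all scored-word occurrences that fall in bin b.
    per_word[b]:  occurrences per scored word in bin b, renormalized to sum to 1
                  (an assumed reading of the inset normalization).
    """
    n_bins = hi - lo
    occurrences = [0] * n_bins  # corpus occurrences of scored words, per bin
    vocabulary = [0] * n_bins   # number of scored words, per bin
    for word, v in valence.items():
        b = min(int(v) - lo, n_bins - 1)  # a score of exactly 9 goes in the top bin
        vocabulary[b] += 1
        occurrences[b] += freqs.get(word, 0)
    total = sum(occurrences)
    if total == 0:
        raise ValueError("no scored words occur in the corpus")
    abundance = [c / total for c in occurrences]
    per_word = [c / n if n else 0.0 for c, n in zip(occurrences, vocabulary)]
    norm = sum(per_word)
    return abundance, [x / norm for x in per_word]


# Made-up scores and counts, for illustration only.
toy_valence = {"love": 8.5, "happy": 8.2, "hate": 2.1, "life": 7.3}
toy_freqs = {"love": 6, "happy": 2, "hate": 1, "life": 1}
abundance, per_word = valence_bin_abundance(toy_freqs, toy_valence)
mode_bin = max(range(len(abundance)), key=abundance.__getitem__) + 1
print(f"mode bin: {mode_bin}-{mode_bin + 1}")
```

The per-word view matters because the word list itself is not uniform in valence: a bin can look abundant simply because it holds more scored words.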


Results

• The average valence of lyrics declines from 1961 to 2007. The decline is strongest until 1985 and levels off after 1995.

• “Love” is the most frequent word in song lyrics, and its high valence affects the overall average valence.


Analyze using Valence Shift Word Graph

• Compare individual word prevalence changes in lyrics before and after 1980.

• Rank words in descending order of their absolute contribution to the change in average valence between the two eras, δ.

• Word i’s contribution depends on its change in relative frequency and on its valence relative to the pre-1980 era average.


• In comparing some text b with respect to a given text a, we define the valence difference as $\delta(b,a) = v_b - v_a$.

• The percentage contribution to this difference by word i is

$$\Delta_i(b,a) = 100 \times \frac{(v_i - v_a)\,(p_{i,b} - p_{i,a})}{|\delta(b,a)|}$$

• where $p_{i,a}$ and $p_{i,b}$ are the fractional abundances of word i in texts a and b.

• Summing $\Delta_i(b,a)$ over all i gives +100% or −100%, depending on whether $\delta(b,a)$ is positive or negative.

• Ranking words by $\Delta_i(b,a)$ gives the valence shift word graph.
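
A word-level valence shift can be sketched in Python under one assumption about the per-word formula: Δi(b,a) = 100 × (vi − va)(pi,b − pi,a) / |δ(b,a)|. This form matches the statements above (it depends on the change in relative frequency and on valence relative to text a's average, and the contributions sum to ±100%), but the function names and the toy scores and counts are hypothetical.

```python
def mean_valence(freqs, valence):
    """Frequency-weighted mean valence; `freqs` must contain only scored words."""
    total = sum(freqs.values())
    return sum(valence[w] * f for w, f in freqs.items()) / total


def valence_shift(freq_a, freq_b, valence):
    """Return (delta, ranked), where ranked lists (word, percent contribution)
    in descending order of absolute contribution to delta = v_b - v_a."""
    total_a = sum(freq_a.values())
    total_b = sum(freq_b.values())
    v_a = mean_valence(freq_a, valence)
    v_b = mean_valence(freq_b, valence)
    delta = v_b - v_a
    if delta == 0:
        raise ValueError("texts have equal average valence; shares are undefined")
    shifts = {}
    for w in set(freq_a) | set(freq_b):
        p_a = freq_a.get(w, 0) / total_a  # fractional abundance in text a
        p_b = freq_b.get(w, 0) / total_b  # fractional abundance in text b
        shifts[w] = 100 * (valence[w] - v_a) * (p_b - p_a) / abs(delta)
    ranked = sorted(shifts.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return delta, ranked


# Made-up scores and counts, for illustration only.
toy_valence = {"love": 8.0, "hate": 2.0, "life": 7.0}
era_a = {"love": 6, "hate": 1, "life": 3}
era_b = {"love": 4, "hate": 3, "life": 3}
delta, ranked = valence_shift(era_a, era_b, toy_valence)
for word, share in ranked:
    print(f"{word:>5s} {share:+.1f}%")
```

In this toy case the average drops, so the shares sum to −100%: more of the low-valence word and less of the high-valence word both push the average down, while a word whose relative frequency is unchanged contributes nothing.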


Valence Shift Word Graph Explained

• The decrease in average valence for lyrics after 1980 is due to a loss of positive words such as ‘love’, ‘baby’, and ‘home’ (italicized and red) and a gain in negative words such as ‘hate’, ‘pain’, and ‘death’ (normal font and blue).

• These drops are countered by trends of less ‘lonely’ and ‘sad’, and more ‘life’ and ‘god’.

• The former dominates the latter, and average valence decreases from ≈ 6.4 to ≈ 6.1.

• Let’s check the valence time series for music genres.


• The valence of individual genres is stable over time, with only rock showing a minor decrease.

• The ordering of genres by measured average valence is sensible: gospel and soul are at the top, while several subgenres of rock, including metal and punk, which emerged through the 1970s, exhibit much lower valences.


• Ordering by valence: rock and pop > rap and hip-hop (two notable genres that appear halfway through the time series) > metal and punk.

• The decline in overall valence does not occur within particular genres, but arises from the evolutionary appearance of new genres that accessed more negative emotional niches.


Measure Valence in Real Time (Blogs)


• Valence rises from an average of 5.75 to over 6.0, with the rise occurring over the last part of the year within individual years.

• In 2008, after a midyear dip attributed to the economic recession, valence notably peaks in the last part of the year, correlating with the US presidential election.

• Dates of note:
– Christmas Day
– Valentine’s Day
– Fifth anniversary of the WTC attacks; Sept 11, 2006
– Pentagon attacks in the USA; Sept 10, 2007
– US Presidential Election; Nov 4, 2008
– US Presidential Inauguration; Jan 20, 2009
– Day of Michael Jackson’s death; June 25, 2009


Human response via Blogs Explained

• Negative words driving down the average valence on the fifth anniversary of the 9/11 attacks are ‘lost’, ‘anger’, ‘hate’, and ‘tragedy’ (‘terrorist’ ranks 10th in valence shift), together with a decrease in the frequency of ‘love’ and ‘happy’.

• On Valentine’s Day, ‘love’ and ‘people’ are more prevalent and ‘hate’ and ‘pain’ less so; this is counteracted by an increase in ‘sad’, ‘lonely’, and ‘bored’.

• More ‘proud’, ‘hope’, and ‘win’ increase valence, as do fewer appearances of ‘pain’, ‘sad’, and ‘guilty’.


Blog Demographics & Contextual Info.


Blog Demographics & Contextual Info.

• Plot 9a: the average valence of blog sentences follows a pronounced single-maximum, convex curve as a function of age.

• 13–14 year-olds produce the lowest average valence sentences (5.58). As age increases, valence rises until leveling off near 6.0 for ages 45–60, then begins to trend downwards.

• People in the 75–84 age range produce sentences with valence similar to those of 17-year-olds.

• Expected drivers: changes in income (rising) and health (eventually declining). How do they affect valence?


Blog Demographics & Contextual Info.

• Plot 9b: average valence of blog sentences as a function of absolute latitude.

• Average valence ranges from 5.71 (for latitudes 0–11.5) up to 5.83 (for 29.5–44.5) and then back to 5.78 (for 52.5–69.5).

• The US has the highest average valence (5.83), followed by Canada (5.78), the United Kingdom (5.77), and Australia (5.74).

• Plot 9c: a Sunday lag effect causes high valence on Monday.


Gender Comparison

• Males exhibit essentially the same average valence as females (5.89 vs. 5.91).

• Females more often use the most impactful high- and low-valence words separating the two genders: ‘love’, ‘baby’, ‘loved’, and ‘happy’ on the positive end, and ‘hurt’, ‘hate’, ‘sad’, and ‘alone’ on the negative end.

• The only word used more frequently by males is the rather perfunctory word ‘good.’


State of the Union Addresses (SOTU)


Historical events resonant with SOTU

• The highest average valence scores belong to Kennedy (6.41), Eisenhower (6.38), and Reagan (6.38).

• A period of low valence starts with the First World War.

• Wilson’s speeches drop steeply from an initial 6.58 in 1913 to 5.88 in 1920.

• Coolidge’s addresses provide an exception.

• Hoover’s low average is due to his speech in 1930: the stock market crash of October 29, 1929 marked the beginning of the Great Depression, and his speeches are burdened with ‘depression’, ‘debt’, ‘crisis’, and ‘failure.’


Historical events resonant with SOTU

• The years before and during the American Civil War form a local minimum in valence, corresponding to Buchanan and Lincoln.

• There is a drop from Eisenhower and Kennedy’s level to that of Johnson (6.08), whose first SOTU speech came just seven weeks after the assassination of Kennedy, with the remainder delivered through the heightening Vietnam War.

• Valence then continues to decrease up to the present.