CNIS Presentation

Measuring the Happiness of Large-Scale Written Expression:

Songs, Blogs, and Presidents

Authors: Peter Sheridan Dodds and Christopher M. Danforth

Department of Mathematics and Statistics, University of Vermont, Burlington, VT, USA

Introduction

• Our desires: what do we want in life?

• Indices of well-being, such as Bhutan’s Gross National Happiness index.

• Plato, Bentham, and John Stuart Mill; Bentham’s hedonistic calculus codifies the maximization of collective happiness as the determinant of moral action.

• Jefferson, in the US Declaration of Independence: “Life, Liberty and the pursuit of Happiness”.

Measurement of Happiness

• Goal: devise a transparent, population-level hedonometer to remotely sense and quantify emotional levels, either post hoc or in real time.

• Technique: for large-scale texts, use human evaluations of the emotional content of individual words to generate an overall score for each text.

• Word ratings come from the Affective Norms for English Words (ANEW) study.

ANEW Study

• Participants graded their reactions to a set of 1,034 words with respect to three standard semantic differentials:
  – Good–Bad (psychological valence)
  – Active–Passive (arousal)
  – Strong–Weak (dominance)

• Ratings were made on a 1–9 point scale with half-integer increments.

• We concentrate on the psychological valence scale.

Psychological Valence Scale

• Also termed the happy–unhappy scale.

• At one extreme of this scale you feel:
  – Happy
  – Pleased
  – Satisfied
  – Contented
  – Hopeful

• At the other end of the scale you feel completely:
  – Unhappy
  – Annoyed
  – Unsatisfied
  – Melancholic
  – Despaired
  – Bored

• We use the average psychological valence score of each ANEW word as a measure of the average happiness experienced by a reader of that word.

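Estimating the overall valence of a text ($v_{\text{text}}$)

• Determine the frequency $f_i$ with which the $i$th ANEW word appears in the text.

• Compute the frequency-weighted average of the valences of the ANEW words:

$$v_{\text{text}} = \frac{\sum_{i=1}^{n} v_i f_i}{\sum_{i=1}^{n} f_i}$$

• $v_i$ is the ANEW study’s recorded average valence for word $i$, and the sums run over the $n$ ANEW words.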

Example: calculating the valence of a text

• Pangram: “The quick brown fox jumps over the lazy dog.”

• Three of its words appear in the ANEW study word list, with average valences 6.64, 4.38, and 7.57 in order of appearance.

• Overall valence score: $v_{\text{text}} = \frac{1}{3}(1 \times 6.64 + 1 \times 4.38 + 1 \times 7.57) \approx 6.20$
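The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the tokenizer (lowercased runs of letters and apostrophes), the function name `text_valence`, and the three-entry lexicon are assumptions. In particular, the slide does not say which pangram words carry the valences 6.64, 4.38, and 7.57; assigning them to ‘quick’, ‘lazy’, and ‘dog’ is a guess for illustration. A real run would load all 1,034 ANEW valences.

```python
import re
from collections import Counter

def text_valence(text, lexicon):
    """Frequency-weighted average valence of the lexicon words found in `text`.

    Returns None when no lexicon word occurs, since the average is then undefined.
    """
    # Assumed tokenizer: lowercase, then take runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    # f_i: number of times each lexicon word appears in the text.
    freqs = Counter(t for t in tokens if t in lexicon)
    total = sum(freqs.values())
    if total == 0:
        return None
    # sum_i v_i * f_i / sum_i f_i
    return sum(lexicon[w] * f for w, f in freqs.items()) / total

# Hypothetical lexicon: mapping these valences to 'quick', 'lazy', 'dog' is assumed.
lexicon = {"quick": 6.64, "lazy": 4.38, "dog": 7.57}
v = text_valence("The quick brown fox jumps over the lazy dog.", lexicon)
print(round(v, 2))  # 6.2
```

Because each word is weighted by its count, a text that repeats ‘dog’ pulls the average toward 7.57, while words absent from the lexicon, such as ‘the’, are simply ignored.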

Evaluation of valence for a song, album and artist.

Why use Large-Scale Texts?

• The method does not account for the meaning of words in combination.

• Even sophisticated natural language parsing algorithms cannot be trusted on small-scale texts.

• Individual expression is highly variable and must be viewed over long time scales.

• We therefore need to collect and rapidly analyze large corpora for statistical assessment.

Description of Large-Scale Texts Studied

– Song lyrics and song titles
– Blog sentences written in the first person and containing the word “feel”
– State of the Union addresses

Data Sources

• Lyrics for songs released between 1960 and 2007 were downloaded from http://www.hotlyrics.net and tagged with release year and genre using the Compact Disc Data Base available at http://www.freedb.org

• http://www.freedb.org: database of song titles and genre classifications

• Starting in August 2005, first-person sentences containing the word “feel” (or a conjugated form) were extracted from 2.3 million blogs (9,113,772 sentences), along with demographic data, and made available through http://www.wefeelfine.org

• State of the Union messages come from the American Presidency Project at http://www.presidency.ucsb.edu

• http://www.natcorp.ox.ac.uk: British National Corpus

Inferences from Plots

• The plots show normalized abundances of ANEW words in the various corpora as a function of their average valence.

• The inset in each plot shows the same distribution normalized by the underlying frequency distribution of ANEW words.

• The insets reveal that song lyrics are weighted toward high-valence words; the mode bin is 8–9.

• Blogs contain more low-valence words, resulting in a bimodal distribution; the mode bin is again 8–9.

• State of the Union addresses concentrate their high-valence words in the 7–8 bin and show less negativity than blogs.
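The binning behind these plots can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure used for the figures: valences are grouped into unit-width bins (1–2, ..., 8–9), and the inset-style normalization is assumed to divide each bin's share of corpus word occurrences by that bin's share of the ANEW word list. The function names and the toy counts and valences are invented for illustration.

```python
from collections import Counter

def valence_bin(v):
    """Unit-width bins labeled by lower edge (1, ..., 8); a valence of exactly 9 joins bin 8."""
    return min(int(v), 8)

def valence_histograms(counts, valence):
    """Return (raw, normalized) abundances of ANEW word occurrences per valence bin.

    counts: word -> occurrences in the corpus (words not in `valence` are ignored).
    valence: ANEW word -> average valence on the 1-9 scale.
    """
    anew_counts = {w: c for w, c in counts.items() if w in valence}
    total = sum(anew_counts.values())
    if total == 0:
        raise ValueError("no ANEW words in corpus")
    # Fraction of all ANEW word occurrences that fall in each bin.
    raw = Counter()
    for w, c in anew_counts.items():
        raw[valence_bin(valence[w])] += c / total
    # Fraction of the ANEW word list itself in each bin (assumed inset baseline).
    share = Counter(valence_bin(v) for v in valence.values())
    n_words = len(valence)
    normalized = {b: raw[b] / (share[b] / n_words) for b in share}
    return dict(raw), normalized

# Toy lexicon and counts, invented for illustration only.
valence = {"love": 8.5, "happy": 8.2, "home": 7.5, "table": 5.2, "pain": 2.3, "hate": 2.1}
counts = {"love": 50, "happy": 20, "home": 10, "table": 5, "pain": 10, "hate": 5, "the": 400}
raw, normalized = valence_histograms(counts, valence)
print(max(raw, key=raw.get))  # 8, i.e. the 8-9 bin is the mode
```

The normalized version removes the bias that comes from the ANEW list itself having more words in some bins than others, so a bin stands out only if the corpus uses its words disproportionately often.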

Results

• The average valence of lyrics declines from 1961 to 2007. The decline is strongest until 1985 and levels off after 1995.

• “Love” is the most frequent ANEW word in song lyrics, and its high valence has a large impact on the overall average valence.

Analysis Using a Valence Shift Word Graph

• Compare changes in the prevalence of individual words in lyrics before and after 1980.

• Rank words in descending order of the absolute value of their contribution to the change in average valence between the two eras, δ.

• Word i’s contribution depends on its change in relative frequency and on its valence relative to the pre-1980 average.

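• In comparing a text b with a reference text a, we define the valence difference as $\delta(b,a) = v_b - v_a$.

• The percentage contribution of word i to this difference is

$$\Delta_i(b,a) = 100\,\frac{(v_i - v_a)(p_{i,b} - p_{i,a})}{|v_b - v_a|},$$

where $p_{i,a}$ and $p_{i,b}$ are the fractional abundances of word i in texts a and b.

• Summing $\Delta_i(b,a)$ over all i gives +100% or −100%, depending on whether δ(b,a) is positive or negative, because $\sum_i (v_i - v_a)(p_{i,b} - p_{i,a}) = v_b - v_a$ when the abundances in each text sum to 1.

• Ranking words by $\Delta_i(b,a)$ gives the valence shift word graph.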
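The ranking can be sketched in Python, assuming each word's contribution takes the form $\Delta_i(b,a) = 100\,(v_i - v_a)(p_{i,b} - p_{i,a}) / |v_b - v_a|$ and that the fractional abundances in each text sum to 1. The function name `valence_shift` and the three-word toy data are hypothetical; this is a sketch, not the code behind the published word graphs.

```python
def valence_shift(p_a, p_b, valence):
    """Rank words by their percentage contribution Delta_i(b, a) to delta = v_b - v_a.

    p_a, p_b: word -> fractional abundance in texts a and b (each should sum to 1).
    valence: word -> ANEW average valence v_i.
    Returns (delta, [(word, Delta_i), ...]) sorted by descending |Delta_i|.
    """
    words = set(p_a) | set(p_b)
    v_a = sum(valence[w] * p_a.get(w, 0.0) for w in words)
    v_b = sum(valence[w] * p_b.get(w, 0.0) for w in words)
    delta = v_b - v_a
    if delta == 0:
        raise ValueError("texts have equal average valence; Delta_i is undefined")
    contrib = {
        # Valence relative to text a's average, times the change in abundance.
        w: 100.0 * (valence[w] - v_a) * (p_b.get(w, 0.0) - p_a.get(w, 0.0)) / abs(delta)
        for w in words
    }
    ranked = sorted(contrib.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return delta, ranked

# Toy example (invented numbers): text b has less 'love' and more 'hate' than text a.
valence = {"love": 8.0, "home": 7.0, "hate": 2.0}
p_a = {"love": 0.5, "home": 0.3, "hate": 0.2}
p_b = {"love": 0.4, "home": 0.3, "hate": 0.3}
delta, ranked = valence_shift(p_a, p_b, valence)
for word, d in ranked:
    print(f"{word:>5} {d:+.1f}%")
```

In the toy data δ = 5.9 − 6.5 < 0, so the contributions of ‘hate’ (−75%) and ‘love’ (−25%) sum to −100%, while ‘home’, whose abundance is unchanged, contributes nothing.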

• The decrease in average valence for lyrics after 1980 is due to a loss of positive words such as ‘love’, ‘baby’, and ‘home’ (italicized and red) and a gain in negative words such as ‘hate’, ‘pain’, and ‘death’ (normal font and blue).

• These drops are countered by trends toward less ‘lonely’ and ‘sad’ and more ‘life’ and ‘god’.

• The former effect dominates the latter, and the average valence decreases from ≈ 6.4 to ≈ 6.1.

• Next, we check the valence time series for individual music genres.

Valence Shift Word Graph Explained

• The valence of individual genres is stable over time, with only rock showing a minor decrease.

• The ordering of genres by measured average valence is sensible: gospel and soul are at the top, while several subgenres of rock, including metal and punk, which emerged through the 1970s, exhibit much lower valences.

• Valence(rock and pop) > valence(rap and hip-hop) > valence(metal and punk); rap and hip-hop are two notable genres that appear halfway through the time series.

• The decline in overall valence does not occur within particular genres, but through the evolutionary appearance of new genres that access more negative emotional niches.

Measure Valence in Real Time (Blogs)

• Within individual years, valence rises from an average of 5.75 to over 6.0 over the last part of the year.

• In 2008, after a midyear dip attributed to the economic recession, valence peaks notably in the last part of the year, coinciding with the US presidential election.

• Days marked on the time series:
  – Christmas Day
  – Valentine’s Day
  – Fifth anniversary of the WTC and Pentagon attacks in the USA; Sept 11, 2006
  – Sept 10, 2007
  – US Presidential Election; Nov 4, 2008
  – US Presidential Inauguration; Jan 20, 2009
  – Day of Michael Jackson’s death; June 25, 2009
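Tracking valence in real time amounts to applying the same frequency-weighted average to all sentences posted on each day. A minimal Python sketch, assuming sentences arrive as (date, text) pairs, a whitespace tokenizer with surrounding punctuation stripped, and a toy lexicon with invented valences; `daily_valence` is a hypothetical name, not part of any published pipeline.

```python
from collections import defaultdict
from datetime import date

def daily_valence(sentences, lexicon):
    """Average valence per day, pooling every lexicon word occurrence from that day's sentences.

    sentences: iterable of (date, text) pairs.
    lexicon: word -> average valence.
    Days whose sentences contain no lexicon words are omitted.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for day, text in sentences:
        for token in text.lower().split():
            token = token.strip(".,!?;:\"'()")
            if token in lexicon:
                sums[day] += lexicon[token]
                counts[day] += 1
    # Equivalent to computing v_text on the concatenation of each day's sentences.
    return {day: sums[day] / counts[day] for day in sorted(counts)}

# Invented valences and sentences, for illustration only.
lexicon = {"love": 8.0, "happy": 8.0, "sad": 2.0, "lost": 3.0}
sentences = [
    (date(2008, 11, 4), "I feel happy and full of love."),
    (date(2008, 11, 4), "I feel sad."),
    (date(2008, 11, 5), "I feel lost"),
]
for day, v in daily_valence(sentences, lexicon).items():
    print(day.isoformat(), round(v, 2))
```

Pooling word occurrences per day, rather than averaging per-sentence scores, keeps the daily value consistent with the text-level formula: a day dominated by long, positive sentences is weighted by its word counts.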

Human response via Blogs Explained

• On the fifth anniversary of the 9/11 attacks, the negative words driving down average valence are ‘lost’, ‘anger’, ‘hate’, and ‘tragedy’ (‘terrorist’ ranks 10th in valence shift), together with a decrease in the frequency of ‘love’ and ‘happy’.

• On Valentine’s Day, ‘love’ and ‘people’ are more prevalent and ‘hate’ and ‘pain’ less so; this is partly counteracted by increases in ‘sad’, ‘lonely’, and ‘bored’.

• More ‘proud’, ‘hope’, and ‘win’, together with fewer appearances of ‘pain’, ‘sad’, and ‘guilty’, increase valence.

Blog Demographics & Contextual Info.

• Plot 9a: the average valence of blog sentences follows a pronounced, single-peaked (concave) curve as a function of age.

• 13–14-year-olds produce the sentences with the lowest average valence (5.58). As age increases, valence rises until leveling off near 6.0 for ages 45–60, then begins to trend downward.

• People aged 75 to 84 produce sentences with valence similar to that of 17-year-olds.

• Expected influences: changes in income (rising) and health (eventually declining). How do these affect valence?

• Plot 9b: average valence of blog sentences as a function of absolute latitude.

• Average valence ranges from 5.71 (for 0°–11.5°) up to 5.83 (for 29.5°–44.5°) and then back down to 5.78 (for 52.5°–69.5°).

• The US has the highest average valence (5.83), followed by Canada (5.78), the United Kingdom (5.77), and Australia (5.74).

• Plot 9c: valence by day of the week; a lag effect from Sunday causes high valence on Monday.

Gender Comparison

• Males exhibit essentially the same average valence as females (5.89 vs. 5.91).

• Females use more of the most impactful high- and low-valence words that separate the two genders: ‘love’, ‘baby’, ‘loved’, and ‘happy’ on the positive end, and ‘hurt’, ‘hate’, ‘sad’, and ‘alone’ on the negative end.

• The only word used more frequently by males is the rather perfunctory ‘good’.

State of the Union Addresses (SOTU)

Historical events resonant with SOTU

• The highest average valence scores belong to Kennedy (6.41), Eisenhower (6.38), and Reagan (6.38).

• Low valence begins with the First World War: Wilson’s speeches drop steeply from an initial 6.58 in 1913 to 5.88 in 1920.

• Coolidge’s addresses provide an exception.

• Hoover’s low average is due to his 1930 speech: the stock market crash of October 29, 1929 marked the beginning of the Great Depression, and his speeches are burdened with ‘depression’, ‘debt’, ‘crisis’, and ‘failure’.

• The years before and during the American Civil War form a local minimum in valence, corresponding to Buchanan and Lincoln.

• Valence drops from Eisenhower’s and Kennedy’s level to that of Johnson (6.08), whose first SOTU speech came just seven weeks after Kennedy’s assassination, with the remainder delivered during the escalating Vietnam War.

• Thereafter, valence decreases to the present day.
