
Sentiment Analysis


Page 1: Sentiment Analysis

MORE THAN WORDS: QUANTIFYING LANGUAGE TO MEASURE FIRMS’ FUNDAMENTALS

JOURNAL OF FINANCE, 2008

CONVERSATIONS ON FINANCE

PRESENTED BY:

SAHITHI GADDAM | UDIT GUPTA | JOHN LIU | BEEJAL SHAH

Page 2: Sentiment Analysis

SHOULD THIS IMPACT STOCK PRICE?

Source: Wall Street Journal (Oct 23, 2014)

Page 3: Sentiment Analysis

SHOULD THIS IMPACT STOCK PRICE?

Source: Factiva.

Page 4: Sentiment Analysis

AGENDA

Part I.

Motivation For The Study

Part II.

Base Paper - Overview

Case Study

Principal Idea Explored

Testing For Predictive Power

Conclusion

Part III.

Discussion of the Present Scenario

Page 5: Sentiment Analysis

MOTIVATION FOR THE STUDY

(1/2)

Efficient markets claim:

Firm’s Value = Expected [Present Value (Cash Flows)]

The expectation is conditional on the investor’s information set

Investor’s Information Set = Quantitative + Qualitative

An abundant literature studies quantitative information

However, quantitative measures of firms’ fundamentals leave substantial stock price movements unexplained

Qualitative information may help explain stock returns

e.g., a firm’s business environment, operations, and prospects
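The valuation statement above can be written out as an equation. This is a minimal sketch that assumes a constant discount rate $r$ and per-period cash flows $CF_{t+k}$. Both are simplifications introduced here and are not taken from the paper. Expectations are conditional on the investor’s information set $\mathcal{I}_t$:

$$V_t = \mathrm{E}\left[\sum_{k=1}^{\infty} \frac{CF_{t+k}}{(1+r)^{k}} \,\middle|\, \mathcal{I}_t\right], \qquad \mathcal{I}_t = \{\text{quantitative information}\} \cup \{\text{qualitative information}\}$$

Under this sketch, news can change $V_t$ only in two ways: by changing expected cash flows, or by changing the discount rate.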

Page 6: Sentiment Analysis

MOTIVATION FOR THE STUDY

(2/2)

Possible advantages from quantifying language

1) Allows researchers to study the impact of a nearly limitless variety of events (e.g., the Microsoft case)

2) May have incremental explanatory power for future earnings and returns

If analysts’ forecasts and accounting variables are incomplete or biased

Using negative vs. positive words

Literature in psychology

‘Negative’ words best summarize the cross-sectional variation in the word list, compared with other categories

The following study therefore focuses primarily on negative news

Page 7: Sentiment Analysis

BASE PAPER - OVERVIEW

Does ‘language’ predict firms’ ‘accounting earnings’ and ‘stock returns’?

Major findings:

1) Negative words in firm-specific news stories forecast low earnings

2) Stock prices briefly underreact to the information embedded in negative words, but fully incorporate it after a slight delay

3) Negative words in stories that focus on fundamentals have the greatest predictive power for earnings and returns

Findings suggest: investors quickly incorporate into stock prices the information on firms’ fundamentals that is available in linguistic media

Page 8: Sentiment Analysis

PRINCIPAL IDEA EXPLORED IN THE PAPER

Principal idea explored:

Can a simple quantitative measure of language be used to predict individual firms’ earnings and stock returns?

If so, how should the language used in financial news stories be quantified?

Unit of Measure (defined in the paper):

Raw metric: $N = \dfrac{\text{No. of negative words}}{\text{No. of total words}}$

Standardized metric: $neg = \dfrac{N - \mu_N}{\sigma_N}$

Metric used as independent variable: $neg_{-30,-3}$

(i.e., all news stories in the trading-day window [-30, -3] before an earnings announcement are treated as one composite story)
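The metric definitions above can be sketched in Python. This is a minimal, hypothetical illustration. The `Story` record, the function names, and the sample numbers are all invented here. It also assumes that μ_N and σ_N are estimated from a historical sample of N values that the user supplies, because the deck does not say which sample the paper uses.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class Story:
    day: int          # trading day relative to the earnings announcement (day 0)
    neg_words: int    # words the dictionary tags as negative
    total_words: int  # all words in the story


def raw_fraction(neg_words: int, total_words: int) -> float:
    """Raw metric N = no. of negative words / no. of total words."""
    return neg_words / total_words if total_words else 0.0


def composite_fraction(stories, start: int = -30, end: int = -3) -> float:
    """Pool every story dated in [start, end] into one composite story and return its N."""
    window = [s for s in stories if start <= s.day <= end]
    return raw_fraction(sum(s.neg_words for s in window),
                        sum(s.total_words for s in window))


def fit_moments(history):
    """Mean and sample standard deviation of N over a (hypothetical) estimation sample."""
    return mean(history), stdev(history)


def standardize(n: float, mu: float, sigma: float) -> float:
    """Standardized metric neg = (N - mu_N) / sigma_N."""
    return (n - mu) / sigma


# Hypothetical stories: the day -1 story falls outside [-30, -3] and is ignored.
stories = [Story(-20, 6, 200), Story(-10, 4, 300), Story(-1, 50, 100)]
n_window = composite_fraction(stories)          # (6 + 4) / (200 + 300) = 0.02
mu, sigma = fit_moments([0.01, 0.02, 0.03])     # hypothetical estimation sample
neg_window = standardize(n_window, mu, sigma)   # approximately 0: N equals the sample mean
```

Pooling word counts before dividing weights longer stories more heavily than averaging per-story fractions would. This matches the "one composite story" reading of the window.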

Page 9: Sentiment Analysis

CONTENT ANALYSIS METHODOLOGY

Research area:

Qualitative analysis; Natural language processing

Content analysis: two-step process

Step 1: Mapping (each word is assigned to its dictionary categories)

Step 2: Summarizing (count the number of negative words)

Example:

Word      Category   Freq.   Value
Alleged   Negativ    1/29    1
Abuse     Negativ    1/29    1
Worse     Negativ    1/29    1
Happy     Pstv       0/29    1
Neutral   Passive    0/29    1

Number of negative words: 3/29 (> 99% of all news articles)
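The two-step process can be illustrated on the Microsoft sentence that is used as the example. The sketch below is hypothetical in three ways. First, `TOY_DICTIONARY` holds only the words from the example table, not the full Harvard-IV-4 word list. Second, tokenization is a simple lowercase regular expression. Third, the real General Inquirer also distinguishes word senses, which an exact-match lookup ignores.

```python
import re
from collections import Counter

# Hypothetical toy dictionary: word -> set of category tags. It holds only the words
# from the example table; the real Harvard-IV-4 list has ~12,000 entries.
TOY_DICTIONARY = {
    "alleged": {"Negativ"},
    "abuse": {"Negativ"},
    "worse": {"Negativ"},
    "happy": {"Pstv"},
}


def tokenize(text):
    """Lowercase alphabetic tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def map_words(tokens, dictionary):
    """Step 1, mapping: count how many tokens fall into each category."""
    counts = Counter()
    for token in tokens:
        for category in dictionary.get(token, ()):
            counts[category] += 1
    return counts


def summarize(tokens, dictionary, category="Negativ"):
    """Step 2, summarizing: (words in `category`, total words)."""
    return map_words(tokens, dictionary)[category], len(tokens)


sentence = ("The alleged 'pricing abuse will only get worse if Microsoft is not "
            "disciplined sternly by the antitrust court,' said Mark Cooper, director "
            "of research for Consumer Federation of America.")
neg, total = summarize(tokenize(sentence), TOY_DICTIONARY)
print(f"{neg}/{total}")  # 3/29
```

Each of the three negative words contributes 1/29, as in the example table. Dividing the count by the total word count gives the raw metric N.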

Page 10: Sentiment Analysis

DATASET

MEASURING NEGATIVITY

Harvard-IV-4 psychosocial dictionary used to categorize positive and negative words

Around 12,000 words (rows) and 180 categories (columns)

Measure negativity by negative word frequency

Standardized fraction of negative words per story

All stories per firm on each trading day are combined to measure frequency

Source: General Inquirer Website

Page 11: Sentiment Analysis

DATASET

FIRMS AND STORIES

1980 to 2004

S&P 500 firms

Represent ¾ of U.S. market capitalization

Dow Jones News Service (DJNS) and Wall Street Journal (WSJ) stories

350,000 stories

100,000,000 words

Stories for 95.8% of S&P 500 firms

Center for Research in Security Prices (CRSP) for stock price data

Institutional Brokers’ Estimate System (I/B/E/S) for analyst forecast data

Compustat for accounting data

Factiva database for news stories

Page 12: Sentiment Analysis

MICROSOFT’S CASE STUDY (1/2)

1999 DJNS article headline:

Second sentence: “The alleged ‘pricing abuse will only get worse if Microsoft is not disciplined sternly by the antitrust court,’ said Mark Cooper, director of research for Consumer Federation of America.”

Hypothesis: the fraction of negative words is related to the news story’s effect on market value

Source: Factiva

Page 13: Sentiment Analysis

MICROSOFT’S CASE STUDY (2/2)

Source: Google Finance

The sentence’s fraction of negative words is in the 99th percentile of negativity among sentences

Microsoft had unusually low stock returns around the news story

Cumulative abnormal stock returns of -42, -141, and -194 bps over the 3 trading days surrounding the news event
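Cumulative abnormal returns are running sums of daily abnormal returns. In this sketch, abnormal returns are assumed to be market-adjusted, that is, the stock return minus a benchmark return. The deck does not say which benchmark the paper uses. If the three figures above are read as a cumulative path, differencing them gives implied daily abnormal returns of -42, -99, and -53 bps, which the code sums back up.

```python
from itertools import accumulate


def abnormal_returns(stock, benchmark):
    """Market-adjusted abnormal returns (an assumption): stock minus benchmark, day by day."""
    return [s - b for s, b in zip(stock, benchmark)]


def cumulative_abnormal_returns(abnormal):
    """Running sum of abnormal returns, i.e. the CAR path over the event window."""
    return list(accumulate(abnormal))


# Daily abnormal returns in bps implied by differencing the slide's cumulative figures
daily_ar_bps = [-42, -99, -53]
print(cumulative_abnormal_returns(daily_ar_bps))  # [-42, -141, -194]
```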

Page 14: Sentiment Analysis

TESTING FOR PREDICTIVE POWER

For negative words to affect stock returns, at least one of these relationships must hold:

1) Negative words predict earnings (a proxy for cash flows)

2) Negative words predict discount rates (proxied by returns)

OLS regression tests are performed using different dependent and control variables
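The regression tests above are ordinary least squares (OLS). The sketch below is a generic OLS estimator with White (1980) heteroskedasticity-consistent (HC0) standard errors. The deck cites this estimator for its trading-strategy results. The sketch makes no claim about the exact standard errors behind each table in the paper. The data are simulated, and the coefficient of -0.1 on `neg` is a made-up value used only to check that the estimator recovers it.

```python
import numpy as np


def ols_white(y, X):
    """OLS with an intercept; returns coefficients and White (1980) HC0 standard errors.

    y: (n,) dependent variable (e.g. SUE); X: (n, k) regressors (e.g. neg and controls).
    """
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    meat = X.T @ (X * resid[:, None] ** 2)   # sum_i e_i^2 x_i x_i'
    cov = xtx_inv @ meat @ xtx_inv           # sandwich estimator
    return beta, np.sqrt(np.diag(cov))


# Simulated data with a hypothetical true neg coefficient of -0.1
rng = np.random.default_rng(0)
n = 5_000
neg = rng.standard_normal(n)
size = rng.standard_normal(n)
sue = -0.1 * neg + 0.05 * size + rng.standard_normal(n)
beta, se = ols_white(sue, np.column_stack([neg, size]))
print(beta.round(3), se.round(3))
```

The sandwich form leaves the point estimates unchanged. It only widens or narrows the standard errors when the residual variance differs across observations.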

Page 15: Sentiment Analysis

TEST 1 - EARNINGS PREDICTABILITY

Dependent variables: two measures of quarterly earnings

Standardized Unexpected Earnings (SUE)(1)

- Raw metric: $UE_t = E_t - \text{Expected value of } E_t$

- Standardized metric: $SUE_t = \dfrac{UE_t - \mu_{UE_t}}{\sigma_{UE_t}}$

Standardized Analysts’ Forecast Errors (SAFE)

- $SAFE = \dfrac{\text{Median analyst forecast error}}{\sigma_{UE_t}}$

Independent variable of interest: $neg_{-30,-3}$

Control variables: lagged earnings, size, B/M, trading volume, recent stock returns (3 measures), analyst forecast revisions(2), etc.

SUE and all analyst forecast variables are winsorized at the 1% level

Both SUE and SAFE yield similar results

Note: (1) Based on Bernard and Thomas (1989), who use a seasonal random walk with trend model for each firm’s earnings. (2) Using the Chan et al. (1996) methodology.
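The earnings measures and the winsorization step above can be sketched with NumPy. Several choices here are illustrative assumptions rather than the exact specifications of Bernard and Thomas (1989) or of the paper:

- `n_trend`, the number of past seasonal changes that define the trend;
- using all supplied past UE values for μ and σ;
- the actual-minus-forecast sign convention in SAFE;
- clipping each tail at the 1st and 99th percentiles.

```python
import numpy as np


def expected_earnings(eps, t, n_trend=8):
    """Seasonal random walk with trend: E_{t-4} plus the average of recent seasonal changes.

    n_trend is a hypothetical choice: how many past changes E_s - E_{s-4} define the trend.
    """
    eps = np.asarray(eps, dtype=float)
    if t < n_trend + 4:
        raise ValueError("need at least n_trend + 4 quarters of history")
    seasonal_changes = eps[t - n_trend:t] - eps[t - n_trend - 4:t - 4]
    return eps[t - 4] + seasonal_changes.mean()


def unexpected_earnings(eps, t, n_trend=8):
    """UE_t = E_t - expected value of E_t."""
    return float(eps[t]) - expected_earnings(eps, t, n_trend)


def standardize_ue(ue_t, past_ue):
    """SUE_t = (UE_t - mean of past UE) / sample standard deviation of past UE."""
    past = np.asarray(past_ue, dtype=float)
    return (ue_t - past.mean()) / past.std(ddof=1)


def safe(actual, forecasts, sigma_ue):
    """SAFE = median forecast error / sigma_UE, with error = actual minus forecast (assumed)."""
    return (actual - np.median(forecasts)) / sigma_ue


def winsorize(x, pct=1.0):
    """Clip values below the pct-th and above the (100 - pct)-th percentile."""
    x = np.asarray(x, dtype=float)
    lo, hi = np.percentile(x, [pct, 100 - pct])
    return np.clip(x, lo, hi)


eps = [float(q) for q in range(20)]        # hypothetical earnings rising by 1 each quarter
ue = unexpected_earnings(eps, 15)          # 0.0: the trend model predicts E_15 exactly
sue = standardize_ue(3.0, [1.0, 2.0, 3.0])  # (3 - 2) / 1 = 1.0
err = safe(1.10, [1.00, 1.05, 1.20], 0.05)  # (1.10 - 1.05) / 0.05, about 1.0
capped = winsorize(np.arange(101))         # values clipped to [1, 99]
```

Winsorizing caps extreme observations instead of dropping them. The sample size stays the same, while outliers have less influence on the OLS estimates.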

Page 16: Sentiment Analysis

TEST 1 - EARNINGS PREDICTABILITY

MAIN RESULT

All 6 estimates are significant at the 1% level (99% confidence)

Several control variables also exhibit strong explanatory power, as expected

Predictability is robust even when using only news stories released before analysts’ forecasts

Note: The table presented above has been truncated and does not include all control variables. Please refer to the appendix for complete details.

Page 17: Sentiment Analysis

TEST 2 - RETURN PREDICTABILITY

Two ideas are tested:

Return predictability in daily returns

Can a trading strategy exploit this underreaction?

Considerations:

Data at daily frequency

Dependent variable: close-to-close returns on day t = 0 and day t = 1

Cut-off times: DJNS stories up to 3:30 pm; WSJ stories from the same day

Control variables: earnings, size, B/M, trading volume, recent stock returns (5 measures)

Page 18: Sentiment Analysis

TEST 2 - RETURN PREDICTABILITY

DAILY RETURNS, MAIN RESULT

neg robustly predicts slightly lower returns on the following trading day

The neg coefficient is significant in the 4 cases where DJNS data are included

The coefficient is larger in magnitude for DJNS stories than for WSJ stories

Low R² (< 0.0026), as expected in efficient markets

Note: The table presented above has been truncated and does not include all control variables. Please refer to the appendix for complete details.

Page 19: Sentiment Analysis

TEST 2 - RETURN PREDICTABILITY

TRADING STRATEGY

Two equal-weighted portfolios are constructed by ranking firms on the basis of positive or negative news

Long-short strategy with daily rebalancing

Annualized raw return of 21.1%, ignoring trading costs

The strategy is not profitable once trading costs are considered

Note: Standard errors calculated using White (1980) heteroskedasticity-consistent covariance matrix approach.
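The long-short construction can be sketched for a single day. The sketch rests on four assumptions that do not come from the deck:

- firms are ranked by their standardized neg score;
- the long leg holds the `frac` share of firms with the least negative news;
- the short leg holds the `frac` share with the most negative news;
- `cost` is a hypothetical flat per-day trading cost.

The paper’s exact portfolio rules are not reproduced here.

```python
import numpy as np


def long_short_return(neg_scores, next_day_returns, frac=0.5, cost=0.0):
    """One-day return of an equal-weighted long-short portfolio.

    Firms are ranked by neg (higher = more negative news). The long leg holds the
    `frac` share of firms with the least negative news, the short leg the `frac`
    share with the most negative news; `cost` is a hypothetical per-day trading cost.
    """
    if not 0 < frac <= 0.5:
        raise ValueError("frac must be in (0, 0.5] so the two legs do not overlap")
    neg = np.asarray(neg_scores, dtype=float)
    ret = np.asarray(next_day_returns, dtype=float)
    order = np.argsort(neg)                      # least negative news first
    k = max(1, int(len(neg) * frac))             # firms per leg
    long_leg = ret[order[:k]].mean()
    short_leg = ret[order[-k:]].mean()
    return float(long_leg - short_leg - cost)


def compound(daily_returns):
    """Cumulative return from compounding a series of daily returns."""
    return float(np.prod(1.0 + np.asarray(daily_returns, dtype=float)) - 1.0)


# Hypothetical cross-section of four firms on one day
neg = [-1.0, 2.0, 0.5, -0.5]
ret = [0.01, -0.02, 0.0, 0.005]
gross = long_short_return(neg, ret)             # 0.0075 - (-0.01) = 0.0175
net = long_short_return(neg, ret, cost=0.02)    # a large enough cost turns it negative
```

Because the portfolio is rebalanced daily, a per-day cost is charged about 250 times a year. This is why a modest gross edge can vanish after costs.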

Page 20: Sentiment Analysis

IS THERE A SUBSET OF NEWS WITH BETTER PREDICTABILITY?

Hypothesis:

Negative words in news stories containing the word stem ‘earn’ have greater predictive power

Expected results:

Better earnings predictability

Stronger contemporaneous relationship with returns

Greater magnitude of underreaction

Page 21: Sentiment Analysis

ADDING NEW INDEPENDENT VARIABLES

Regression (similar to the previous case):

Add 2 new independent variables to capture these effects:

$Fund_{-30,-3}$: number of words in stories containing the word stem ‘earn’, divided by the total number of words across all stories

Interaction term: $neg_{-30,-3} \times Fund_{-30,-3}$

[Figure: news stories split into those not “about” firm fundamentals and those “about” firm fundamentals, labeled $neg_{-30,-3}$ and $Fund_{-30,-3}$]
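The two new variables can be sketched as follows. The whitespace word count, the case-insensitive `\bearn` pattern, and the function names are assumptions made here. The paper’s exact stemming and tokenization rules are not reproduced.

```python
import re

# Matches words that start with "earn" (earn, earned, earnings); "learn" and "yearn"
# are excluded by the word boundary, but "earnest" would still match.
EARN_STEM = re.compile(r"\bearn", re.IGNORECASE)


def fund_fraction(stories):
    """Fund: words in stories containing the stem 'earn' / total words in all stories."""
    total_words = 0
    fund_words = 0
    for text in stories:
        n_words = len(text.split())
        total_words += n_words
        if EARN_STEM.search(text):
            fund_words += n_words
    return fund_words / total_words if total_words else 0.0


def add_fund_terms(neg, stories):
    """Return (neg, Fund, neg * Fund) for one firm-window."""
    fund = fund_fraction(stories)
    return neg, fund, neg * fund


window_stories = [
    "Quarterly earnings fell sharply",       # 4 words, contains the stem
    "Company will learn new tricks today",   # 6 words, 'learn' does not count
    "CEO resigns",                           # 2 words
]
neg_value, fund_value, interaction = add_fund_terms(1.5, window_stories)  # Fund = 4/12
```

The interaction term is large only when a window is both negative and dominated by earnings-related stories. Its coefficient therefore isolates the extra predictive power of negative words in those stories.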

Page 22: Sentiment Analysis

REPEATING REGRESSIONS WITH TWO ADDITIONAL VARIABLES

The coefficients on both new terms are strongly negative and significant

Interaction: negative words in earnings-related stories are much better predictors

Note: The tables presented above have been truncated and do not include all control variables. Please refer to the appendix for complete details.

A strong contemporaneous relationship with returns exists

The response to negative words in earnings-related stories is 5x larger

Page 23: Sentiment Analysis

CONCLUSION

More than ‘negative’ words

Contain valuable information

Forecast low earnings

Return Predictability

Slight delay in reaction to negative news

Predictability in t+1 day return

No profitable ‘simple’ trading strategy exists once trading costs are considered

Words from specific types of news carry more information

Negative words from earnings-related stories are better predictors

Page 24: Sentiment Analysis

POWER OF SOCIAL MEDIA: THE HASH CRASH INCIDENT

On April 23, 2013, a false tweet from the hacked Associated Press Twitter account, reporting explosions at the White House, demonstrated social media’s potential to move markets

The Dow Jones Industrial Average fell more than 150 points, the price of crude oil plummeted, and US bond yields dropped, briefly wiping $121 billion off the value of companies in the S&P 500 index before recovering minutes later

Page 25: Sentiment Analysis

1. ‘SNTMNT’- OVERVIEW

Launched an API to monitor Twitter-based stock sentiment

Billed as the world’s first API that makes predictions about future stock price movements for all stocks in the S&P 500

Accuracy as high as 60%, averaging 54% (company estimates)

Should traders rely on Twitter sentiment alone for their trades?

The signal-to-noise ratio on social media channels is too low to provide standalone trading signals, but high enough to provide an innovative trading indicator

Based on the work of Professor Johan Bollen, who found correlations between the stock market and activity on Twitter

Page 26: Sentiment Analysis

1. ‘SNTMNT’- METHODOLOGY

Page 27: Sentiment Analysis

2. SOCIAL MARKET ANALYTICS - METHODOLOGY

Social Market Analytics (SMA) produces a family of metrics, called S-Factors, designed to capture the signature of financial market sentiment

SMA applies these metrics to data captured from social media sources to estimate sentiment for indices, sectors, and individual securities

Page 28: Sentiment Analysis

SOME USEFUL ONLINE RESOURCES

IBM’s Watson and the Jeopardy! Challenge: https://www.youtube.com/watch?v=P18EdAKuC1U

Free online course on ‘Natural Language Processing’, offered on Coursera by Stanford University: https://www.coursera.org/course/nlp

General Inquirer website: http://www.wjh.harvard.edu/~inquirer/

IBM Publications: http://researcher.watson.ibm.com/researcher/view_group.php?id=147

List of words in spreadsheet format: http://www.wjh.harvard.edu/~inquirer/spreadsheet_guide.htm

Princeton’s WordNet: http://wordnet.princeton.edu/

Page 29: Sentiment Analysis

REFERENCES

Tetlock, Paul C., Maytal Saar-Tsechansky, and Sofus Macskassy. 2008. “More than words: Quantifying language to measure firms’ fundamentals.” The Journal of Finance 63 (3): 1437-1467.

“Welcome to the General Inquirer Home Page.” Web. Accessed 6 April 2015. <http://www.wjh.harvard.edu/~inquirer/spreadsheet_guide.htm>.

Page 30: Sentiment Analysis

QUESTIONS AND DISCUSSION

Can you think of other examples in which any news had an

impact on returns?

Do you think big firms are using such analysis for alpha

generation?

Page 31: Sentiment Analysis

APPENDIX

TEST 1 - EARNINGS PREDICTABILITY

Page 32: Sentiment Analysis

APPENDIX

TEST 2 - RETURN PREDICTABILITY

Page 33: Sentiment Analysis

APPENDIX

SUBSET - EARNINGS PREDICTABILITY

Page 34: Sentiment Analysis

APPENDIX

SUBSET - RETURNS PREDICTABILITY