Upload
soumensahil
View
221
Download
0
Embed Size (px)
Citation preview
8/17/2019 Behaviorial Finance.pdf
1/30
Can internet search queries help to predict
stock market volatility?∗
Thomas Dimpfl and Stephan Jank∗∗
May 14, 2012
Abstract
This paper studies the dynamics of stock market volatility and retail investors’attention to the stock market, where attention to the stock market is measured byinternet search queries related to the leading stock market index. We find a strongco-movement of the Dow Jones’ realized volatility and the volume of search queriesfor its name. Furthermore, search queries Granger cause volatility: a heightenednumber of searches today is followed by an increase in volatility tomorrow. We utilizethis finding to improve several models of realized volatility. Including search queriesin autoregressive models of realized volatility helps to improve volatility forecastsin-sample and out-of-sample as well as for different forecasting horizons. Searchqueries are particularly useful in high-volatility phases when a precise prediction isvital.
Key words: realized volatility, forecasting, investor behavior,limited attention, noise trader, search engine data
JEL: G10, G14, G17
∗We thank Google for making their search volume data publicly available through Google Trends .
We would like to thank Jeremy Ginsberg, Joachim Grammig, Isaac Hacamo, Terrance Odean, FranziskaPeter, Tomas Reyes, Maik Schmeling and Hal Varian for helpful comments and suggestions. We retainresponsibility for all remaining errors. Financial support of the German Research Foundation (DeutscheForschungsgemeinschaft, DFG) is gratefully acknowledged.
∗∗Thomas Dimpfl: University of Tübingen, Stephan Jank: University of Tübingen and Centre forFinancial Research (CFR), Cologne. Contact: University of Tübingen, Department of Economics andSocial Sciences, Mohlstr. 36, D-72074 Tübingen, Germany. E-mail: [email protected],[email protected] .
8/17/2019 Behaviorial Finance.pdf
2/30
1 Introduction
Large stock market movements capture investors’ attention. This can be seen in Figure 1
which depicts a strong co-movement between the volatility of the Dow Jones stock market
index and internet search queries for its name. If volatility is low, searches related to theDow Jones are generally at their overall average level, but in turbulent times search
queries surge to levels which exceed their average by multiple times. For example, when
the volatility of the Dow Jones spiked at an almost record high of over 150% annualized
on October 10, 2008, the number of submitted searches for the index rose to more than
eleven times the average.
[Insert Figure 1 about here]
Internet search queries related to stock market index names can be interpreted as ameasure for retail investors’ attention to the stock market (cf. Da, Engelberg and Gao
2011). While professional investors monitor the leading index all the time, retail investors
presumably do not. Once the latter perceive an increased demand for information about
the stock index, they are likely to use the internet via a search engine like Google as a
source of information. Professional investors most probably, if not certainly, are not using
a search engine to obtain information about the leading stock market index. Thus, search
queries qualify as a good proxy for retail investors’ attention to the stock market. After
having searched for the stock market index, some individuals might be inclined to act and
trade at the same or the following day.
In this paper we study in detail the dynamics of searches for the Dow Jones and its
volatility. The key finding is that search queries today Granger cause volatility tomorrow.
We exploit this finding to improve volatility forecasts and augment various autoregressive
models of realized volatility with search query data. The forecasting precision can be
significantly enhanced when data on search queries enter the prediction equation. The
improvement is evident both for in-sample as well as for out-of-sample forecasts. The
longer the forecast horizon, the more efficiency gains are apparent. Furthermore, the data
on internet search queries help to predict volatility more accurately in periods of highvolatility, when a precise prediction is of utmost importance.
These findings contribute to our knowledge of stock market volatility and its long
memory characteristics, documented for example by Andersen and Bollerslev (1997). In
particular, the results are consistent with agent-based models of stock market volatility
(e.g. Lux and Marchesi 1999, Alfarano and Lux 2007). In the model by Lux and Marchesi
1
8/17/2019 Behaviorial Finance.pdf
3/30
(1999), noise traders are seen as a source of additional volatility in the stock market. A
fundamental shock in volatility triggers noise trading which in turn induces an increase
of volatility. Taking internet searches as a measure of retail investors’ attention, we
observe patterns that are consistent with the agent and noise trader theory of stock market
volatility: high volatility entails high retail investor attention, which is in turn followed
by high volatility. Our results are also in line with recent empirical evidence by Foucault,
Sraer and Thesmar (2011), who – drawing on a natural experiment in France – find that
retail investors’ trading activity leads to a higher level of volatility in individual stocks.
Note that this paper does not focus on an exogenously evoked change in attention, which
for example is studied by Engelberg and Parsons (2011). In our analysis a fundamental
volatility shock induces an elevated interest of retail investors in the stock market and we
study how retail investors’ attention, along with possible trading influences and amplifies
volatility in the following.Our key findings are supported by an investigation of trading volume. In the agent-
based model of Lux and Marchesi (1999) trading activity by noise traders induces volatil-
ity. While we are unable to observe trading by this group directly, we document that
the overall trading volume of the stocks comprised in the Dow Jones rises subsequent to
increases in search queries for the index, which is consistent with the noise trader model.
In a joint model for volatility, trading volume and search queries, the latter turn out to
be a predictor for volatility while trading volume does not, which is in line with findings
in the literature (see, for example, Brooks 1998).
In a forecasting context, other studies have sucessfully used Google search volume data.
For example Ginsberg et al. (2009) use search query data to predict influenza epidemics
and Choi and Varian (2009a,b) employ Google search data to forecast unemployment
rates and retail sales. In the field of finance, search query data are used to measure retail
investor attention (Bank, Larch and Peter 2011, Da et al. 2011, Jacobs and Weber 2011,
Vlastakis and Markellos 2012) and to predict earnings (Da, Engelberg and Gao 2010a,
Drake, Roulstone and Thornock 2011). Da, Engelberg and Gao (2010b) use search queries
related to household concerns to measure investor sentiment.
2
8/17/2019 Behaviorial Finance.pdf
4/30
2 Data and descriptive statistics
Our analysis focuses on the Dow Jones Industrial Average from July 1, 2006 to December
31, 2011. Intraday index prices are obtained from RC Research Price-Data.1 We construct
a time series of daily realized volatility RV t as introduced by Andersen, Bollerslev, Dieboldand Labys (2003) in the following way:
RV t =
n j=1
r2t,j, (1)
where r2t,j are squared intraday log-price changes of the index on day t during interval j
and n is the number of such intraday return intervals. We compute these price changes
over 10 minute intervals in order to circumvent the well-documented microstructure effects
(see e.g. Andersen et al. 2003, Andersen, Bollerslev and Meddahi 2011, Ghysels and Sinko
2011).
Descriptive statistics of Dow Jones realized volatility are presented in Panel A of
Table 1. As is evident from the skewness and kurtosis measures, the volatility time
series is heavily skewed and far from being normally distributed. We therefore resort
to the log of realized volatility as, amongst others, suggested by Andersen, Bollerslev,
Diebold and Ebens (2001) and Andersen et al. (2003). Even though normality of the data
is still rejected, the data are by far better behaved than before the transformation; in
particular, excess kurtosis is reduced significantly. The left graph in Figure 2 presents theautocorrelation function for realized volatility. The plot reveals the well-known pattern
of only slowly decaying autocorrelations (compare e.g. Andersen et al. 2001).
[Insert Table 1 and Figure 2 about here]
The data on Google search queries are obtained through Google Trends .2 We use daily
data on search volume from July 2006 to December 2011 for the keyword “dow”. The
volume measure is based on the number of searches which were submitted within the US.
We start our sample in the second half of 2006, since before July 2006 search volume dataat daily frequency exhibit numerous missing values. To match searches to the respective
1An earlier version of the paper also considered European stock market indices (the FTSE 100, theCAC 40 and the DAX). We find similar patterns of searches and realized volatilities for these indices,however, the present timing convention of Google Trends, which uses Pacific Standard Time for allcountries, makes the Google search query data less useful in a daily forecasting setting.
2Source: Google Trends (http://www.google.com/trends ).
3
http://www.google.com/trendshttp://www.google.com/trends
8/17/2019 Behaviorial Finance.pdf
5/30
time series of realized volatility, we only consider trading days. Google Trends measures
searches for a particular day from 12 pm to 12 pm Pacific Standard Time.
The data which are provided by Google are relative in nature. This means that Google
does not provide the effective total number of searches, but a search volume index only.
We standardize the search queries such that the average search frequency over the sample
period equals one, allowing for an easy interpretation.
An important issue when measuring investors’ attention to the Dow Jones Industrial
Average index, is that it is known under a variety of names and acronyms. In this paper
we use the search term “dow” to measure retail investors’ attention to this index, since
we find that this short name is the most widely used search term when investors are
interested in the Dow Jones index.
[Insert Table 2 about here]
Table 2 provides an overview of search terms that have the highest correlation with the
term “dow” according to Google Correlate .3 The table also compares the level of search
volume relative to the search term “dow”. Almost perfectly correlated with the search
term “dow” is “dow jones”, but its volume is considerably less, amounting to 47.7% of
the search volume of “dow”. In general, search engine users seem to prefer shorter search
terms. The search term “dow jones industrial” is also highly correlated with “dow”, but
amounts only to around 9.9% of the search volume of “dow”. The ticker “djia” is also a
popular search term and its searches are around 18.4% compared to “dow”. Other searchterms such as “dow close”, “current dow”, “stock market now” are also highly correlated
with “dow”but their search volume is even less, i.e. below one percent compared to “dow”.
Search terms with low search volume present two problems: First, the search volume
index by Google Trends is less accurate due to the statistical methods used in constructing
the search volume index. Second, for some terms search traffic is too low, resulting in
missing values of the search volume index. Because of the high correlation of these search
terms, the other search terms do not add further information to the searches we use. For
this reason and to keep matters simple, we confine our analysis to the search term “dow”.
An alternative to using the Dow Jones would be the S&P 500, which is commonly
modeled in the realized volatility literature. However, the S&P 500 is less suited for
our purposes, because we find it to be less followed by retail investors compared to the
Dow Jones. In our sample period, the search term “dow” has been submitted to Google
3Source: Google Correlate (http://www.google.com/trends/correlate/ ).
4
http://www.google.com/trends/correlate/http://www.google.com/trends/correlate/
8/17/2019 Behaviorial Finance.pdf
6/30
approximately ten times as often as the term “s&p 500”. Moreover, the acronym “S&P”
is generally used in the context of rating downgrades in our sample.
Panel A of Table 1 also holds summary statistics for the data on search queries. Just
as the realized volatility time series, the data on searches exhibit distinctive levels of
skewness and kurtosis. We therefore follow Da et al. (2011) by also taking logarithms of
the search data. This procedure reduces both skewness and excess kurtosis, however, it is
not as successful as in the case of realized volatility. The right graph in Figure 2 plots the
autocorrelations of Dow search queries. The autocorrelation function is decaying fairly
geometrically and much faster compared to autocorrelations of realized volatility depicted
in the left graph of Figure 2.
The data on trading volume of the Dow Jones are obtained from Datastream. It is
measured as the aggregate number of shares traded for the underlying stocks of the index
per day (in millions). Descriptive statistics are presented in Panel A of Table 1. We alsoresort to taking the logarithm of volume to reduce skewness and excess kurtosis.
As already apparent from Figure 1, search queries and realized volatility exhibit a
strong co-movement over time. The contemporaneous correlation of search queries for
“dow” and realized volatility of the Dow Jones in our sample is quite high as can be seen
from Panel B of Table 1. We find a correlation coefficient of 0.82 for the level time series
and 0.74 for the data in logarithms. Volume is correlated to a slightly lower extent with
search queries (0.51 in levels and 0.44 in logarithms) and realized volatility (0.57, both
for levels and logarithms).
3 The dynamics of volatility and searches
In the following we study the dynamics between realized volatility, search queries and
trading volume by estimating three vector autoregressive (VAR) models. Let xt be a
vector which contains the variables of interest, then the VAR(3) reads as follows:
xt = c +3
j=1
β jxt− j + εt , (2)
where c is a vector of constants and εt is a vector of white noise innovations. Model 1,
which is central to our analysis, investigates the dynamics between realized volatility and
search queries, i.e. xt = (log-RV t log-SQt). In Model 2, we will shed light on the
interaction between search queries and trading volume, i.e. xt = (log-SQt log-V Ot).
5
8/17/2019 Behaviorial Finance.pdf
7/30
Finally, Model 3 is a joint model of realized volatility, search queries and trading volume
such that xt = (log-RV t log-SQt log-V Ot).
Table 3 holds the results of the three VAR models. Coefficient estimates are presented
in Panel A, the results of the Granger causality Lagrange multiplier (LM) tests are shown
in Panel B.
[Insert Table 3 about here]
In Model 1 we find significant estimates of the autoregressive parameters for the real-
ized volatility for all included lags. Search queries show significant autoregressive terms
for lags 1 and 3. The estimation results also reveal that past volatility to some extent
positively influences present search queries. This effect is concentrated to the first lag
which is significant on the 5% level only while lags of order 2 and 3 are not found to
be significant. However, the Granger causality test reveals that realized volatility doesnot Granger cause search queries. Note that there is a high contemporaneous correlation
between volatility and searches indicating that searches immediately react to a volatility
shock at the same day.
The focus of our interest is how past search activity influences present volatility. We
find that search queries enter the RV-equation significantly at lag 1 while higher order
lags are insignificant. Furthermore, the Granger causality test indicates that past searches
provide significant information about future volatility: we reject the null hypothesis of no
influence of past log-SQ on current log-RV on all conventional significance levels. These
results support the noise trader model of volatility by Lux and Marchesi (1999). After an
initial shock in volatility, search intensity rises, which in turn is followed by heightened
levels of volatility.
As a further test of the model of Lux and Marchesi (1999), we analyze the interaction
of trading volume and search queries. Because trading volume of noise traders is not
directly observable, we turn to an indirect test using total trading volume. If the interest
of retail investors in the stock market manifests in trading activity, an increase in the
number of searches should be followed by rising levels of overall trading volume. We test
this hypothesis in Model 2. We find that volume has only marginal predictive power forsearch queries, in particular only the third lag is statistically significant. In contrast, search
queries of the previous day enter significantly in the volume equation which supports our
prediction that once information has been sought via Google, trading volume rises. The
Granger causality test also shows that trading volume and search queries are Granger
causal to each other.
6
8/17/2019 Behaviorial Finance.pdf
8/30
The estimated coefficients of the joint model of realized volatility, search queries and
trading volume (Model 3) are similar to the estimates of the respective equations in
Models 1 and 2. Again, search queries enter the realized volatility equation significantly
on lag 1 (higher order lags are not significant) while volume does not contribute to the
explanation of the variation in realized volatility which is in line with Brooks (1998).
Moreover, realized volatility is only a weak predictor of search queries: only the first lag
of log-SQ is significant on a 10% significance level. In this model we find that search queries
do Granger cause realized volatility while volume does not. Both realized volatility and
volume Granger cause search queries and realized volatility and search queries Granger
cause volume.
Figure 3 provides the impulse response functions corresponding to the model of in-
terest, Model 1. For the calculation of impulse response functions, we use a Cholesky
decomposition with the economically meaningful restriction of volatility being contem-poraneously exogenous, i.e. volatility can affect search queries immediately, but search
queries do not contemporaneously affect volatility. The intuition behind this ordering is
that there is first a fundamental volatility shock that in turn triggers retail investor atten-
tion and, thus, search queries. Search queries, on the other hand, would not rise without
a preceding event on the market (see also the argumentation in Lux and Marchesi 1999).
[Insert Figure 3 about here]
The top left and the bottom right graphs in Figure 3 present the response of log-RV and log-SQ, respectively, to their own shocks. As becomes evident from the slowly
decaying function, these shocks are highly persistent, which corresponds to the slowly
decaying autocorrelation functions presented in Figure 2. The top right figure holds the
impulse response of search queries to a shock in realized volatility. After a volatility
shock, attention is elevated for a considerable amount of time which is reflected in the
slowly decaying impulse response function. The bottom left figure presents the response
of volatility to a shock in search queries. Most noteworthy, the impact of retail investors’
attention on volatility is long-lasting and only dies out after 40-60 trading days.
4 Can search queries improve volatility forecasts?
The key result of the VAR estimation is that search queries help to predict future volatility
in addition to the information contained in lagged volatility and that a shock in search
7
8/17/2019 Behaviorial Finance.pdf
9/30
volume has an impact on volatility that persists for a considerable span of time. Now we
focus on the prediction of volatility, and thus investigate if and how searches can help to
improve volatility forecasts. In the following, we use different modeling approaches which
are commonly used to capture the time series properties of realized volatility and include
lagged search queries in each model. We evaluate the forecasting ability of these models
in- and out-of-sample, for multiple horizons as well as across different states of volatility.
For one-step ahead predictions an autoregressive model for realized volatility which
includes search queries as an explanatory variable would be sufficient. For multi-step
forecasts, however, we need to forecast log-SQ as well. For this reason we model the
time series properties of search queries along with realized volatility. We estimate vector
autoregressive models with different lag lengths which are specified as follows:
log-RV t = c1 +
p
j=1
β 1,jlog-RV t− j + γ 1,1log-SQt−1 + ε1,t (3a)
log-SQt = c2 + β 2,1log-RV t−1 +
q j=1
γ 2,jlog-SQt− j + ε2,t. (3b)
Building on the univariate autoregressive models of Andersen, Bollerslev, Christoffersen
and Diebold (2006) and Bollen and Inder (2002), we set the lag length for the realized
volatility equation (3a) to either p = 1 or p = 3. As the results of estimating the VAR
model in Equation (2) show no significant effect of searches on volatility for higher order
lags, we only include one lag of searches in Equation (3a). For the same reason weonly allow a one lag impact of realized volatility on search queries in Equation (3b). In
accordance to the autoregressive structure of the realized volatility equation, we set the
lag order q of the autoregressive terms to 1 if p = 1 and 3 if p = 3.
In addition to these ordinary autoregressive models we also consider Corsi’s (2009)
heterogeneous autoregressive (HAR) model. The HAR model has been found to capture
the long-memory properties of realized volatility very well and has recently been used
for example by Andersen, Bollerslev and Diebold (2007), Chen and Ghysels (2011) and
Chiriac and Voev (2011). Since we need a forecast of search queries for the multi-step
8
8/17/2019 Behaviorial Finance.pdf
10/30
volatility prediction, we extend the HAR model to a Vector-HAR model which reads as
follows:
log-RV t = c1 + β dlog-RV t−1 + β wlog-RV wt−1 + β mlog-RV
mt−1 + γ 1,1log-SQt−1 + ε1,t (4a)
log-SQt = c2 + β 2,1log-RV t−1 +3
j=1
γ 2,jlog-SQt− j + ε2,t. (4b)
where log-RV wt = 1
5
4
j=0 log-RV t− j and log-RV mt =
1
22
21
j=0 log-RV t− j . The search
queries Equation (4b) is the same as Equation (3b) since we find that the time series
properties of searches are well described by three autoregressive terms and one lag of
realized volatility.
The models described above are contrasted to pure realized volatility models, i.e. an
AR(1) or AR(3) model in case of Equation (3) and a HAR(3) model in case of Equation (4).The univariate models of realized volatility are simply the Equations (3a) and (4a) with
γ 1,1 restricted to zero.
In order to assess the forecasting performance, we consider two loss functions which
are robust to possible noise in the volatility measure (see Patton 2011). These are the
mean squared error (MSE) and the quasi-likelihood (QL) for which the corresponding loss
functions are defined as follows:
MSE: L
RV t+1,
RV t+1|t
=
RV t+1 −
RV t+1|t
2, (5)
QL: L
RV t+1, RV t+1|t = RV t+1 RV t+1|t − log RV t+1 RV t+1|t − 1, (6)
where RV t+1|t is the respective forecast of realized volatility based upon informationavailable up to and including time t. We also use the R2 of a Mincer and Zarnowitz (1969;
hereafter MZ) regression of the actual realized volatilities on their predicted values:
RV t+1 = b0 + b1 RV t+1|t + et+1. (7)
9
8/17/2019 Behaviorial Finance.pdf
11/30
Following the literature (e.g. Aı̈t-Sahalia and Mancini 2008, Andersen et al. 2003, Ghysels,
Santa-Clara and Valkanov 2006), we model log realized volatility, but evaluate the forecast
by comparing realized volatility and its prediction.4
4.1 In-sample forecast evaluation
Table 4 holds the results of the in-sample forecast evaluation of one-step ahead forecasts of
realized volatility. The models we consider are the univariate AR(1), AR(3) and HAR(3)
models and the respective augmented models including lagged search queries.
[Insert Table 4 about here]
Looking only at the univariate models, we see that the AR(3) produces more precise
forecasts than the AR(1). The higher order lag structure leads to a reduction in MSE of more than 11%. The HAR(3) is the best amongst the univariate models. The MZ R2 is
0.96 percentage points higher than for the AR(3) and even 4.38 percentage points higher
than for the AR(1) model. These findings are in line with the literature on volatility
prediction, which shows that the heterogeneous autoregressive model, including lagged
weekly and monthly volatility, captures the long memory properties of realized volatility
best (cf. Corsi 2009).
Comparing the univariate models (AR(1), AR(3), HAR(3)) to the SQ-augmented mod-
els (AR(1)+SQ, AR(3)+SQ, HAR(3)+SQ), we observe for all models an improvement in
performance. The addition of search queries leads to a reduction of MSE between almost
5% in the AR(1) case and 3.4% in case of the HAR(3) model. Again, we find that overall
the HAR model augmented with search queries shows the best fit, i.e. we find the lowest
MSE, the lowest value of the QL loss function and the highest R2 of the MZ regression
(marked in bold in Table 4).
4.2 Out-of-sample forecast evaluation
We now turn to the out-of-sample forecasts and provide 1 day, 1 week and 2 weeks volatility
forecasts. Weekly and bi-weekly volatility is measured as the aggregated volatility over 5and 10 business days, respectively. For our initial out-of-sample forecast we estimate the
models using the first two years (500 trading days) of our sample, i.e. from July 2006 to
4When reversing the log transformation, the forecasts are formally not optimal (Granger and Newbold1976). However, Lütkepohl and Xu (2010) show by means of an extensive simulation study that this näıveforecast generally performs just as well as the formally optimal forecast.
10
8/17/2019 Behaviorial Finance.pdf
12/30
June 2008. We then re-estimate the model for every subsequent day in the sample using
all past observations available, i.e. we increase the estimation window over time. The
estimation period of the very first run ends in June 2008. Thus, we are able to compare
the forecasting performance of volatility models during the almost record-high volatility of
October 2008. The initial two year estimation period is still long enough and has enough
variation in both volatility and search activity (cf. Figure 1) for a reliable estimation of
model parameters.
[Insert Table 5 about here]
Results of the out-of-sample prediction are summarized in Table 5. For the univariate
models our results are consistent with the findings of Corsi (2009). The HAR(3) model is
better at predicting realized volatility compared to the AR(3) or AR(1) model in terms of
MSE, MZ R2 and QL statistic. The advantage of the HAR modeling emerges particularly
when predicting volatility at longer horizons of one or two weeks. For the latter the
reduction of the MSE amounts to a considerable 25% as compared to the AR(3) model.
Turning to the multivariate models, we find that those which include searches as an
explanatory variable always outperform the univariate, pure realized volatility models.
This means that these models have a lower MSE, a lower value of the QL loss function
and a higher R2 in the Mincer-Zarnowitz regression. Adding searches is most beneficial for
longer-horizon forecasts: the MZ R2 is by 3.1 percentage points higher in the multivariate
VHAR(3) than in the univariate HAR(3). In case of the AR-models, this difference caneven be larger.
Overall, the best performing univariate model for realized volatility is the HAR model.
Augmenting the HAR model with search query data further improves the forecasting
performance, in particular at longer horizons. The intuition behind is the following:
the VHAR model benefits from modeling the dynamics of retail investors’ searches and
volatility. The VHAR gains from the fact that a shock in searches has a significant
impact on volatility that is persistent as can be seen from the impulse-response function
of Figure 3. Thus, searches can improve long-run predictions. Furthermore, search queries
are well described by the autoregressive time series model allowing for good predictions
of searches when the system is iterated forward.
11
8/17/2019 Behaviorial Finance.pdf
13/30
4.3 Forecast evaluation across different times of volatility
Precise volatility predictions are of particular importance during times of financial turmoil.
Thus, it is of great interest how the models perform in phases of high volatility compared to
calmer periods. In order to shed light on this question, we sort days according to realizedvolatility and calculate MSE and the QL loss function for several quantiles separately.
Our analysis focuses on the two best performing models, the univariate HAR model and
the VHAR model including search queries.
[Insert Table 6 about here]
Table 6 presents the results of this sorting. It shows the MSE and QL loss function for
the days with the lowest volatility in the sample (Bottom 1%, 5% to 50%) as well as the
days with the highest volatility in the sample (Top 50%, 25% to 1%). Panel A displaysthe in-sample results, Panel B the out-of-sample results.
Not surprisingly, volatility prediction in calm times is very accurate. The in-sample
MSE for the 50% least volatile days is merely 0.027 for the univariate HAR model. The
MSE for the HAR model augmented with search queries is virtually the same with 0.026.
In general, the prediction accuracy of the two models does not differ much throughout the
bottom quintiles. Thus, the autoregressive model provides a good fit of volatility in calm
times and search queries do not add information about future volatility in these times.
Turning to the most volatile days, we see that prediction accuracy weakens for both
models but at the same time we observe that the model including searches increasingly
outperforms the univariate realized volatility model. The HAR model in-sample MSE for
the 50% most volatile days amounts to 0.267 and increases to 6.875 for the 1% most volatile
days. In high volatility times it becomes increasingly more difficult for the autoregressive
model to predict volatility. Also the MSE for the bivariate HAR model of realized volatility
and search queries increases in more volatile times, but to a lower extent. The MSE for
the model including searches is 6.396 compared to 6.875 for the univariate model, which
is a reduction of 7.0%. A similar picture can be seen for the QL loss function. For
the 1% most volatile days the QL loss function of the SQ-augmented model yields 33 .16compared to 36.36 for the univariate model, which is a reduction of 8.8%. The difference
in the bottom quintiles between the QL and MSE originates from the different weights
the loss functions puts on the prediction errors.
A similar pattern is found for the out-of-sample prediction. Results are presented in
Panel B of Table 6. Again, for the bottom quantiles the difference in MSE is negligible
12
8/17/2019 Behaviorial Finance.pdf
14/30
while the QL loss function slightly favors the pure volatility model over the model which
includes search queries. In phases of high volatility, however, the model with searches
clearly outperforms the HAR(3). The performance increase is even more pronounced
than in the in-sample case: in the 1% highest volatility days, the prediction MSE can be
reduced by 13% if search queries are included. Also, the value of the QL is significantly
reduced by over 12%.
In summary, the cascading structure of the heterogeneous autoregressive model seems
to capture the long-memory properties of realized volatility very well. However, in a crisis
period (when volatility is high) retail investors’ attention is an important component and
predictor of volatility. If we interpret the HAR model as a model of agents with different
time horizons (namely daily, weekly and monthly) as suggested by Corsi (2009), we can
understand retail investors as a fourth investor group that adds to volatility in very tur-
bulent times.
4.4 Robustness of out-of-sample forecast
One possible concern about the practical usefulness of search data in predicting volatility
is the time of publication of the data. Up to now, Google Trends publishes search volume
daily with a lag of only one day. Even though lagged publication may reduce the fore-
casting power of search queries for volatility, the analysis of the forecasting relationship
presented so far helps us to gain a better understanding of the dynamics of retail investors’attention to the stock market and their role in stock market volatility. Furthermore, note
that technically it would be possible to publish search volume even faster. At present,
Google publishes the search volume for the fastest rising searches in the US every hour
through Google Hot Trends with only a few hours delay.5
Nevertheless, to account for the delay in the publication of search query data we also
look at an out-of-sample forecast based on search query data of two lags, i.e. volatility
tomorrow is predicted using volatility data until today and search data until yesterday.
Table A.1 in the Appendix presents the results of this robustness check. The general
results of Table 5 are confirmed: models with search query data generally outperform the
univariate realized volatility models, but (as expected) the forecasting power of search
queries is reduced. One exception is the VAR(3) model, where the MZ R2 is slightly lower
than that of the AR(3) model. However, the MSE and QL loss function again favor the
5Source: Google Hot Trends (http://www.google.com/trends/hottrends )
13
http://www.google.com/trends/hottrendshttp://www.google.com/trends/hottrends
8/17/2019 Behaviorial Finance.pdf
15/30
model with search queries. Overall, the HAR model including search queries remains the
best performing model.
5 Concluding remarksInternet search data can describe the interest of individuals as recently advocated by Choi
and Varian (2009a) or Da et al. (2011). In this paper we use daily search query data to
measure the individuals’ interest in the aggregate stock market. We find that investors’
attention to the stock market rises in times of strong market movements. Moreover, a rise
in investors’ attention is followed by higher volatility. These findings are consistent with
agent-based models of volatility (Lux and Marchesi 1999, Alfarano and Lux 2007).
Exploiting the fact that search queries Granger cause volatility, we incorporate searches
in several prediction models for realized volatility. Augmenting these models with searchqueries leads to more precise in- and out-of-sample forecasts, in particular in the long
run and in high volatility phases. Thus, search queries constitute a valuable source of
information for predicting future volatility which can essentially be used in real time.
14
8/17/2019 Behaviorial Finance.pdf
16/30
References
Äıt-Sahalia, Y. and Mancini, L.: 2008, Out of sample forecasts of quadratic variation,
Journal of Econometrics 147(1), 17–33.
Alfarano, S. and Lux, T.: 2007, A noise trader model as a generator of apparent financial
power laws and long memory, Macroeconomic Dynamics 11(Supplement S1), 80–101.
Andersen, T. G. and Bollerslev, T.: 1997, Heterogeneous Information Arrivals and Re-
turn Volatility Dynamics: Uncovering the Long-Run in High Frequency Returns, The
Journal of Finance 52(3), 975–1005.
Andersen, T. G., Bollerslev, T., Christoffersen, P. F. and Diebold, F. X.: 2006, Prac-
tical Volatility and Correlation Modeling for Financial Market Risk Management, in
M. Carey and R. M. Stulz (eds), The Risks of Financial Institutions , University of
Chicago Press, Chicago, Illinois, chapter 17, pp. 513–548.
Andersen, T. G., Bollerslev, T. and Diebold, F. X.: 2007, Roughing It Up: Including Jump
Components in the Measurement, Modeling, and Forecasting of Return Volatility, The
Review of Economics and Statistics 89(4), 701–720.
Andersen, T. G., Bollerslev, T., Diebold, F. X. and Ebens, H.: 2001, The distribution of
realized stock return volatility, Journal of Financial Economics 61(1), 43–76.
Andersen, T. G., Bollerslev, T., Diebold, F. X. and Labys, P.: 2003, Modeling and Fore-
casting Realized Volatility, Econometrica 71(2), 529–626.
Andersen, T. G., Bollerslev, T. and Meddahi, N.: 2011, Realized Volatility Forecasting
and Market Microstructure Noise, Journal of Econometrics 160(1), 220–234.
Bank, M., Larch, M. and Peter, G.: 2011, Google search volume and its influence on
liquidity and returns of German stocks, Financial Markets and Portfolio Management
25(3), 239–264.
Bollen, B. and Inder, B.: 2002, Estimating daily volatility in financial markets utilizing
intraday data, Journal of Empirical Finance 9(5), 551–562.
Brooks, C.: 1998, Predicting stock index volatility: can market volume help?, Journal of
Forecasting 17(1), 59–80.
15
8/17/2019 Behaviorial Finance.pdf
17/30
Chen, X. and Ghysels, E.: 2011, News - Good or Bad - and its Impact on Volatility
Predictions over Multiple Horizons, Review of Financial Studies 24(1), 46–81.
Chiriac, R. and Voev, V.: 2011, Modelling and forecasting multivariate realized volatility,
Journal of Applied Econometrics 26(6), 922–947.
Choi, H. and Varian, H.: 2009a, Predicting initial claims for unemployment benefits,
Working Paper .
Choi, H. and Varian, H.: 2009b, Predicting the present with Google trends, Working
Paper pp. 1–23.
Corsi, F.: 2009, A Simple Approximate Long-Memory Model of Realized Volatility, Jour-
nal of Financial Econometrics 7(2), 174–196.
Da, Z., Engelberg, J. and Gao, P.: 2010a, In search of earnings predictability, Working
Paper .
Da, Z., Engelberg, J. and Gao, P.: 2010b, The Sum of All FEARS: Investor Sentiment
and Asset Prices, Working Paper .
Da, Z., Engelberg, J. and Gao, P.: 2011, In Search of Attention, The Journal of Finance
66(5), 1461–1499.
Drake, M., Roulstone, D. and Thornock, J.: 2011, Investor Information Demand: Evi-dence from Google Searches around Earnings Announcements, Working Paper .
Engelberg, J. and Parsons, C.: 2011, The Causal Impact of Media in Financial Markets,
The Journal of Finance 66(1), 67–97.
Foucault, T., Sraer, D. and Thesmar, D. J.: 2011, Individual Investors and Volatility, The
Journal of Finance 66(4), 1369–1406.
Ghysels, E., Santa-Clara, P. and Valkanov, R.: 2006, Predicting Volatility: Getting the
Most out of Return Data Sampled at Different Frequencies, Journal of Econometrics 131(1-2), 59–95.
Ghysels, E. and Sinko, A.: 2011, Volatility Forecasting and Microstructure Noise, Journal
of Econometrics 160(1), 257–271.
16
8/17/2019 Behaviorial Finance.pdf
18/30
Ginsberg, J., Mohebbi, M. H., Patel, R. S., Brammer, L., Smolinski, M. S. and Bril-
liant, L.: 2009, Detecting influenza epidemics using search engine query data, Nature
457, 1012–1014.
Granger, C. W. J.: 1969, Investigating Causal Relations by Econometric Models andCross-spectral Methods, Econometrica 37(3), 424–438.
Granger, C. W. J. and Newbold, P.: 1976, Forecasting Transformed Series, Journal of the
Royal Statistical Society. Series B (Methodological) 38(2), 189–203.
Jacobs, H. and Weber, M.: 2011, The Trading Volume Impact of Local Bias: Evidence
from a Natural Experiment, Review of Finance (forthcoming).
Lütkepohl, H. and Xu, F.: 2010, The role of the log transformation in forecasting economic
variables, Empirical Economics pp. 1–20.
Lux, T. and Marchesi, M.: 1999, Scaling and criticality in a stochastic multi-agent model
of a financial market, Nature 397, 498–500.
Mincer, J. A. and Zarnowitz, V.: 1969, The Evaluation of Economic Forecasts, in J. A.
Mincer (ed.), Economic Forecasts and Expectations: Analysis of Forecasting Behavior
and Performance , Studies in Business Cycles, NBER.
Patton, A. J.: 2011, Volatility forecast comparison using imperfect volatility proxies,
Journal of Econometrics 160(1), 246–256.
Vlastakis, N. and Markellos, R. N.: 2012, Information demand and stock market volatility,
Journal of Banking and Finance 36(6), 1808 – 1821.
17
8/17/2019 Behaviorial Finance.pdf
19/30
Tables and Figures
0
. 0 2
. 0 4
. 0 6
. 0 8
. 1
R e a l i z e d v o l a t i l i t y
2007 2008 2009 2010 2011 2012
Realized volatility of the Dow Jones
0
5
1 0
S e a r c h V o l u m e I n d e x
2007 2008 2009 2010 2011 2012
Search queries for the index name
Figure 1:
Realized volatility and search activityThis figure displays daily realized volatility of the Dow Jones (upper graphic) and the volumeof search queries (lower graphic) for its name from July 1, 2006 to December 31, 2011. Searchqueries are standardized such that the sample average equals one.
18
8/17/2019 Behaviorial Finance.pdf
20/30
− 0 . 2
0
0 . 0
0
0 . 2
0
0 . 4
0
0 . 6
0
0 . 8
0
1 . 0
0
A u t o c o r r e l a t i o
n s
0 20 40 60 80Lag
Realized volatility
− 0 . 2
0
0 . 0
0
0 . 2
0
0 . 4
0
0 . 6
0
0 . 8
0
1 . 0
0
A u t o c o r r e l a t i o
n s
0 20 40 60 80Lag
Search queries
Figure 2:Autocorrelations of realized volatility and search queriesThis figure displays the autocorrelations of the Dow Jones’ realized volatility (left graph)
and search queries for its name (right graph) in the sample period. Shaded areas indicate95% confidence bounds.
19
8/17/2019 Behaviorial Finance.pdf
21/30
0
. 1
. 2
. 3
0 20 40 60 80 100Days
Response of volatility to a shock in volatility
0
. 0 2
. 0 4
. 0 6
. 0 8
0 20 40 60 80 100Days
Response of searches to a shock in volatility
0
. 0 2
. 0 4
. 0 6
0 20 40 60 80 100Days
Response of volatility to a shock in searches
0
. 0 5
. 1
. 1 5
0 20 40 60 80 100Days
Response of searches to a shock in searches
Figure 3:Impulse response functionsThe table displays the impulse response functions of the VAR(3) of realized volatility andsearch queries (Table 3, Model 1). Shaded areas indicate 95% confidence bounds.
20
8/17/2019 Behaviorial Finance.pdf
22/30
Table 1:Summary statistics and correlationsThis table provides descriptive statistics of realized volatility (RV), search queries (SQ) andtrading volume (VO) of the Dow Jones between July 1, 2006 and December 31, 2011. Theupper panel holds summary statistics while the lower panel presents sample correlations.
Panel A: Summary statistics
Mean S.D. Skewness Kurtosis Min. Max.
RV 0.009 0.007 3.90 30.75 0.002 0.096
log-RV -4.87 0.56 0.57 3.57 -6.38 -2.34
SQ 1.00 0.72 5.21 48.21 0.29 11.27
log-SQ -0.13 0.46 1.25 5.70 -1.23 2.42
VO 234.0 86.8 1.48 6.10 52.6 672.9
log-VO 5.39 0.34 0.16 3.82 3.96 6.51
Panel B: Correlations
(1) (2) (3) (4) (5) (6)
RV (1) 1
log-RV (2) 0.89 1
SQ (3) 0.82 0.67 1
log-SQ (4) 0.76 0.74 0.89 1
VO (5) 0.57 0.58 0.51 0.49 1
log-VO (6) 0.53 0.57 0.44 0.44 0.96 1
21
8/17/2019 Behaviorial Finance.pdf
23/30
Table 2:Correlation of “dow” and other search termsThis table shows the top ten search terms that are correlated with the search term “dow”during the sample period July 2006 to December 2011 at weekly frequency. The second
column shows the correlation coefficient, which was obtained from Google Correlate . Thethird column shows the search volume scaled relative to the average search volume of “dow”,which is standardized to 100. For example, 46.67 in row 1 means that the search volume of “dow jones” relative to the search volume of “dow” is 46.67%. The search volume scale wasobtained from Google Trends and re-scaled for the sample size.
Rank Correlation Rel. Volume (%) Search term1 0.9981 46.67 dow jones2 0.9707 18.40 djia3 0.9672 6.13 dow stock4 0.9612 0.33 dow close5 0.9612 0.31 current dow
6 0.9591 9.91 dow jones industrial7 0.9565 0.16 current dow jones8 0.9545 0.39 google dow9 0.9481 0.04 stock market now
10 0.9476 9.61 industrial average
22
8/17/2019 Behaviorial Finance.pdf
24/30
T a b l e 3 :
V A R
m o d e l e s t i m a t i o n r e s u l t s
T h i s t a b l e d i s p l a y s t h e e s t i m a t i o n r e s u l t s o f a V e c t o r A u t o r e g r e s s i v e m o d e l , V A R ( 3 ) ,
f o r l o g r e a l i z e d v o l a t i l i t y ( l o g - R V ) , l o g s e a r c h
q u e r i e s ( l o g - S Q ) a n d l o g t r a d i n g v o l u m e ( l o g - V O ) f o r t h e D o w J o n e s i n d e x . P a n e l A p r o v i d e s c o e ffi c i e n t e s t i m a t e s , P a n e l B
t h e r e s u l t s o f
a G r a n g e r c a u s a l i t y L a g r a n g e M u l t i p l i e r t e s t . p - v a l u e s a r e g i v e n i n
p a r e n t h e s e s . S i g n i fi c a n c e a t t h e 5 % l e v e l i s
h i g h l i g h t e d i n
b o l d f a c e .
P a n e l A : V A R
e s t i m a t i o n
M o d e l 1
M o d e l 2
M o d e l 3
l o g - R V
t
l o g - S Q
t
l o g - S Q
t
l o g - V O
t
l o g
- R V
t
l o g - S Q
t
l o g - V O
t
l o g - R V
t − 1
0 . 4
5
0 . 0
3
0
. 4 5
0 . 0 3
0 . 0
7
( 0 . 0 0 0 )
( 0 . 0 4 3 )
( 0 . 0 0 0 )
( 0 . 0 9 0 )
( 0 . 0 0 6 )
l o g - R V
t − 2
0 . 2
1
- 0 . 0 0
0
. 2 3
0 . 0 0
- 0 . 0 2
( 0 . 0 0 0 )
( 0 . 8 5 9 )
( 0 . 0 0 0 )
( 0 . 8 8 2 )
( 0 . 4 5 7 )
l o g - R V
t − 3
0 . 1
6
- 0 . 0 1
0
. 1 5
0 . 0 1
- 0 . 0 1
( 0 . 0 0 0 )
( 0 . 5 4 4 )
( 0 . 0 0 0 )
( 0 . 6 0 9 )
( 0 . 7 5 2 )
l o g - S Q
t − 1
0 . 2
3
0 . 8
1
0 . 8
2
0 . 1
7
0
. 2 2
0 . 8
0
0 . 1
4
( 0 . 0 0 0 )
( 0 . 0 0 0 )
( 0 . 0 0 0 )
( 0 . 0 0 0 )
( 0 . 0 0 0 )
( 0 . 0 0 0 )
( 0 . 0 0 2 )
l o g - S Q
t − 2
- 0 . 1 0
- 0 . 0 5
- 0 . 0 4
- 0 . 0 9
- 0 . 0 8
- 0 . 0 5
- 0 . 0 7
( 0 . 1 5 0 )
( 0 . 1 6 8 )
( 0 . 2 2 7 )
( 0 . 1 1 1 )
( 0 . 2 4 3 )
( 0 . 2 3 0 )
( 0 . 1 8 2 )
l o g - S Q
t − 3
- 0 . 0 1
0 . 1
7
0 . 1
9
- 0 . 0 6
- 0 . 0 2
0 . 1
7
- 0 . 0 6
( 0 . 9 0 9 )
( 0 . 0 0 0 )
( 0 . 0 0 0 )
( 0 . 1 8 4 )
( 0 . 7 1 1 )
( 0 . 0 0 0 )
( 0 . 1 5 3 )
l o g - V O
t − 1
0 . 0 2
0 . 4
1
- 0 . 0 0
0 . 0 1
0 . 3
7
( 0 . 2 1 6 )
( 0 . 0 0 0 )
( 0 . 9 8 4 )
( 0 . 6 6 7 )
( 0 . 0 0 0 )
l o g - V O
t − 2
- 0 . 0 1
0 . 2
2
- 0 . 0 5
- 0 . 0 2
0 . 2
3
( 0 . 4 8 4 )
( 0 . 0 0 0 )
( 0 . 2 0 7 )
( 0 . 4 2 7 )
( 0 . 0 0 0 )
l o g - V O
t − 3
- 0 . 0
5
0 . 1
5
0
. 0 2
- 0 . 0
5
0 . 1
6
( 0 . 0 1 0 )
( 0 . 0 0 0 )
( 0 . 6 4 9 )
( 0 . 0 1 1 )
( 0 . 0 0 0 )
C o n s t a n t
- 0 . 8
6
0 . 0 9
0 . 2
1
1 . 2
0
- 0 . 6
2
0 . 5
3
1 . 5
3
( 0 . 0 0 0 )
( 0 . 1 6 4 )
( 0 . 0 1 8 )
( 0 . 0 0 0 )
( 0 . 0 1 7 )
( 0 . 0 0 0 )
( 0 . 0 0 0 )
23
8/17/2019 Behaviorial Finance.pdf
25/30
T a b l e 3 - C o n t i n u e d
P a n e
l B : G r a n g e r c a u s a l i t y t e s
t
M o d e l 1
M o d e l 2
M o d e l 3
E x c l u d e d :
l o g - R V
t
l o g - S Q
t
l o g - S Q
t
l o g - V O
t
l o g - R V
t
l o g - S Q
t
l o g - V O
t
l o g - R V
5 . 2 0
8 . 7
8
9 . 1
2
( 0 . 1 5 8 )
( 0 . 0 3 2 )
( 0 . 0 2 8 )
l o g - S Q
3 1 . 6
1
1 9 . 7
8
3 0 . 6
5
1 0 . 8
0
( 0 . 0 0 0 )
( 0 . 0 0 0 )
( 0
. 0 0 0 )
( 0 . 0 1 3 )
l o g - V O
1 1 . 4
1
2 . 2 1
1 5 . 0
1
( 0 . 0 1 0 )
( 0
. 5 3 1 )
( 0 . 0 0 2 )
24
8/17/2019 Behaviorial Finance.pdf
26/30
Table 4:In-sample forecast evaluationThe table compares the in-sample forecasts of the models described in the first column.AR(1), AR(3) and HAR(3) are univariate models of realized volatility only, AR(1)+SQ,AR(3)+SQ and HAR(3)+SQ are the models augmented with lagged search queries. Perfor-
mance measures are the mean squared error (MSE,×
10
4
), the quasi-likelihood loss function(QL, ×102) and the R2 (in percent) of the Mincer-Zarnowitz regression. The preferredmodel (minimum of QL loss function and MSE, maximum of R2) is indicated through boldnumbers.
Model: MSE QL R2
AR(1) 0.172 22.819 65.96
AR(1) + SQ 0.164 22.408 66.64
AR(3) 0.153 21.273 69.38
AR(3) + SQ 0.148 21.109 69.97HAR(3) 0.148 21.155 70.34
HAR(3) + SQ 0.143 21.071 71.11
25
8/17/2019 Behaviorial Finance.pdf
27/30
T a b l e 5 :
O u t - o f - s a m
p l e f o r e c a s t e v a l u a t i o n
T h e t a b l e c o m
p a r e s t h e 1 d a y , 1 w e e k a n d 2 w e e
k s o u t - o f - s a m p l e f o r e c a s t s o f t h e m
o d e l s d e s c r i b e d i n t h e fi r s t c o l u m n
. A R ( 1 ) ,
A R ( 3 ) a n d H A
R ( 3 ) a r e u n i v a r i a t e m o d e l s o f r e a l i z e d v o l a t i l i t y o n l y , V A R ( 1 ) , V A R ( 3 ) a n d V H A R ( 3 ) a r e b i v a r i a t e m o d e l s o f r e a l i z e d
v o l a t i l i t y ( R V ) a n d s e a r c h q u e r i e s ( S Q ) . P e r f o r m a
n c e m e a s u r e s a r e t h e m e a n s q u a r e d e r r o r ( M S E , × 1 0 4 ) , t h e q u a s i - l i k e l i h o o d l o s s
f u n c t i o n ( Q L ,
× 1 0 2 ) a n d t h e R 2
( i n p e r c e n t ) o f
t h e M i n c e r - Z a r n o w i t z r e g r e s s i o n .
T h e p r e f e r r e d m o d e l ( m i n i m u m o
f Q L l o s s
f u n c t i o n a n d M
S E , m a x i m u m o f R 2 ) i s i n d i c a t e d
t h r o u g h b o l d n u m b e r s .
1 d a y
1 w e e k
2 w e e k s
M o d e l :
M S E
Q L
R 2
M S E
Q L
R 2
M S E
Q L
R
2
A R ( 1 )
R V
0 . 2 4 0
5 . 4 1 3
6 4 . 1 4
6 . 7 3 1
6 . 1 4 0
6
1 . 5 0
3 4 . 3 0 6
9 . 1 3 0
5 0
. 0 8
V A R ( 1 )
R V , S Q
0 . 2 2 4
4 . 8 0 7
6 4 . 7 1
4 . 8 9 6
4 . 6 9 8
6
3 . 7 2
2 4 . 0 6 3
6 . 4 9 8
5 5
. 0 4
A R ( 3 )
R V
0 . 2 1 0
4 . 5 2 7
6 7 . 8 0
4 . 3 6 7
3 . 9 4 9
6
9 . 5 6
2 1 . 1 2 9
5 . 3 3 9
6 2
. 7 5
V A R ( 3 )
R V , S Q
0 . 2 0 2
4 . 2 8 0
6 8 . 1 2
3 . 9 1 6
3 . 4 7 2
6
9 . 7 7
1 7 . 4 3 6
4 . 4 9 5
6 3
. 9 6
H A R ( 3 )
R V
0 . 1 9 8
4 . 3 2 4
6 9 . 0 7
3 . 6 5 5
3 . 4 5 0
7
2 . 0 7
1 5 . 6 9 9
4 . 1 9 8
6 7
. 7 4
V H A R (
3 )
R V , S Q
0 . 1
9 4
4 . 1
4 2
6 9 . 7
4
3 . 5
4 8
3 . 1
7 9 7
3 . 6
0
1 4 . 8
7 6
3 . 7
6 3
7 0
. 8 1
26
8/17/2019 Behaviorial Finance.pdf
28/30
T a b l e 6 :
F o r e c a s t e v a l u a t i o n a c r o s s d i ff e r e n t t
i m e s o f v o l a t i l i t y
T h e t a b l e c o m
p a r e s t h e f o r e c a s t i n g p e r f o r m a n c e
o f t h e u n i v a r i a t e r e a l i z e d v o l a t i l i t y H A R ( 3 ) m o d e l t o t h e H A R ( 3 ) o
f r e a l i z e d
v o l a t i l i t y ( R V ) a n d s e a r c h q u e r i e s ( S Q ) a c r o s s s t a t e s o f d i ff e r e n t v o l a t i l i t y . W e s o r t a l l d a y s i n t o t h e b o t t o m a n d t o
p 1 % 5 %
t o 5 0 % o f v o l a t i l i t y a n d c a l c u l a t e m e a n s q u a r e d
e r r o r ( M S E , × 1 0 4 ) a n d t h e q u a s i - l i k e l i h o o d l o s s f u n c t i o n ( Q L , × 1 0 2 ) f o r e a c h
p e r c e n t i l e .
P a n e l A : I n - s a m p l e p r e d i c t i o n
R e a l i z e d V o l a t
i l i t y
B o t t o m
T o p
1 %
5 %
1 0 %
2 5 %
5 0 %
5 0 %
2 5 %
1 0 %
5 %
1 %
M S E :
H A R ( 3 )
0 . 0 4 0
0 . 0 2 4
0 . 0 2 1
0 . 0 2 0
0 . 0 2 7
0 . 2 6 7
0 . 4 8 7
1 . 0 4 7
1 . 8 7 0
6 . 8 7 5
M S E :
H A R ( 3 ) +
S Q
0 . 0 3 8
0 . 0 2 5
0 . 0 2 1
0 . 0 1 9
0 . 0 2 6
0 . 2 5 8
0 . 4 6 9
0 . 9 9 5
1 . 7 4 8
6 . 3 9 6
D i ff e r e n c e i n M S E
0 . 0 0 2
- 0 . 0 0 1
0 . 0 0 0
0 . 0 0 1
0 . 0 0 1
0 . 0 1 0
0 . 0 1 8
0 . 0 5 2
0 . 1 2 2
0 . 4 7 9
Q L
H A R ( 3 )
1 5 . 2 3
7 . 8 4
5 . 7 0
4 . 0 7
3 . 5 4
5
. 5 1
7 . 5 3
1 1 . 0 6
1 3 . 7 3
3 6
. 3 6
Q L
H A R ( 3 ) +
S Q
1 5 . 0 7
8 . 0 1
5 . 7 9
4 . 0 3
3 . 4 4
5
. 4 4
7 . 4 2
1 0 . 5 2
1 2 . 4 2
3 3
. 1 6
D i ff e r e n c e i n Q L
0 . 1 6
- 0 . 1 6
- 0 . 0 9
0 . 0 4
0 . 0 9
0
. 0 7
0 . 1 1
0 . 5 4
1 . 3 1
3
. 2 0
27
8/17/2019 Behaviorial Finance.pdf
29/30
T a b l e 6 - C o n t i n u e d
P a n e l
B : O u t - o f - s a m p l e p r e d i c t i o n
R e a l i z e d V o
l a t i l i t y
B o t t o m
T o p
1 %
5
%
1 0 %
2 5 %
5 0 %
5 0 %
2 5 %
1 0 %
5 %
1 %
M S E :
H A R
( 3 )
0 . 0 1 9
0 . 0
2 4
0 . 0 2 2
0 . 0 2 3
0 . 0 2 7
0 . 1 5 1
0 . 2 5 5
0 . 5 1 3
0 . 8 6 7
2 . 7 9 9
M S E :
V H A
R ( 3 ) ( R V , S Q )
0 . 0 2 2
0 . 0
2 8
0 . 0 2 5
0 . 0 2 5
0 . 0 2 8
0 . 1 4 2
0 . 2 3 6
0 . 4 6 8
0 . 7 8 6
2 . 4 4 1
D i ff e r e n c e i n
M S E
- 0 . 0 0 3
- 0 . 0
0 3
- 0 . 0 0 3
- 0 . 0 0 2
- 0 . 0 0 1
0 . 0 0 9
0 . 0 1 9
0 . 0 4 5
0 . 0 8 2
0 . 3 5 8
Q L
H A R
( 3 )
7 . 9 9
6 .
8 5
5 . 3 6
4 . 3 0
3 . 5 8
5 . 9 3
8 . 2 0
1 2 . 3 0
1 8 . 1 3
5 2 . 5 6
Q L
V H A
R ( 3 ) ( R V , S Q )
9 . 2 1
7 .
7 1
6 . 0 0
4 . 6 3
3 . 6 6
5 . 4 9
7 . 5 9
1 1 . 3 0
1 6 . 5 0
4 6 . 0 9
D i ff e r e n c e i n
Q L
- 1 . 2 2
- 0 .
8 5
- 0 . 6 4
- 0 . 3 3
- 0 . 0 7
0 . 4 4
0 . 6 1
1 . 0 1
1 . 6 3
6 . 4 7
28
8/17/2019 Behaviorial Finance.pdf
30/30
A Appendix
Table A.1:Robustness of the out-of-sample forecasts: Accounting for the de-
layed publication of search query dataThe table repeats the 1 day out-of-sample volatility forecast presented in Table 5 usingsearch query data of two lags, i.e. volatility tomorrow is predicted with volatility data untiltoday and search data until yesterday to account for delays in the publication of search querydata. Performance measures are the mean squared error (MSE,×104), the quasi-likelihoodloss function (QL, ×102) and the R2 (in percent) of the Mincer-Zarnowitz regression. Thepreferred model (minimum of QL loss function and MSE, maximum of R2) is indicatedthrough bold numbers.
Model: MSE QL R2
AR(1) RV 0.240 5.413 64.14VAR(1) RV, SQ 0.214 4.745 66.23
AR(3) RV 0.210 4.527 67.80
VAR(3) RV, SQ 0.206 4.397 67.73
HAR(3) RV 0.198 4.324 69.07
VHAR(3) RV, SQ 0.194 4.223 69.43
29