Behaviorial Finance.pdf

Embed Size (px)

Citation preview

  • 8/17/2019 Behaviorial Finance.pdf

    1/30

    Can internet search queries help to predict

    stock market volatility?∗

    Thomas Dimpfl and Stephan Jank∗∗

    May 14, 2012

    Abstract

    This paper studies the dynamics of stock market volatility and retail investors’attention to the stock market, where attention to the stock market is measured byinternet search queries related to the leading stock market index. We find a strongco-movement of the Dow Jones’ realized volatility and the volume of search queriesfor its name. Furthermore, search queries Granger cause volatility: a heightenednumber of searches today is followed by an increase in volatility tomorrow. We utilizethis finding to improve several models of realized volatility. Including search queriesin autoregressive models of realized volatility helps to improve volatility forecastsin-sample and out-of-sample as well as for different forecasting horizons. Searchqueries are particularly useful in high-volatility phases when a precise prediction isvital.

    Key words:   realized volatility, forecasting, investor behavior,limited attention, noise trader, search engine data

    JEL:   G10, G14, G17

    ∗We thank Google for making their search volume data publicly available through   Google Trends .

    We would like to thank Jeremy Ginsberg, Joachim Grammig, Isaac Hacamo, Terrance Odean, FranziskaPeter, Tomas Reyes, Maik Schmeling and Hal Varian for helpful comments and suggestions. We retainresponsibility for all remaining errors. Financial support of the German Research Foundation (DeutscheForschungsgemeinschaft, DFG) is gratefully acknowledged.

    ∗∗Thomas Dimpfl: University of Tübingen, Stephan Jank: University of Tübingen and Centre forFinancial Research (CFR), Cologne. Contact: University of Tübingen, Department of Economics andSocial Sciences, Mohlstr.  36, D-72074 Tübingen, Germany. E-mail:   [email protected],[email protected] .

  • 8/17/2019 Behaviorial Finance.pdf

    2/30

    1 Introduction

    Large stock market movements capture investors’ attention. This can be seen in Figure 1

    which depicts a strong co-movement between the volatility of the Dow Jones stock market

    index and internet search queries for its name. If volatility is low, searches related to theDow Jones are generally at their overall average level, but in turbulent times search

    queries surge to levels which exceed their average by multiple times. For example, when

    the volatility of the Dow Jones spiked at an almost record high of over 150% annualized

    on October 10, 2008, the number of submitted searches for the index rose to more than

    eleven times the average.

    [Insert Figure 1  about here]

    Internet search queries related to stock market index names can be interpreted as ameasure for retail investors’ attention to the stock market (cf.  Da, Engelberg and Gao

    2011). While professional investors monitor the leading index all the time, retail investors

    presumably do not. Once the latter perceive an increased demand for information about

    the stock index, they are likely to use the internet via a search engine like Google as a

    source of information. Professional investors most probably, if not certainly, are not using

    a search engine to obtain information about the leading stock market index. Thus, search

    queries qualify as a good proxy for retail investors’ attention to the stock market. After

    having searched for the stock market index, some individuals might be inclined to act and

    trade at the same or the following day.

    In this paper we study in detail the dynamics of searches for the Dow Jones and its

    volatility. The key finding is that search queries today Granger cause volatility tomorrow.

    We exploit this finding to improve volatility forecasts and augment various autoregressive

    models of realized volatility with search query data. The forecasting precision can be

    significantly enhanced when data on search queries enter the prediction equation. The

    improvement is evident both for in-sample as well as for out-of-sample forecasts. The

    longer the forecast horizon, the more efficiency gains are apparent. Furthermore, the data

    on internet search queries help to predict volatility more accurately in periods of highvolatility, when a precise prediction is of utmost importance.

    These findings contribute to our knowledge of stock market volatility and its long

    memory characteristics, documented for example by  Andersen and Bollerslev (1997). In

    particular, the results are consistent with agent-based models of stock market volatility

    (e.g. Lux and Marchesi 1999, Alfarano and Lux 2007). In the model by Lux and Marchesi

    1

  • 8/17/2019 Behaviorial Finance.pdf

    3/30

    (1999), noise traders are seen as a source of additional volatility in the stock market. A

    fundamental shock in volatility triggers noise trading which in turn induces an increase

    of volatility. Taking internet searches as a measure of retail investors’ attention, we

    observe patterns that are consistent with the agent and noise trader theory of stock market

    volatility: high volatility entails high retail investor attention, which is in turn followed

    by high volatility. Our results are also in line with recent empirical evidence by Foucault,

    Sraer and Thesmar (2011), who – drawing on a natural experiment in France – find that

    retail investors’ trading activity leads to a higher level of volatility in individual stocks.

    Note that this paper does not focus on an exogenously evoked change in attention, which

    for example is studied by  Engelberg and Parsons (2011). In our analysis a fundamental

    volatility shock induces an elevated interest of retail investors in the stock market and we

    study how retail investors’ attention, along with possible trading influences and amplifies

    volatility in the following.Our key findings are supported by an investigation of trading volume. In the agent-

    based model of  Lux and Marchesi (1999) trading activity by noise traders induces volatil-

    ity. While we are unable to observe trading by this group directly, we document that

    the overall trading volume of the stocks comprised in the Dow Jones rises subsequent to

    increases in search queries for the index, which is consistent with the noise trader model.

    In a joint model for volatility, trading volume and search queries, the latter turn out to

    be a predictor for volatility while trading volume does not, which is in line with findings

    in the literature (see, for example,  Brooks 1998).

    In a forecasting context, other studies have sucessfully used Google search volume data.

    For example Ginsberg et al.  (2009) use search query data to predict influenza epidemics

    and   Choi and Varian   (2009a,b) employ Google search data to forecast unemployment

    rates and retail sales. In the field of finance, search query data are used to measure retail

    investor attention (Bank, Larch and Peter 2011, Da et al. 2011, Jacobs and Weber 2011,

    Vlastakis and Markellos 2012) and to predict earnings (Da, Engelberg and Gao 2010a,

    Drake, Roulstone and Thornock 2011).  Da, Engelberg and Gao (2010b) use search queries

    related to household concerns to measure investor sentiment.

    2

  • 8/17/2019 Behaviorial Finance.pdf

    4/30

    2 Data and descriptive statistics

    Our analysis focuses on the Dow Jones Industrial Average from July 1, 2006 to December

    31, 2011. Intraday index prices are obtained from RC Research Price-Data.1 We construct

    a time series of daily realized volatility RV t as introduced by Andersen, Bollerslev, Dieboldand Labys (2003) in the following way:

    RV t  =

      n j=1

    r2t,j,   (1)

    where  r2t,j  are squared intraday log-price changes of the index on day   t  during interval  j

    and   n  is the number of such intraday return intervals. We compute these price changes

    over 10 minute intervals in order to circumvent the well-documented microstructure effects

    (see e.g. Andersen et al. 2003, Andersen, Bollerslev and Meddahi 2011, Ghysels and Sinko

    2011).

    Descriptive statistics of Dow Jones realized volatility are presented in Panel A of 

    Table   1. As is evident from the skewness and kurtosis measures, the volatility time

    series is heavily skewed and far from being normally distributed. We therefore resort

    to the log of realized volatility as, amongst others, suggested by   Andersen, Bollerslev,

    Diebold and Ebens (2001) and Andersen et al. (2003). Even though normality of the data

    is still rejected, the data are by far better behaved than before the transformation; in

    particular, excess kurtosis is reduced significantly. The left graph in Figure  2 presents theautocorrelation function for realized volatility. The plot reveals the well-known pattern

    of only slowly decaying autocorrelations (compare e.g. Andersen et al. 2001).

    [Insert Table 1 and Figure 2  about here]

    The data on Google search queries are obtained through  Google Trends .2 We use daily

    data on search volume from July 2006 to December 2011 for the keyword “dow”. The

    volume measure is based on the number of searches which were submitted within the US.

    We start our sample in the second half of 2006, since before July 2006 search volume dataat daily frequency exhibit numerous missing values. To match searches to the respective

    1An earlier version of the paper also considered European stock market indices (the FTSE 100, theCAC 40 and the DAX). We find similar patterns of searches and realized volatilities for these indices,however, the present timing convention of Google Trends, which uses Pacific Standard Time for allcountries, makes the Google search query data less useful in a daily forecasting setting.

    2Source:  Google Trends   (http://www.google.com/trends ).

    3

    http://www.google.com/trendshttp://www.google.com/trends

  • 8/17/2019 Behaviorial Finance.pdf

    5/30

    time series of realized volatility, we only consider trading days.  Google Trends  measures

    searches for a particular day from 12 pm to 12 pm Pacific Standard Time.

    The data which are provided by Google are relative in nature. This means that Google

    does not provide the effective total number of searches, but a search volume index only.

    We standardize the search queries such that the average search frequency over the sample

    period equals one, allowing for an easy interpretation.

    An important issue when measuring investors’ attention to the Dow Jones Industrial

    Average index, is that it is known under a variety of names and acronyms. In this paper

    we use the search term “dow” to measure retail investors’ attention to this index, since

    we find that this short name is the most widely used search term when investors are

    interested in the Dow Jones index.

    [Insert Table 2  about here]

    Table 2 provides an overview of search terms that have the highest correlation with the

    term “dow” according to   Google Correlate .3 The table also compares the level of search

    volume relative to the search term “dow”. Almost perfectly correlated with the search

    term “dow” is “dow jones”, but its volume is considerably less, amounting to 47.7% of 

    the search volume of “dow”. In general, search engine users seem to prefer shorter search

    terms. The search term “dow jones industrial” is also highly correlated with “dow”, but

    amounts only to around 9.9% of the search volume of “dow”. The ticker “djia” is also a

    popular search term and its searches are around 18.4% compared to “dow”. Other searchterms such as “dow close”, “current dow”, “stock market now” are also highly correlated

    with “dow”but their search volume is even less, i.e. below one percent compared to “dow”.

    Search terms with low search volume present two problems: First, the search volume

    index by Google Trends  is less accurate due to the statistical methods used in constructing

    the search volume index. Second, for some terms search traffic is too low, resulting in

    missing values of the search volume index. Because of the high correlation of these search

    terms, the other search terms do not add further information to the searches we use. For

    this reason and to keep matters simple, we confine our analysis to the search term “dow”.

    An alternative to using the Dow Jones would be the S&P 500, which is commonly

    modeled in the realized volatility literature. However, the S&P 500 is less suited for

    our purposes, because we find it to be less followed by retail investors compared to the

    Dow Jones. In our sample period, the search term “dow” has been submitted to Google

    3Source:   Google Correlate   (http://www.google.com/trends/correlate/ ).

    4

    http://www.google.com/trends/correlate/http://www.google.com/trends/correlate/

  • 8/17/2019 Behaviorial Finance.pdf

    6/30

    approximately ten times as often as the term “s&p 500”. Moreover, the acronym “S&P”

    is generally used in the context of rating downgrades in our sample.

    Panel A of Table 1  also holds summary statistics for the data on search queries. Just

    as the realized volatility time series, the data on searches exhibit distinctive levels of 

    skewness and kurtosis. We therefore follow Da et al.  (2011) by also taking logarithms of 

    the search data. This procedure reduces both skewness and excess kurtosis, however, it is

    not as successful as in the case of realized volatility. The right graph in Figure 2 plots the

    autocorrelations of Dow search queries. The autocorrelation function is decaying fairly

    geometrically and much faster compared to autocorrelations of realized volatility depicted

    in the left graph of Figure 2.

    The data on trading volume of the Dow Jones are obtained from Datastream. It is

    measured as the aggregate number of shares traded for the underlying stocks of the index

    per day (in millions). Descriptive statistics are presented in Panel A of Table  1. We alsoresort to taking the logarithm of volume to reduce skewness and excess kurtosis.

    As already apparent from Figure   1, search queries and realized volatility exhibit a

    strong co-movement over time. The contemporaneous correlation of search queries for

    “dow” and realized volatility of the Dow Jones in our sample is quite high as can be seen

    from Panel B of Table  1. We find a correlation coefficient of 0.82 for the level time series

    and 0.74 for the data in logarithms. Volume is correlated to a slightly lower extent with

    search queries (0.51 in levels and 0.44 in logarithms) and realized volatility (0.57, both

    for levels and logarithms).

    3 The dynamics of volatility and searches

    In the following we study the dynamics between realized volatility, search queries and

    trading volume by estimating three vector autoregressive (VAR) models. Let  xt   be a

    vector which contains the variables of interest, then the VAR(3) reads as follows:

    xt  =  c +3

     j=1

    β jxt− j +  εt  ,   (2)

    where  c   is a vector of constants and  εt   is a vector of white noise innovations. Model 1,

    which is central to our analysis, investigates the dynamics between realized volatility and

    search queries, i.e.   xt   = (log-RV t   log-SQt). In Model 2, we will shed light on the

    interaction between search queries and trading volume, i.e.   xt   = (log-SQt   log-V Ot).

    5

  • 8/17/2019 Behaviorial Finance.pdf

    7/30

    Finally, Model 3 is a joint model of realized volatility, search queries and trading volume

    such that  xt  = (log-RV t   log-SQt   log-V Ot).

    Table 3 holds the results of the three VAR models. Coefficient estimates are presented

    in Panel A, the results of the Granger causality Lagrange multiplier (LM) tests are shown

    in Panel B.

    [Insert Table 3  about here]

    In Model 1 we find significant estimates of the autoregressive parameters for the real-

    ized volatility for all included lags. Search queries show significant autoregressive terms

    for lags 1 and 3. The estimation results also reveal that past volatility to some extent

    positively influences present search queries. This effect is concentrated to the first lag

    which is significant on the 5% level only while lags of order 2 and 3 are not found to

    be significant. However, the Granger causality test reveals that realized volatility doesnot Granger cause search queries. Note that there is a high contemporaneous correlation

    between volatility and searches indicating that searches immediately react to a volatility

    shock at the same day.

    The focus of our interest is how past search activity influences present volatility. We

    find that search queries enter the RV-equation significantly at lag 1 while higher order

    lags are insignificant. Furthermore, the Granger causality test indicates that past searches

    provide significant information about future volatility: we reject the null hypothesis of no

    influence of past log-SQ on current log-RV on all conventional significance levels. These

    results support the noise trader model of volatility by Lux and Marchesi (1999). After an

    initial shock in volatility, search intensity rises, which in turn is followed by heightened

    levels of volatility.

    As a further test of the model of  Lux and Marchesi (1999), we analyze the interaction

    of trading volume and search queries. Because trading volume of noise traders is not

    directly observable, we turn to an indirect test using total trading volume. If the interest

    of retail investors in the stock market manifests in trading activity, an increase in the

    number of searches should be followed by rising levels of overall trading volume. We test

    this hypothesis in Model 2. We find that volume has only marginal predictive power forsearch queries, in particular only the third lag is statistically significant. In contrast, search

    queries of the previous day enter significantly in the volume equation which supports our

    prediction that once information has been sought via Google, trading volume rises. The

    Granger causality test also shows that trading volume and search queries are Granger

    causal to each other.

    6

  • 8/17/2019 Behaviorial Finance.pdf

    8/30

    The estimated coefficients of the joint model of realized volatility, search queries and

    trading volume (Model 3) are similar to the estimates of the respective equations in

    Models 1 and 2. Again, search queries enter the realized volatility equation significantly

    on lag 1 (higher order lags are not significant) while volume does not contribute to the

    explanation of the variation in realized volatility which is in line with   Brooks   (1998).

    Moreover, realized volatility is only a weak predictor of search queries: only the first lag

    of log-SQ is significant on a 10% significance level. In this model we find that search queries

    do Granger cause realized volatility while volume does not. Both realized volatility and

    volume Granger cause search queries and realized volatility and search queries Granger

    cause volume.

    Figure  3   provides the impulse response functions corresponding to the model of in-

    terest, Model 1. For the calculation of impulse response functions, we use a Cholesky

    decomposition with the economically meaningful restriction of volatility being contem-poraneously exogenous, i.e. volatility can affect search queries immediately, but search

    queries do not contemporaneously affect volatility. The intuition behind this ordering is

    that there is first a fundamental volatility shock that in turn triggers retail investor atten-

    tion and, thus, search queries. Search queries, on the other hand, would not rise without

    a preceding event on the market (see also the argumentation in Lux and Marchesi 1999).

    [Insert Figure 3  about here]

    The top left and the bottom right graphs in Figure   3  present the response of log-RV and log-SQ, respectively, to their own shocks. As becomes evident from the slowly

    decaying function, these shocks are highly persistent, which corresponds to the slowly

    decaying autocorrelation functions presented in Figure  2.  The top right figure holds the

    impulse response of search queries to a shock in realized volatility. After a volatility

    shock, attention is elevated for a considerable amount of time which is reflected in the

    slowly decaying impulse response function. The bottom left figure presents the response

    of volatility to a shock in search queries. Most noteworthy, the impact of retail investors’

    attention on volatility is long-lasting and only dies out after 40-60 trading days.

    4 Can search queries improve volatility forecasts?

    The key result of the VAR estimation is that search queries help to predict future volatility

    in addition to the information contained in lagged volatility and that a shock in search

    7

  • 8/17/2019 Behaviorial Finance.pdf

    9/30

    volume has an impact on volatility that persists for a considerable span of time. Now we

    focus on the prediction of volatility, and thus investigate if and how searches can help to

    improve volatility forecasts. In the following, we use different modeling approaches which

    are commonly used to capture the time series properties of realized volatility and include

    lagged search queries in each model. We evaluate the forecasting ability of these models

    in- and out-of-sample, for multiple horizons as well as across different states of volatility.

    For one-step ahead predictions an autoregressive model for realized volatility which

    includes search queries as an explanatory variable would be sufficient. For multi-step

    forecasts, however, we need to forecast log-SQ as well. For this reason we model the

    time series properties of search queries along with realized volatility. We estimate vector

    autoregressive models with different lag lengths which are specified as follows:

    log-RV t  =  c1 +

     p

     j=1

    β 1,jlog-RV t− j +  γ 1,1log-SQt−1 + ε1,t   (3a)

    log-SQt  =  c2 + β 2,1log-RV t−1 +

    q j=1

    γ 2,jlog-SQt− j +  ε2,t.   (3b)

    Building on the univariate autoregressive models of   Andersen, Bollerslev, Christoffersen

    and Diebold (2006) and Bollen and Inder  (2002), we set the lag length for the realized

    volatility equation (3a) to either   p  = 1 or   p  = 3. As the results of estimating the VAR

    model in Equation (2) show no significant effect of searches on volatility for higher order

    lags, we only include one lag of searches in Equation (3a). For the same reason weonly allow a one lag impact of realized volatility on search queries in Equation (3b). In

    accordance to the autoregressive structure of the realized volatility equation, we set the

    lag order  q  of the autoregressive terms to 1 if  p = 1 and 3 if  p = 3.

    In addition to these ordinary autoregressive models we also consider  Corsi’s (2009)

    heterogeneous autoregressive (HAR) model. The HAR model has been found to capture

    the long-memory properties of realized volatility very well and has recently been used

    for example by  Andersen, Bollerslev and Diebold  (2007),  Chen and Ghysels  (2011) and

    Chiriac and Voev   (2011). Since we need a forecast of search queries for the multi-step

    8

  • 8/17/2019 Behaviorial Finance.pdf

    10/30

    volatility prediction, we extend the HAR model to a Vector-HAR model which reads as

    follows:

    log-RV t  =  c1 + β dlog-RV t−1 + β wlog-RV  wt−1 + β mlog-RV 

     mt−1 + γ 1,1log-SQt−1 + ε1,t   (4a)

    log-SQt  =  c2 + β 2,1log-RV t−1 +3

     j=1

    γ 2,jlog-SQt− j +  ε2,t.   (4b)

    where   log-RV  wt   =  1

    5

    4

     j=0 log-RV t− j   and   log-RV  mt   =

      1

    22

    21

     j=0 log-RV t− j . The search

    queries Equation (4b) is the same as Equation (3b) since we find that the time series

    properties of searches are well described by three autoregressive terms and one lag of 

    realized volatility.

    The models described above are contrasted to pure realized volatility models, i.e. an

    AR(1) or AR(3) model in case of Equation (3) and a HAR(3) model in case of Equation (4).The univariate models of realized volatility are simply the Equations (3a) and (4a) with

    γ 1,1  restricted to zero.

    In order to assess the forecasting performance, we consider two loss functions which

    are robust to possible noise in the volatility measure (see  Patton 2011). These are the

    mean squared error (MSE) and the quasi-likelihood (QL) for which the corresponding loss

    functions are defined as follows:

    MSE:   L

    RV t+1,

     RV  t+1|t

    =

    RV t+1 −

     RV  t+1|t

    2,   (5)

    QL:   L

    RV t+1, RV  t+1|t =   RV t+1 RV  t+1|t − log  RV t+1 RV  t+1|t − 1,   (6)

    where RV  t+1|t   is the respective forecast of realized volatility based upon informationavailable up to and including time  t. We also use the  R2 of a Mincer and Zarnowitz (1969;

    hereafter MZ) regression of the actual realized volatilities on their predicted values:

    RV t+1 =  b0 + b1 RV  t+1|t + et+1.   (7)

    9

  • 8/17/2019 Behaviorial Finance.pdf

    11/30

    Following the literature (e.g. Aı̈t-Sahalia and Mancini 2008, Andersen et al. 2003, Ghysels,

    Santa-Clara and Valkanov 2006), we model log realized volatility, but evaluate the forecast

    by comparing realized volatility and its prediction.4

    4.1 In-sample forecast evaluation

    Table 4 holds the results of the in-sample forecast evaluation of one-step ahead forecasts of 

    realized volatility. The models we consider are the univariate AR(1), AR(3) and HAR(3)

    models and the respective augmented models including lagged search queries.

    [Insert Table 4  about here]

    Looking only at the univariate models, we see that the AR(3) produces more precise

    forecasts than the AR(1). The higher order lag structure leads to a reduction in MSE of more than 11%. The HAR(3) is the best amongst the univariate models. The MZ R2 is

    0.96 percentage points higher than for the AR(3) and even 4.38 percentage points higher

    than for the AR(1) model. These findings are in line with the literature on volatility

    prediction, which shows that the heterogeneous autoregressive model, including lagged

    weekly and monthly volatility, captures the long memory properties of realized volatility

    best (cf. Corsi 2009).

    Comparing the univariate models (AR(1), AR(3), HAR(3)) to the SQ-augmented mod-

    els (AR(1)+SQ, AR(3)+SQ, HAR(3)+SQ), we observe for all models an improvement in

    performance. The addition of search queries leads to a reduction of MSE between almost

    5% in the AR(1) case and 3.4% in case of the HAR(3) model. Again, we find that overall

    the HAR model augmented with search queries shows the best fit, i.e. we find the lowest

    MSE, the lowest value of the QL loss function and the highest  R2 of the MZ regression

    (marked in bold in Table 4).

    4.2 Out-of-sample forecast evaluation

    We now turn to the out-of-sample forecasts and provide 1 day, 1 week and 2 weeks volatility

    forecasts. Weekly and bi-weekly volatility is measured as the aggregated volatility over 5and 10 business days, respectively. For our initial out-of-sample forecast we estimate the

    models using the first two years (500 trading days) of our sample, i.e. from July 2006 to

    4When reversing the log transformation, the forecasts are formally not optimal (Granger and Newbold1976). However, Lütkepohl and Xu (2010) show by means of an extensive simulation study that this näıveforecast generally performs just as well as the formally optimal forecast.

    10

  • 8/17/2019 Behaviorial Finance.pdf

    12/30

    June 2008. We then re-estimate the model for every subsequent day in the sample using

    all past observations available, i.e. we increase the estimation window over time. The

    estimation period of the very first run ends in June 2008. Thus, we are able to compare

    the forecasting performance of volatility models during the almost record-high volatility of 

    October 2008. The initial two year estimation period is still long enough and has enough

    variation in both volatility and search activity (cf. Figure 1) for a reliable estimation of 

    model parameters.

    [Insert Table 5  about here]

    Results of the out-of-sample prediction are summarized in Table  5. For the univariate

    models our results are consistent with the findings of  Corsi (2009). The HAR(3) model is

    better at predicting realized volatility compared to the AR(3) or AR(1) model in terms of 

    MSE, MZ R2 and QL statistic. The advantage of the HAR modeling emerges particularly

    when predicting volatility at longer horizons of one or two weeks. For the latter the

    reduction of the MSE amounts to a considerable 25% as compared to the AR(3) model.

    Turning to the multivariate models, we find that those which include searches as an

    explanatory variable always outperform the univariate, pure realized volatility models.

    This means that these models have a lower MSE, a lower value of the QL loss function

    and a higher R2 in the Mincer-Zarnowitz regression. Adding searches is most beneficial for

    longer-horizon forecasts: the MZ  R2 is by 3.1 percentage points higher in the multivariate

    VHAR(3) than in the univariate HAR(3). In case of the AR-models, this difference caneven be larger.

    Overall, the best performing univariate model for realized volatility is the HAR model.

    Augmenting the HAR model with search query data further improves the forecasting

    performance, in particular at longer horizons. The intuition behind is the following:

    the VHAR model benefits from modeling the dynamics of retail investors’ searches and

    volatility. The VHAR gains from the fact that a shock in searches has a significant

    impact on volatility that is persistent as can be seen from the impulse-response function

    of Figure 3. Thus, searches can improve long-run predictions. Furthermore, search queries

    are well described by the autoregressive time series model allowing for good predictions

    of searches when the system is iterated forward.

    11

  • 8/17/2019 Behaviorial Finance.pdf

    13/30

    4.3 Forecast evaluation across different times of volatility

    Precise volatility predictions are of particular importance during times of financial turmoil.

    Thus, it is of great interest how the models perform in phases of high volatility compared to

    calmer periods. In order to shed light on this question, we sort days according to realizedvolatility and calculate MSE and the QL loss function for several quantiles separately.

    Our analysis focuses on the two best performing models, the univariate HAR model and

    the VHAR model including search queries.

    [Insert Table 6  about here]

    Table 6 presents the results of this sorting. It shows the MSE and QL loss function for

    the days with the lowest volatility in the sample (Bottom 1%, 5% to 50%) as well as the

    days with the highest volatility in the sample (Top 50%, 25% to 1%). Panel A displaysthe in-sample results, Panel B the out-of-sample results.

    Not surprisingly, volatility prediction in calm times is very accurate. The in-sample

    MSE for the 50% least volatile days is merely 0.027 for the univariate HAR model. The

    MSE for the HAR model augmented with search queries is virtually the same with 0.026.

    In general, the prediction accuracy of the two models does not differ much throughout the

    bottom quintiles. Thus, the autoregressive model provides a good fit of volatility in calm

    times and search queries do not add information about future volatility in these times.

    Turning to the most volatile days, we see that prediction accuracy weakens for both

    models but at the same time we observe that the model including searches increasingly

    outperforms the univariate realized volatility model. The HAR model in-sample MSE for

    the 50% most volatile days amounts to 0.267 and increases to 6.875 for the 1% most volatile

    days. In high volatility times it becomes increasingly more difficult for the autoregressive

    model to predict volatility. Also the MSE for the bivariate HAR model of realized volatility

    and search queries increases in more volatile times, but to a lower extent. The MSE for

    the model including searches is 6.396 compared to 6.875 for the univariate model, which

    is a reduction of 7.0%. A similar picture can be seen for the QL loss function. For

    the 1% most volatile days the QL loss function of the SQ-augmented model yields 33 .16compared to 36.36 for the univariate model, which is a reduction of 8.8%. The difference

    in the bottom quintiles between the QL and MSE originates from the different weights

    the loss functions puts on the prediction errors.

    A similar pattern is found for the out-of-sample prediction. Results are presented in

    Panel B of Table 6. Again, for the bottom quantiles the difference in MSE is negligible

    12

  • 8/17/2019 Behaviorial Finance.pdf

    14/30

    while the QL loss function slightly favors the pure volatility model over the model which

    includes search queries. In phases of high volatility, however, the model with searches

    clearly outperforms the HAR(3). The performance increase is even more pronounced

    than in the in-sample case: in the 1% highest volatility days, the prediction MSE can be

    reduced by 13% if search queries are included. Also, the value of the QL is significantly

    reduced by over 12%.

    In summary, the cascading structure of the heterogeneous autoregressive model seems

    to capture the long-memory properties of realized volatility very well. However, in a crisis

    period (when volatility is high) retail investors’ attention is an important component and

    predictor of volatility. If we interpret the HAR model as a model of agents with different

    time horizons (namely daily, weekly and monthly) as suggested by  Corsi (2009), we can

    understand retail investors as a fourth investor group that adds to volatility in very tur-

    bulent times.

    4.4 Robustness of out-of-sample forecast

    One possible concern about the practical usefulness of search data in predicting volatility

    is the time of publication of the data. Up to now,  Google Trends  publishes search volume

    daily with a lag of only one day. Even though lagged publication may reduce the fore-

    casting power of search queries for volatility, the analysis of the forecasting relationship

    presented so far helps us to gain a better understanding of the dynamics of retail investors’attention to the stock market and their role in stock market volatility. Furthermore, note

    that technically it would be possible to publish search volume even faster. At present,

    Google publishes the search volume for the fastest rising searches in the US every hour

    through Google Hot Trends  with only a few hours delay.5

    Nevertheless, to account for the delay in the publication of search query data we also

    look at an out-of-sample forecast based on search query data of two lags, i.e. volatility

    tomorrow is predicted using volatility data until today and search data until yesterday.

    Table   A.1   in the Appendix presents the results of this robustness check. The general

    results of Table 5  are confirmed: models with search query data generally outperform the

    univariate realized volatility models, but (as expected) the forecasting power of search

    queries is reduced. One exception is the VAR(3) model, where the MZ  R2 is slightly lower

    than that of the AR(3) model. However, the MSE and QL loss function again favor the

    5Source:   Google Hot Trends   (http://www.google.com/trends/hottrends )

    13

    http://www.google.com/trends/hottrendshttp://www.google.com/trends/hottrends

  • 8/17/2019 Behaviorial Finance.pdf

    15/30

    model with search queries. Overall, the HAR model including search queries remains the

    best performing model.

    5 Concluding remarksInternet search data can describe the interest of individuals as recently advocated by  Choi

    and Varian (2009a) or Da et al.  (2011). In this paper we use daily search query data to

    measure the individuals’ interest in the aggregate stock market. We find that investors’

    attention to the stock market rises in times of strong market movements. Moreover, a rise

    in investors’ attention is followed by higher volatility. These findings are consistent with

    agent-based models of volatility (Lux and Marchesi 1999, Alfarano and Lux 2007).

    Exploiting the fact that search queries Granger cause volatility, we incorporate searches

    in several prediction models for realized volatility. Augmenting these models with searchqueries leads to more precise in- and out-of-sample forecasts, in particular in the long

    run and in high volatility phases. Thus, search queries constitute a valuable source of 

    information for predicting future volatility which can essentially be used in real time.

    14

  • 8/17/2019 Behaviorial Finance.pdf

    16/30

    References

    Äıt-Sahalia, Y. and Mancini, L.: 2008, Out of sample forecasts of quadratic variation,

    Journal of Econometrics  147(1), 17–33.

    Alfarano, S. and Lux, T.: 2007, A noise trader model as a generator of apparent financial

    power laws and long memory,  Macroeconomic Dynamics  11(Supplement S1), 80–101.

    Andersen, T. G. and Bollerslev, T.: 1997, Heterogeneous Information Arrivals and Re-

    turn Volatility Dynamics: Uncovering the Long-Run in High Frequency Returns, The 

    Journal of Finance  52(3), 975–1005.

    Andersen, T. G., Bollerslev, T., Christoffersen, P. F. and Diebold, F. X.: 2006, Prac-

    tical Volatility and Correlation Modeling for Financial Market Risk Management,   in 

    M. Carey and R. M. Stulz (eds),   The Risks of Financial Institutions , University of 

    Chicago Press, Chicago, Illinois, chapter 17, pp. 513–548.

    Andersen, T. G., Bollerslev, T. and Diebold, F. X.: 2007, Roughing It Up: Including Jump

    Components in the Measurement, Modeling, and Forecasting of Return Volatility,  The 

    Review of Economics and Statistics  89(4), 701–720.

    Andersen, T. G., Bollerslev, T., Diebold, F. X. and Ebens, H.: 2001, The distribution of 

    realized stock return volatility, Journal of Financial Economics  61(1), 43–76.

    Andersen, T. G., Bollerslev, T., Diebold, F. X. and Labys, P.: 2003, Modeling and Fore-

    casting Realized Volatility,  Econometrica  71(2), 529–626.

    Andersen, T. G., Bollerslev, T. and Meddahi, N.: 2011, Realized Volatility Forecasting

    and Market Microstructure Noise,  Journal of Econometrics  160(1), 220–234.

    Bank, M., Larch, M. and Peter, G.: 2011, Google search volume and its influence on

    liquidity and returns of German stocks,   Financial Markets and Portfolio Management 

    25(3), 239–264.

    Bollen, B. and Inder, B.: 2002, Estimating daily volatility in financial markets utilizing

    intraday data, Journal of Empirical Finance  9(5), 551–562.

    Brooks, C.: 1998, Predicting stock index volatility: can market volume help?,  Journal of 

    Forecasting  17(1), 59–80.

    15

  • 8/17/2019 Behaviorial Finance.pdf

    17/30

    Chen, X. and Ghysels, E.: 2011, News - Good or Bad - and its Impact on Volatility

    Predictions over Multiple Horizons,  Review of Financial Studies  24(1), 46–81.

    Chiriac, R. and Voev, V.: 2011, Modelling and forecasting multivariate realized volatility,

    Journal of Applied Econometrics  26(6), 922–947.

    Choi, H. and Varian, H.: 2009a, Predicting initial claims for unemployment benefits,

    Working Paper  .

    Choi, H. and Varian, H.: 2009b, Predicting the present with Google trends,   Working 

    Paper  pp. 1–23.

    Corsi, F.: 2009, A Simple Approximate Long-Memory Model of Realized Volatility,  Jour-

    nal of Financial Econometrics  7(2), 174–196.

    Da, Z., Engelberg, J. and Gao, P.: 2010a, In search of earnings predictability, Working 

    Paper  .

    Da, Z., Engelberg, J. and Gao, P.: 2010b, The Sum of All FEARS: Investor Sentiment

    and Asset Prices,  Working Paper  .

    Da, Z., Engelberg, J. and Gao, P.: 2011, In Search of Attention,  The Journal of Finance 

    66(5), 1461–1499.

    Drake, M., Roulstone, D. and Thornock, J.: 2011, Investor Information Demand: Evi-dence from Google Searches around Earnings Announcements,  Working Paper   .

    Engelberg, J. and Parsons, C.: 2011, The Causal Impact of Media in Financial Markets,

    The Journal of Finance  66(1), 67–97.

    Foucault, T., Sraer, D. and Thesmar, D. J.: 2011, Individual Investors and Volatility,  The 

    Journal of Finance  66(4), 1369–1406.

    Ghysels, E., Santa-Clara, P. and Valkanov, R.: 2006, Predicting Volatility: Getting the

    Most out of Return Data Sampled at Different Frequencies,   Journal of Econometrics 131(1-2), 59–95.

    Ghysels, E. and Sinko, A.: 2011, Volatility Forecasting and Microstructure Noise,  Journal 

    of Econometrics  160(1), 257–271.

    16

  • 8/17/2019 Behaviorial Finance.pdf

    18/30

    Ginsberg, J., Mohebbi, M. H., Patel, R. S., Brammer, L., Smolinski, M. S. and Bril-

    liant, L.: 2009, Detecting influenza epidemics using search engine query data, Nature 

    457, 1012–1014.

    Granger, C. W. J.: 1969, Investigating Causal Relations by Econometric Models andCross-spectral Methods,  Econometrica  37(3), 424–438.

    Granger, C. W. J. and Newbold, P.: 1976, Forecasting Transformed Series,  Journal of the 

    Royal Statistical Society. Series B (Methodological)  38(2), 189–203.

    Jacobs, H. and Weber, M.: 2011, The Trading Volume Impact of Local Bias: Evidence

    from a Natural Experiment,  Review of Finance  (forthcoming).

    Lütkepohl, H. and Xu, F.: 2010, The role of the log transformation in forecasting economic

    variables,  Empirical Economics  pp. 1–20.

    Lux, T. and Marchesi, M.: 1999, Scaling and criticality in a stochastic multi-agent model

    of a financial market, Nature  397, 498–500.

    Mincer, J. A. and Zarnowitz, V.: 1969, The Evaluation of Economic Forecasts,  in  J. A.

    Mincer (ed.),  Economic Forecasts and Expectations: Analysis of Forecasting Behavior 

    and Performance , Studies in Business Cycles, NBER.

    Patton, A. J.: 2011, Volatility forecast comparison using imperfect volatility proxies,

    Journal of Econometrics  160(1), 246–256.

    Vlastakis, N. and Markellos, R. N.: 2012, Information demand and stock market volatility,

    Journal of Banking and Finance  36(6), 1808 – 1821.

    17

  • 8/17/2019 Behaviorial Finance.pdf

    19/30

    Tables and Figures

       0

     .   0   2

     .   0   4

     .   0   6

     .   0   8

     .   1

       R  e  a   l   i  z  e   d  v  o   l  a   t   i   l   i   t  y

    2007 2008 2009 2010 2011 2012

    Realized volatility of the Dow Jones

       0

       5

       1   0

       S  e  a  r  c   h   V  o   l  u  m  e   I  n   d  e  x

    2007 2008 2009 2010 2011 2012

    Search queries for the index name

    Figure 1:

    Realized volatility and search activityThis figure displays daily realized volatility of the Dow Jones (upper graphic) and the volumeof search queries (lower graphic) for its name from July 1, 2006 to December 31, 2011. Searchqueries are standardized such that the sample average equals one.

    18

  • 8/17/2019 Behaviorial Finance.pdf

    20/30

       −      0 .      2

          0

          0 .      0

          0

          0 .      2

          0

          0 .      4

          0

          0 .      6

          0

          0 .      8

          0

          1 .      0

          0

          A     u      t     o     c     o     r     r     e      l     a      t      i     o

         n     s

    0 20 40 60 80Lag

    Realized volatility

       −      0 .      2

          0

          0 .      0

          0

          0 .      2

          0

          0 .      4

          0

          0 .      6

          0

          0 .      8

          0

          1 .      0

          0

          A     u      t     o     c     o     r     r     e      l     a      t      i     o

         n     s

    0 20 40 60 80Lag

    Search queries

    Figure 2:Autocorrelations of realized volatility and search queriesThis figure displays the autocorrelations of the Dow Jones’ realized volatility (left graph)

    and search queries for its name (right graph) in the sample period. Shaded areas indicate95% confidence bounds.

    19

  • 8/17/2019 Behaviorial Finance.pdf

    21/30

            0

      .        1

      .        2

      .        3

    0 20 40 60 80 100Days

    Response of volatility to a shock in volatility

            0

      .        0        2

      .        0        4

      .        0        6

      .        0        8

    0 20 40 60 80 100Days

    Response of searches to a shock in volatility

            0

      .        0        2

      .        0        4

      .        0        6

    0 20 40 60 80 100Days

    Response of volatility to a shock in searches

            0

      .        0        5

      .        1

      .        1        5

    0 20 40 60 80 100Days

    Response of searches to a shock in searches

    Figure 3:Impulse response functionsThe table displays the impulse response functions of the VAR(3) of realized volatility andsearch queries (Table 3, Model 1). Shaded areas indicate 95% confidence bounds.

    20

  • 8/17/2019 Behaviorial Finance.pdf

    22/30

    Table 1:Summary statistics and correlationsThis table provides descriptive statistics of realized volatility (RV), search queries (SQ) andtrading volume (VO) of the Dow Jones between July 1, 2006 and December 31, 2011. Theupper panel holds summary statistics while the lower panel presents sample correlations.

    Panel A: Summary statistics

    Mean S.D. Skewness Kurtosis Min. Max.

    RV 0.009 0.007 3.90 30.75 0.002 0.096

    log-RV -4.87 0.56 0.57 3.57 -6.38 -2.34

    SQ 1.00 0.72 5.21 48.21 0.29 11.27

    log-SQ -0.13 0.46 1.25 5.70 -1.23 2.42

    VO 234.0 86.8 1.48 6.10 52.6 672.9

    log-VO 5.39 0.34 0.16 3.82 3.96 6.51

    Panel B: Correlations

    (1) (2) (3) (4) (5) (6)

    RV (1) 1

    log-RV (2) 0.89 1

    SQ (3) 0.82 0.67 1

    log-SQ (4) 0.76 0.74 0.89 1

    VO (5) 0.57 0.58 0.51 0.49 1

    log-VO (6) 0.53 0.57 0.44 0.44 0.96 1

    21

  • 8/17/2019 Behaviorial Finance.pdf

    23/30

    Table 2:Correlation of “dow” and other search termsThis table shows the top ten search terms that are correlated with the search term “dow”during the sample period July 2006 to December 2011 at weekly frequency. The second

    column shows the correlation coefficient, which was obtained from   Google Correlate . Thethird column shows the search volume scaled relative to the average search volume of “dow”,which is standardized to 100. For example, 46.67 in row 1 means that the search volume of “dow jones” relative to the search volume of “dow” is 46.67%. The search volume scale wasobtained from  Google Trends  and re-scaled for the sample size.

    Rank Correlation Rel. Volume (%) Search term1 0.9981 46.67 dow jones2 0.9707 18.40 djia3 0.9672 6.13 dow stock4 0.9612 0.33 dow close5 0.9612 0.31 current dow

    6 0.9591 9.91 dow jones industrial7 0.9565 0.16 current dow jones8 0.9545 0.39 google dow9 0.9481 0.04 stock market now

    10 0.9476 9.61 industrial average

    22

  • 8/17/2019 Behaviorial Finance.pdf

    24/30

        T   a    b    l   e    3   :

        V    A    R

       m   o    d   e    l   e   s    t    i   m   a    t    i   o   n   r   e   s   u    l    t   s

        T    h    i   s    t   a    b    l   e    d    i   s   p    l   a   y   s    t    h   e   e   s    t    i   m   a    t    i   o   n   r   e   s   u    l    t   s   o    f   a    V   e   c    t   o   r    A   u    t   o   r   e   g   r   e   s   s    i   v   e   m   o    d   e    l ,    V    A    R    (    3    ) ,

        f   o   r    l   o   g   r   e   a    l    i   z   e    d   v   o    l   a    t    i    l    i    t   y    (    l   o   g  -    R    V    ) ,    l   o   g   s   e   a   r   c    h

       q   u   e   r    i   e   s    (    l   o   g  -    S    Q    )   a   n    d    l   o   g    t   r   a    d    i   n   g   v   o    l   u   m   e    (    l   o   g  -    V    O    )    f   o   r    t    h   e    D   o   w    J   o   n   e   s    i   n    d   e   x .    P   a   n   e    l    A   p   r   o   v    i    d   e   s   c   o   e    ffi   c    i   e   n    t   e   s    t    i   m   a    t   e   s ,    P   a   n   e    l    B

        t    h   e   r   e   s   u    l    t   s   o    f

       a    G   r   a   n   g   e   r   c   a   u   s   a    l    i    t   y    L   a   g   r   a   n   g   e    M   u    l    t    i   p    l    i   e   r    t   e   s    t .   p  -   v   a    l   u   e   s   a   r   e   g    i   v   e   n    i   n

       p   a   r   e   n    t    h   e   s   e   s .    S    i   g   n    i    fi   c   a   n   c   e   a    t    t    h   e    5    %    l   e   v   e    l    i   s

        h    i   g    h    l    i   g    h    t   e    d    i   n

        b   o    l    d    f   a   c   e .

        P   a   n   e    l    A   :    V    A    R

       e   s    t    i   m   a    t    i   o   n

         M   o     d   e     l    1

         M   o     d   e     l    2

         M   o     d   e     l    3

         l   o   g  -     R     V

              t

         l   o   g  -     S     Q

              t

         l   o   g  -     S     Q

              t

         l   o   g  -     V     O

              t

         l   o   g

      -     R     V

              t

         l   o   g  -     S     Q

              t

         l   o   g  -     V     O

              t

         l   o   g  -     R     V

              t   −        1

        0 .    4

        5

        0 .    0

        3

        0

     .    4    5

        0 .    0    3

        0 .    0

        7

         (    0 .    0    0    0     )

         (    0 .    0     4    3     )

         (    0 .    0    0    0     )

         (    0 .    0    9    0     )

         (    0 .    0    0    6     )

         l   o   g  -     R     V

              t   −        2

        0 .    2

        1

      -    0 .    0    0

        0

     .    2    3

        0 .    0    0

      -    0 .    0    2

         (    0 .    0    0    0     )

         (    0 .    8    5    9     )

         (    0 .    0    0    0     )

         (    0 .    8    8    2     )

         (    0 .     4    5     7     )

         l   o   g  -     R     V

              t   −        3

        0 .    1

        6

      -    0 .    0    1

        0

     .    1    5

        0 .    0    1

      -    0 .    0    1

         (    0 .    0    0    0     )

         (    0 .    5     4     4     )

         (    0 .    0    0    0     )

         (    0 .    6    0    9     )

         (    0 .     7    5    2     )

         l   o   g  -     S     Q

              t   −        1

        0 .    2

        3

        0 .    8

        1

        0 .    8

        2

        0 .    1

        7

        0

     .    2    2

        0 .    8

        0

        0 .    1

        4

         (    0 .    0    0    0     )

         (    0 .    0    0    0     )

         (    0 .    0    0    0     )

         (    0 .    0    0    0     )

         (    0 .    0    0    0     )

         (    0 .    0    0    0     )

         (    0 .    0    0    2     )

         l   o   g  -     S     Q

              t   −        2

      -    0 .    1    0

      -    0 .    0    5

      -    0 .    0     4

      -    0 .    0    9

      -    0 .    0    8

      -    0 .    0    5

      -    0 .    0     7

         (    0 .    1    5    0     )

         (    0 .    1    6    8     )

         (    0 .    2    2     7     )

         (    0 .    1    1    1     )

         (    0 .    2     4    3     )

         (    0 .    2    3    0     )

         (    0 .    1    8    2     )

         l   o   g  -     S     Q

              t   −        3

      -    0 .    0    1

        0 .    1

        7

        0 .    1

        9

      -    0 .    0    6

      -    0 .    0    2

        0 .    1

        7

      -    0 .    0    6

         (    0 .    9    0    9     )

         (    0 .    0    0    0     )

         (    0 .    0    0    0     )

         (    0 .    1    8     4     )

         (    0 .     7    1    1     )

         (    0 .    0    0    0     )

         (    0 .    1    5    3     )

         l   o   g  -     V     O

              t   −        1

        0 .    0    2

        0 .    4

        1

      -    0 .    0    0

        0 .    0    1

        0 .    3

        7

         (    0 .    2    1    6     )

         (    0 .    0    0    0     )

         (    0 .    9    8     4     )

         (    0 .    6    6     7     )

         (    0 .    0    0    0     )

         l   o   g  -     V     O

              t   −        2

      -    0 .    0    1

        0 .    2

        2

      -    0 .    0    5

      -    0 .    0    2

        0 .    2

        3

         (    0 .     4    8     4     )

         (    0 .    0    0    0     )

         (    0 .    2    0     7     )

         (    0 .     4    2     7     )

         (    0 .    0    0    0     )

         l   o   g  -     V     O

              t   −        3

      -    0 .    0

        5

        0 .    1

        5

        0

     .    0    2

      -    0 .    0

        5

        0 .    1

        6

         (    0 .    0    1    0     )

         (    0 .    0    0    0     )

         (    0 .    6     4    9     )

         (    0 .    0    1    1     )

         (    0 .    0    0    0     )

         C   o   n   s    t   a   n    t

      -    0 .    8

        6

        0 .    0    9

        0 .    2

        1

        1 .    2

        0

      -    0 .    6

        2

        0 .    5

        3

        1 .    5

        3

         (    0 .    0    0    0     )

         (    0 .    1    6     4     )

         (    0 .    0    1    8     )

         (    0 .    0    0    0     )

         (    0 .    0    1     7     )

         (    0 .    0    0    0     )

         (    0 .    0    0    0     )

    23

  • 8/17/2019 Behaviorial Finance.pdf

    25/30

        T   a    b    l   e    3  -     C   o   n    t    i   n   u   e     d

        P   a   n   e

        l    B   :    G   r   a   n   g   e   r   c   a   u   s   a    l    i    t   y    t   e   s

        t

         M   o     d   e     l    1

         M   o     d   e     l    2

         M   o     d   e     l    3

         E   x   c     l   u     d   e     d   :

         l   o   g  -     R     V

              t

         l   o   g  -     S     Q

              t

         l   o   g  -     S     Q

              t

         l   o   g  -     V     O

              t

         l   o   g  -     R     V

              t

         l   o   g  -     S     Q

              t

         l   o   g  -     V     O

              t

         l   o   g  -     R     V

        5 .    2    0

        8 .    7

        8

        9 .    1

        2

         (    0 .    1    5    8     )

         (    0 .    0    3    2     )

         (    0 .    0    2    8     )

         l   o   g  -     S     Q

        3    1 .    6

        1

        1    9 .    7

        8

        3    0 .    6

        5

        1    0 .    8

        0

         (    0 .    0    0    0     )

         (    0 .    0    0    0     )

         (    0

     .    0    0    0     )

         (    0 .    0    1    3     )

         l   o   g  -     V     O

        1    1 .    4

        1

        2 .    2    1

        1    5 .    0

        1

         (    0 .    0    1    0     )

         (    0

     .    5    3    1     )

         (    0 .    0    0    2     )

    24

  • 8/17/2019 Behaviorial Finance.pdf

    26/30

    Table 4:In-sample forecast evaluationThe table compares the in-sample forecasts of the models described in the first column.AR(1), AR(3) and HAR(3) are univariate models of realized volatility only, AR(1)+SQ,AR(3)+SQ and HAR(3)+SQ are the models augmented with lagged search queries. Perfor-

    mance measures are the mean squared error (MSE,×

    10

    4

    ), the quasi-likelihood loss function(QL,   ×102) and the   R2 (in percent) of the Mincer-Zarnowitz regression. The preferredmodel (minimum of QL loss function and MSE, maximum of  R2) is indicated through boldnumbers.

    Model: MSE QL   R2

    AR(1) 0.172 22.819 65.96

    AR(1) + SQ 0.164 22.408 66.64

    AR(3) 0.153 21.273 69.38

    AR(3) + SQ 0.148 21.109 69.97HAR(3) 0.148 21.155 70.34

    HAR(3) + SQ   0.143 21.071 71.11

    25

  • 8/17/2019 Behaviorial Finance.pdf

    27/30

        T   a    b    l   e    5   :

        O   u    t  -   o    f  -   s   a   m

       p    l   e    f   o   r   e   c   a   s    t   e   v   a    l   u   a    t    i   o   n

        T    h   e    t   a    b    l   e   c   o   m

       p   a   r   e   s    t    h   e    1    d   a   y ,    1   w   e   e    k   a   n    d    2   w   e   e

        k   s   o   u    t  -   o    f  -   s   a   m   p    l   e    f   o   r   e   c   a   s    t   s   o    f    t    h   e   m

       o    d   e    l   s    d   e   s   c   r    i    b   e    d    i   n    t    h   e    fi   r   s    t   c   o    l   u   m   n

     .    A    R    (    1    ) ,

        A    R    (    3    )   a   n    d    H    A

        R    (    3    )   a   r   e   u   n    i   v   a   r    i   a    t   e   m   o    d   e    l   s   o    f   r   e   a    l    i   z   e    d   v   o    l   a    t    i    l    i    t   y   o   n    l   y ,    V    A    R    (    1    ) ,    V    A    R    (    3    )   a   n    d    V    H    A    R    (    3    )   a   r   e    b    i   v   a   r    i   a    t   e   m   o    d   e    l   s   o    f   r   e   a    l    i   z   e    d

       v   o    l   a    t    i    l    i    t   y    (    R    V    )   a   n    d   s   e   a   r   c    h   q   u   e   r    i   e   s    (    S    Q    ) .    P   e   r    f   o   r   m   a

       n   c   e   m   e   a   s   u   r   e   s   a   r   e    t    h   e   m   e   a   n   s   q   u   a   r   e    d   e   r   r   o   r    (    M    S    E ,      ×    1    0       4    ) ,    t    h   e   q   u   a   s    i  -    l    i    k   e    l    i    h   o   o    d    l   o   s   s

        f   u   n   c    t    i   o   n    (    Q    L ,

          ×    1    0       2    )   a   n    d    t    h   e     R       2

        (    i   n   p   e   r   c   e   n    t    )   o    f

        t    h   e    M    i   n   c   e   r  -    Z   a   r   n   o   w    i    t   z   r   e   g   r   e   s   s    i   o   n .

        T    h   e   p   r   e    f   e   r   r   e    d   m   o    d   e    l    (   m    i   n    i   m   u   m   o

        f    Q    L    l   o   s   s

        f   u   n   c    t    i   o   n   a   n    d    M

        S    E ,   m   a   x    i   m   u   m   o    f     R       2    )    i   s    i   n    d    i   c   a    t   e    d

        t    h   r   o   u   g    h    b   o    l    d   n   u   m    b   e   r   s .

        1     d   a   y

        1   w   e   e     k

        2   w   e   e     k   s

         M   o     d   e     l   :

         M     S     E

         Q     L

         R        2

         M     S     E

         Q     L

         R        2

         M     S     E

         Q     L

         R

            2

         A     R     (    1     )

         R     V

        0 .    2     4    0

        5 .     4    1    3

        6     4 .    1     4

        6 .     7    3    1

        6 .    1     4    0

        6

        1 .    5    0

        3     4 .    3    0    6

        9 .    1    3    0

        5    0

     .    0    8

         V     A     R     (    1     )

         R     V ,     S     Q

        0 .    2    2     4

         4 .    8    0     7

        6     4 .     7    1

         4 .    8    9    6

         4 .    6    9    8

        6

        3 .     7    2

        2     4 .    0    6    3

        6 .     4    9    8

        5    5

     .    0     4

         A     R     (    3     )

         R     V

        0 .    2    1    0

         4 .    5    2     7

        6     7 .    8    0

         4 .    3    6     7

        3 .    9     4    9

        6

        9 .    5    6

        2    1 .    1    2    9

        5 .    3    3    9

        6    2

     .     7    5

         V     A     R     (    3     )

         R     V ,     S     Q

        0 .    2    0    2

         4 .    2    8    0

        6    8 .    1    2

        3 .    9    1    6

        3 .     4     7    2

        6

        9 .     7     7

        1     7 .     4    3    6

         4 .     4    9    5

        6    3

     .    9    6

         H     A     R     (    3     )

         R     V

        0 .    1    9    8

         4 .    3    2     4

        6    9 .    0     7

        3 .    6    5    5

        3 .     4    5    0

         7

        2 .    0     7

        1    5 .    6    9    9

         4 .    1    9    8

        6     7

     .     7     4

         V     H     A     R     (

        3     )

         R     V ,     S     Q

        0 .    1

        9    4

        4 .    1

        4    2

        6    9 .    7

        4

        3 .    5

        4    8

        3 .    1

        7    9    7

        3 .    6

        0

        1    4 .    8

        7    6

        3 .    7

        6    3

        7    0

     .    8    1

    26

  • 8/17/2019 Behaviorial Finance.pdf

    28/30

        T   a    b    l   e    6   :

        F   o   r   e   c   a   s    t   e   v   a    l   u   a    t    i   o   n   a   c   r   o   s   s    d    i    ff   e   r   e   n    t    t

        i   m   e   s   o    f   v   o    l   a    t    i    l    i    t   y

        T    h   e    t   a    b    l   e   c   o   m

       p   a   r   e   s    t    h   e    f   o   r   e   c   a   s    t    i   n   g   p   e   r    f   o   r   m   a   n   c   e

       o    f    t    h   e   u   n    i   v   a   r    i   a    t   e   r   e   a    l    i   z   e    d   v   o    l   a    t    i    l    i    t   y    H    A    R    (    3    )   m   o    d   e    l    t   o    t    h   e    H    A    R    (    3    )   o

        f   r   e   a    l    i   z   e    d

       v   o    l   a    t    i    l    i    t   y    (    R    V    )   a   n    d   s   e   a   r   c    h   q   u   e   r    i   e   s    (    S    Q    )   a   c   r   o   s   s   s    t   a    t   e   s   o    f    d    i    ff   e   r   e   n    t   v   o    l   a    t    i    l    i    t   y .    W   e   s   o   r    t   a    l    l    d   a   y   s    i   n    t   o    t    h   e    b   o    t    t   o   m   a   n    d    t   o

       p    1    %    5    %

        t   o    5    0    %   o    f   v   o    l   a    t    i    l    i    t   y   a   n    d   c   a    l   c   u    l   a    t   e   m   e   a   n   s   q   u   a   r   e    d

       e   r   r   o   r    (    M    S    E ,      ×    1    0       4    )   a   n    d    t    h   e   q   u   a   s    i  -    l    i    k   e    l    i    h   o   o    d    l   o   s   s    f   u   n   c    t    i   o   n    (    Q    L ,      ×    1    0       2    )    f   o   r   e   a   c    h

       p   e   r   c   e   n    t    i    l   e .

        P   a   n   e    l    A   :    I   n  -   s   a   m   p    l   e   p   r   e    d    i   c    t    i   o   n

         R   e   a     l    i   z   e     d     V   o     l   a    t

        i     l    i    t   y

         B   o    t    t   o   m

         T   o   p

        1     %

        5     %

        1    0     %

        2    5     %

        5    0     %

        5    0     %

        2    5     %

        1    0     %

        5     %

        1     %

         M     S     E   :

         H     A     R     (    3     )

        0 .    0     4    0

        0 .    0    2     4

        0 .    0    2    1

        0 .    0    2    0

        0 .    0    2     7

        0 .    2    6     7

        0 .     4    8     7

        1 .    0     4     7

        1 .    8     7    0

        6 .    8     7    5

         M     S     E   :

         H     A     R     (    3     )    +

         S     Q

        0 .    0    3    8

        0 .    0    2    5

        0 .    0    2    1

        0 .    0    1    9

        0 .    0    2    6

        0 .    2    5    8

        0 .     4    6    9

        0 .    9    9    5

        1 .     7     4    8

        6 .    3    9    6

         D    i     ff   e   r   e   n   c   e    i   n     M     S     E

        0 .    0    0    2

      -    0 .    0    0    1

        0 .    0    0    0

        0 .    0    0    1

        0 .    0    0    1

        0 .    0    1    0

        0 .    0    1    8

        0 .    0    5    2

        0 .    1    2    2

        0 .     4     7    9

         Q     L

         H     A     R     (    3     )

        1    5 .    2    3

         7 .    8     4

        5 .     7    0

         4 .    0     7

        3 .    5     4

        5

     .    5    1

         7 .    5    3

        1    1 .    0    6

        1    3 .     7    3

        3    6

     .    3    6

         Q     L

         H     A     R     (    3     )    +

         S     Q

        1    5 .    0     7

        8 .    0    1

        5 .     7    9

         4 .    0    3

        3 .     4     4

        5

     .     4     4

         7 .     4    2

        1    0 .    5    2

        1    2 .     4    2

        3    3

     .    1    6

         D    i     ff   e   r   e   n   c   e    i   n     Q     L

        0 .    1    6

      -    0 .    1    6

      -    0 .    0    9

        0 .    0     4

        0 .    0    9

        0

     .    0     7

        0 .    1    1

        0 .    5     4

        1 .    3    1

        3

     .    2    0

    27

  • 8/17/2019 Behaviorial Finance.pdf

    29/30

        T   a    b    l   e    6  -     C   o   n    t    i   n   u   e     d

        P   a   n   e    l

        B   :    O   u    t  -   o    f  -   s   a   m   p    l   e   p   r   e    d    i   c    t    i   o   n

         R   e   a     l    i   z   e     d     V   o

         l   a    t    i     l    i    t   y

         B   o    t    t   o   m

         T   o   p

        1     %

        5

         %

        1    0     %

        2    5     %

        5    0     %

        5    0     %

        2    5     %

        1    0     %

        5     %

        1     %

         M     S     E   :

         H     A     R

         (    3     )

        0 .    0    1    9

        0 .    0

        2     4

        0 .    0    2    2

        0 .    0    2    3

        0 .    0    2     7

        0 .    1    5    1

        0 .    2    5    5

        0 .    5    1    3

        0 .    8    6     7

        2 .     7    9    9

         M     S     E   :

         V     H     A

         R     (    3     )     (     R     V ,     S     Q     )

        0 .    0    2    2

        0 .    0

        2    8

        0 .    0    2    5

        0 .    0    2    5

        0 .    0    2    8

        0 .    1     4    2

        0 .    2    3    6

        0 .     4    6    8

        0 .     7    8    6

        2 .     4     4    1

         D    i     ff   e   r   e   n   c   e    i   n

         M     S     E

      -    0 .    0    0    3

      -    0 .    0

        0    3

      -    0 .    0    0    3

      -    0 .    0    0    2

      -    0 .    0    0    1

        0 .    0    0    9

        0 .    0    1    9

        0 .    0     4    5

        0 .    0    8    2

        0 .    3    5    8

         Q     L

         H     A     R

         (    3     )

         7 .    9    9

        6 .

        8    5

        5 .    3    6

         4 .    3    0

        3 .    5    8

        5 .    9    3

        8 .    2    0

        1    2 .    3    0

        1    8 .    1    3

        5    2 .    5    6

         Q     L

         V     H     A

         R     (    3     )     (     R     V ,     S     Q     )

        9 .    2    1

         7 .

         7    1

        6 .    0    0

         4 .    6    3

        3 .    6    6

        5 .     4    9

         7 .    5    9

        1    1 .    3    0

        1    6 .    5    0

         4    6 .    0    9

         D    i     ff   e   r   e   n   c   e    i   n

         Q     L

      -    1 .    2    2

      -    0 .

        8    5

      -    0 .    6     4

      -    0 .    3    3

      -    0 .    0     7

        0 .     4     4

        0 .    6    1

        1 .    0    1

        1 .    6    3

        6 .     4     7

    28

  • 8/17/2019 Behaviorial Finance.pdf

    30/30

    A Appendix

    Table A.1:Robustness of the out-of-sample forecasts: Accounting for the de-

    layed publication of search query dataThe table repeats the 1 day out-of-sample volatility forecast presented in Table   5   usingsearch query data of two lags, i.e. volatility tomorrow is predicted with volatility data untiltoday and search data until yesterday to account for delays in the publication of search querydata. Performance measures are the mean squared error (MSE,×104), the quasi-likelihoodloss function (QL,  ×102) and the  R2 (in percent) of the Mincer-Zarnowitz regression. Thepreferred model (minimum of QL loss function and MSE, maximum of   R2) is indicatedthrough bold numbers.

    Model: MSE QL   R2

    AR(1) RV 0.240 5.413 64.14VAR(1) RV, SQ 0.214 4.745 66.23

    AR(3) RV 0.210 4.527 67.80

    VAR(3) RV, SQ 0.206 4.397 67.73

    HAR(3) RV 0.198 4.324 69.07

    VHAR(3) RV, SQ   0.194 4.223 69.43

    29