
Bagging Binary Predictors for Time Series

Tae-Hwy Lee∗

Department of Economics, University of California, Riverside

Riverside, CA 92521

[email protected]

Yang Yang

Department of Economics, University of California, Riverside

Riverside, CA 92521

[email protected]

February 2004

Abstract

Bootstrap aggregating, or bagging, introduced by Breiman (1996a), has been proved to be effective in improving unstable forecasts. Theoretical and empirical works using classification, regression trees, and variable selection in linear and non-linear regression have shown that bagging can generate substantial prediction gains. However, most of the existing literature on bagging has been limited to cross-sectional circumstances with symmetric cost functions. In this paper, we extend the application of bagging to time series settings with asymmetric cost functions, particularly for predicting signs and quantiles. We link quantile predictions to binary predictions in a unified framework. We find that bagging may improve the accuracy of unstable predictions for time series data under certain conditions. Various bagging forecast combinations are used, such as equal-weighted and Bayesian Model Averaging (BMA) weighted combinations. For demonstration, we present results from Monte Carlo experiments and from empirical applications using monthly S&P500 and NASDAQ stock index returns.

Key Words: Asymmetric cost function, Bagging, Binary prediction, BMA, Forecast combination, Majority voting, Quantile prediction, Time series.

JEL Classification: C3, C5, G0.

∗Corresponding author. Tel: +1 (909) 827-1509. Fax: +1 (909) 787-5685.


1 Introduction

In practice it is quite common that one forecast model performs well in certain periods while other models

perform better in other periods. It is difficult to find a forecast model that outperforms all competing models.

To improve forecasts over individual models, combined forecasts have been suggested. Bates and Granger

(1969), Granger, Deutsch, and Terasvirta (1994), Stock and Watson (1999), Knox, Stock and Watson (2000)

and Yang (2004) show that forecast combinations can improve forecast accuracy over a single model. Granger

and Jeon (2004), Aiolfi and Favero (2002), and Aiolfi and Timmermann (2004) emphasize that combined

forecasts may provide insurance to diversify the risk of forecast errors, analogous to investing in portfolios rather than in individual securities.

In practice it is also common that a forecast model performs well in certain periods while it does not do

very well in other periods. It is difficult to find a forecast model that performs well in all prediction periods.

To improve forecasts of a model, a combination can be formed over a set of training sets. While usually

we have a single training set, it can be replicated via bootstrap and the combined forecasts trained over

the bootstrap-replicated training sets can be better. This is the idea of bootstrap aggregating (or bagging),

introduced by Breiman (1996a).

It is well known that, while financial returns $\{Y_t\}$ may not be predictable, their variance and signs are often shown to be predictable. Christofferson and Diebold (2003) show that the binary variable $G_{t+1} \equiv 1(Y_{t+1} > 0)$ is predictable when some conditional moments are time varying. Hong and Lee (2003), Hong and Chung

(2003), Pesaran and Timmermann (2002a), Skouras (2002), among many others, find some evidence that

the directions of stock returns and foreign exchange rate changes are predictable.

In this paper we examine how bagging may improve binary predictions and quantile predictions for

time series under asymmetric cost functions. While there remains much work to be done in the literature,

many theoretical and numerical works have shown that bagging is effective in improving unstable forecasts.

However, most of the existing works have focused on cross-sectional data and symmetric cost functions.

In this paper we find that bagging can improve the accuracy of binary and quantile predictions for time

series data under asymmetric cost functions. Monte Carlo experiments and empirical applications using the

S&P500 and NASDAQ monthly returns are presented to demonstrate it. We construct binary predictors

based on quantile predictors so that the binary and quantile predictions are linked. Various bagging forecast

combinations with equal weights and weights based on Bayesian Model Averaging (BMA) are considered.

As noticed by many researchers bagging may outperform the unbagged predictors, but this may not

always be the case. Bagging is a device to improve the accuracy of unstable predictors. A predictor is said to


be unstable if perturbing the training sample can cause significant changes in the predictor (Breiman 1996b). The instability of predictors may come from many sources: discrete choice regressions (e.g., decision trees and classification problems), non-smooth target functions (e.g., median or quantile functions involving indicator functions), small sample sizes, outliers or extreme values, etc. Under these circumstances, the

model uncertainty and/or parameter estimation uncertainty may be so serious that the standard regression

and prediction may be unstable.

Several methods have been suggested to smooth out the instabilities. Combination or aggregation smoothing (Bates and Granger 1969, Granger and Jeon 2004) has been adopted frequently in moment predictions.

A kernel smoothing has been proposed for median predictions (Horowitz 1992) and binary quantile pre-

dictions (Kordas 2002). Bootstrap aggregating or bagging (Breiman 1996a) is a method of smoothing the

instabilities by averaging the predictors over bootstrap predictors and thus lowering the sensitivity of the

predictors to training samples.

Why does bagging work? Bagging constructs a predictor by averaging predictors trained on bootstrap

samples. Bagging is effective thanks to the variance reduction stemming from averaging predictors and so it

is most effective if the volatility of the predictors is very high. This mechanism of bagging for smoothing is

illustrated by Friedman and Hall (2000), who decompose a predictor or an estimator into linear and higher

order parts by Taylor-expansion. They show that bagging reduces the variance for the higher order nonlinear

component by replacing it with an estimate of its expected value, while leaving the linear part unaffected.

Therefore, bagging works by moving the estimator towards its linear components and works most successfully with highly nonlinear estimators such as decision trees and neural networks. Buja and Stuetzle (2002) extend the application of bagging regression to more general circumstances by expressing predictors as statistical functionals (especially those that can be written as U-statistics). They use the von Mises expansion technique based on the Taylor expansion and prove that the leading effects of bagging on variance, squared bias, and mean squared error (MSE) are of order $n^{-2}$, where $n$ is the estimation sample size. So, bagging can also be used on

estimators from non-smooth target functions, such as median predictors and quantile predictors. However,

the mechanism of bagging has not yet been fully understood. Grandvalet (2004) shows that bagging stabilizes

prediction by equalizing the influence of training samples, even when the usual variance reduction argument

is at best questionable. Bagging uses bootstrap samples randomly drawn with replacement; some extreme values and outliers may not be included in some bootstrap samples, which reduces the influence of such observations. See also Rao and Tibshirani (1997).

Alternatively, Buhlmann and Yu (2002) have shown that bagging transforms a hard-thresholding function


(e.g., indicator) into a soft-thresholding function (smooth function) and thus decreases the instabilities of

predictors. However, the relevance of this argument depends on how the bagging predictor is formed. As

discussed in Breiman (1996a), bagging predictors can be formed via voting instead of averaging. If the target variable of interest is numerical (as for the quantile prediction in this paper), we can form the bagging predictor by an average over the bootstrap predictors. However, if the target variable takes only discrete values, then a voting scheme over the discrete values is called for from the bootstrap predictors (as for the binary prediction in this paper). Both averaging and voting can be weighted. Bagging transforms a hard-thresholding function into a soft-thresholding function only for averaged-bagging, but not for voted-bagging, because a voted-bagging predictor remains a binary hard-thresholding function.

Even so, the ability of the voted-bagging to stabilize classifier prediction has been proved to be very

successful in the machine learning literature (Bauer and Kohavi 1999, Kuncheva and Whitaker 2003, and

Evgeniou et al. 2004). The method for voting classification or voted-bagging is now an established research

area known under different names in the literature — combining classifiers, classifier ensembles, committees

of learners, a team of classifiers, consensus of learners, mixture of experts, etc. In this paper we attempt to

understand how the voted-bagging (bagging binary predictions) may work or fail to work. We find that the

voted-bagging works very well when the training sample is small and the predictor is unstable. This does not necessarily mean that a better prediction can be achieved by stabilization of the prediction, as discussed in different but related contexts by Kuncheva and Whitaker (2003) and Dzeroski and Zenko (2004).

There are some limitations in the already substantial bagging literature. Most of the existing literature on bagging deals with independent data. With the obvious dynamic structures in most economic and financial variables, it is thus important to see how bagging works for time series. One concern of applying

bagging to time series is whether a bootstrap can provide a sound simulation sample. Fortunately, it has

been shown that the block bootstrap can provide consistent densities for moment estimators and quantile

estimators (Hahn 1997, Fitzenberger 1997, Horowitz 1998, Hardle et al. 2003, Politis 2003). Another

limitation of current bagging research is the use of symmetric cost functions (such as MSE) as a prediction evaluation criterion. However, symmetric cost may not be a proper way to compare different predictors (Granger 1999a, 1999b, 2002, Granger and Pesaran 2000), and it is widely accepted that asymmetric cost functions are more relevant (Elliott et al. 2003a, 2003b). That is, our utility changes differently with positive and negative forecast errors, or with a wrong-alert and a fail-to-alert. In this paper, we analyze bagging for non-

smooth binary predictors formed via majority voting, for time series, under asymmetric cost functions.

The plan of this paper is as follows. Section 2 shows a binary prediction problem based on utility


maximization behavior of an economic agent. We introduce a way to form a binary predictor through a

quantile predictor. Section 3 shows how bagging provides superior predictive ability for time series binary

predictions, that is constructed based on quantile predictions. In Section 4 we examine how bagging works

for quantile predictions. Section 5 presents Monte Carlo experiments and Section 6 reports empirical results

on the binary and quantile predictions using the monthly returns in S&P500 and NASDAQ stock indexes.

Section 7 concludes.

2 Binary Prediction

Granger (2002) notes, based on Luce (1980, 1981), that a risk measure of a financial return $Y_{t+1}$ that has a conditional predictive distribution $P_{Y_{t+1}}(y|X_t) = \Pr(Y_{t+1} < y|X_t)$ may be written as

$$R(X_t) = A_1 \int_0^{\infty} |y-m|^p \, dP_{Y_{t+1}}(y|X_t) + A_2 \int_{-\infty}^{0} |y-m|^p \, dP_{Y_{t+1}}(y|X_t),$$

with $A_1, A_2$ both $> 0$ and some $p > 0$, where $m$ is the predictor of $y$ that minimizes the risk. Granger (2002) emphasizes that the risk measure starts with the whole conditional distribution $P_{Y_{t+1}}(y|X_t)$ with asymmetric weights on each half-distribution.

Initially considering the symmetric case $A_1 = A_2$, one has the class of volatility measures

$$R(X_t) = E|y-m|^p,$$

which includes the variance ($p = 2$) and the mean absolute deviation ($p = 1$). One problem raised here is how to choose the optimal $L_p$-norm in empirical work. In some studies (e.g., Ding, Granger, and Engle 1993, Granger, Ding, and Spear 2000) the time series and distributional properties of these measures have been studied empirically, and the absolute deviations ($p = 1$) are found to have particular properties such as long memory. Granger (2002) considers the absolute return ($p = 1$) a preferable measure, given that returns from the stock market are known to come from a distribution with particularly long tails. In particular, Granger (2002) refers to a trio of papers (Nyguist 1983, Money et al. 1982, Harter 1977) who find that the optimal $p = 1$ for the Laplace and Cauchy distributions, $p = 2$ for the Gaussian, and $p = \infty$ (min/max estimator) for a rectangular distribution. Granger (2002) also notes that in terms of the kurtosis $\kappa$, Harter (1977) suggests using $p = 1$ for $\kappa > 3.8$; $p = 2$ for $2.2 \le \kappa \le 3.8$; and $p = 3$ for $\kappa < 2.2$. In finance, the kurtosis of returns can be thought of as being well over 4, and so the $L_1$ norm with $p = 1$ is preferred. In this paper, we follow Granger (2002) and consider binary and quantile predictions with $A_1 \neq A_2$ and $p = 1$.

Consider the conditional expected cost or conditional risk

$$R(X_t) = E_Y(c(e_{t,1})|X_t) \equiv \int_{\mathbb{R}} c(e_{t,1})\, dP_{Y_{t+1}}(y|X_t), \qquad (1)$$


where $e_{t,1} \equiv G_{t+1} - G_{t,1}$, $G_{t+1} \equiv 1(Y_{t+1} > 0)$, and $G_{t,1} \equiv G_{t,1}(X_t)$ is the one-step binary predictor at time $t$. The unconditional risk is then

$$E_{Y,X}(c(e_{t,1})) \equiv \int_{\mathbb{R}^k}\int_{\mathbb{R}} c(e_{t,1})\, dP_{Y_{t+1}}(y|X_t)\, dP_{X_t}(x), \qquad (2)$$

where the joint distribution $P_{Y_{t+1},X_t}(y,x) = P_{Y_{t+1}}(y|X_t)P_{X_t}(x)$ and $P_{X_t}(x)$ is the marginal distribution of $X_t$.

We now consider the above risk function to discuss a binary prediction. To define the asymmetric risk with $A_1 \neq A_2$ and $p = 1$, we consider the binary decision problem of Granger and Pesaran (2000) with the following $2 \times 2$ payoff or utility matrix:

Utility          $G_{t+1} = 1$    $G_{t+1} = 0$
$G_{t,1} = 1$    $u_{11}$         $u_{01}$
$G_{t,1} = 0$    $u_{10}$         $u_{00}$
(3)

where $u_{ij}$ is the utility when $G_{t,1} = j$ is predicted and $G_{t+1} = i$ is realized ($i, j = 0, 1$). Assume $u_{11} > u_{10}$ and $u_{00} > u_{01}$, and that the $u_{ij}$ are constant over time. $(u_{11} - u_{10}) > 0$ is the utility gain from a correct forecast when $G_{t+1} = 1$, and $(u_{00} - u_{01}) > 0$ is the utility gain from a correct forecast when $G_{t+1} = 0$. Denote

$$\pi_t \equiv \Pr(G_{t+1} = 1|X_t). \qquad (4)$$

Under this setup, the expected utility of $G_{t,1} = 1$ is $u_{11}\pi_t + u_{01}(1-\pi_t)$, and the expected utility of $G_{t,1} = 0$ is $u_{10}\pi_t + u_{00}(1-\pi_t)$. Hence, to maximize utility, the prediction $G_{t,1} = 1$ will be made if

$$u_{11}\pi_t + u_{01}(1-\pi_t) > u_{10}\pi_t + u_{00}(1-\pi_t),$$

or

$$\pi_t > \frac{u_{00} - u_{01}}{(u_{11} - u_{10}) + (u_{00} - u_{01})} \equiv 1 - \alpha.$$

Summarizing, we have the following result.

Proposition 1 (Granger and Pesaran, 2000). The optimal predictor that can maximize expected utility of

an economic agent is:

$$G^{\dagger}_{t,1} = 1(\pi_t > 1 - \alpha). \qquad (5)$$

We can consider (1− α) as the utility-gain ratio from correct prediction when Gt+1 = 0, or equivalently

the opportunity cost (in the sense that you lose the gain) of a wrong-alert if the wrong prediction is made

when Gt+1 = 0. Similarly,

$$\alpha \equiv \frac{u_{11} - u_{10}}{(u_{11} - u_{10}) + (u_{00} - u_{01})} \qquad (6)$$


can be treated as the utility-gain ratio from a correct prediction when $G_{t+1} = 1$ is realized. Put another way, $\alpha$ is the opportunity cost of a fail-to-alert from the wrong prediction when $G_{t+1} = 1$ is realized. We thus have the cost function $c(e_{t,1}) = c(G_{t+1} - G_{t,1})$:

Cost             $G_{t+1} = 1$    $G_{t+1} = 0$
$G_{t,1} = 1$    $0$              $1-\alpha$
$G_{t,1} = 0$    $\alpha$         $0$

That is,

$$c(e_{t,1}) = \begin{cases} \alpha & \text{if } e_{t,1} = 1, \\ 1-\alpha & \text{if } e_{t,1} = -1, \\ 0 & \text{if } e_{t,1} = 0, \end{cases}$$

which can be equivalently written as $c(e_{t,1}) = \rho_\alpha(e_{t,1})$, where

$$\rho_\alpha(z) \equiv [\alpha - 1(z < 0)]\, z \qquad (7)$$

is the check function of Koenker and Bassett (1978). Summarizing, we have obtained the following result.

Proposition 2. The solution to the utility maximization problem in (3) is the same as the solution to the

cost minimization problem in (7). In other words, the optimal binary prediction obtained in Proposition 1

is the binary quantile that minimizes $E_Y(\rho_\alpha(e_{t,1})|X_t)$, i.e., $G^{\dagger}_{t,1} \equiv \arg\min_{G_{t,1}} E_Y(\rho_\alpha(G_{t+1} - G_{t,1})|X_t)$, and thus $G^{\dagger}_{t,1}$ is the conditional $\alpha$-quantile of $G_{t+1}$, denoted as

$$G^{\dagger}_{t,1} = Q_{G_{t+1}}(\alpha|X_t). \qquad (8)$$

Proof: The cost function may be written as

$$c(e_{t,1}) = \alpha G_{t+1} + (1 - \alpha - G_{t+1})G_{t,1}, \qquad (9)$$

and thus the conditional risk is

$$E_Y(c(e_{t,1})|X_t) = \alpha\pi_t + (1 - \alpha - \pi_t)G_{t,1}. \qquad (10)$$

When $\pi_t > 1 - \alpha$, the minimizer of $E_Y(c(e_{t,1})|X_t)$ is $G^{\dagger}_{t,1} = 1$. When $\pi_t < 1 - \alpha$, the minimizer of $E_Y(c(e_{t,1})|X_t)$ is $G^{\dagger}_{t,1} = 0$.
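To make the asymmetric cost concrete, the following minimal Python sketch (not from the paper; the value of $\alpha$ and the error values are only illustrative) evaluates the check function in (7) on the three possible binary forecast errors.

```python
# Minimal sketch (not from the paper) of the asymmetric cost of a binary forecast
# error e_{t,1} = G_{t+1} - G_{t,1} via the check function rho_alpha in equation (7).
import numpy as np

def check_loss(z, alpha):
    """rho_alpha(z) = [alpha - 1(z < 0)] * z  (Koenker and Bassett, 1978)."""
    z = np.asarray(z, dtype=float)
    return (alpha - (z < 0)) * z

alpha = 0.3
errors = np.array([1, -1, 0])        # fail-to-alert, wrong-alert, correct forecast
print(check_loss(errors, alpha))     # prints [0.3 0.7 0. ] : costs alpha, 1-alpha, 0
```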

This is the maximum score problem of Manski (1975, 1985), Manski and Thompson (1989), and Kordas

(2002). Proposition 2 shows that the optimal binary predictor is the conditional α-quantile of Gt+1. In the


next proposition, we show that the optimal binary predictor is also the binary function of the conditional

α-quantile of Yt+1.

Proposition 3. The optimal binary predictor (under our cost function) is the indicator function of the

α-quantile of Yt+1 being positive:

$$G^{\dagger}_{t,1} = 1(Q_{Y_{t+1}}(\alpha|X_t) > 0), \qquad (11)$$

where $Q_{Y_{t+1}}(\alpha|X_t)$ is the $\alpha$-quantile function of $Y_{t+1}$ conditional on $X_t$.

Proof: As noted by Powell (1986), for any monotonic function $h(\cdot)$, $Q_{h(Y)}(\alpha|X) = h(Q_Y(\alpha|X))$, which follows immediately from observing that $\Pr(Y < y|X) = \Pr[h(Y) < h(y)|X]$. As the indicator function is monotonic, $Q_{G_{t+1}}(\alpha|X_t) = Q_{1(Y_{t+1}>0)}(\alpha|X_t) = 1(Q_{Y_{t+1}}(\alpha|X_t) > 0)$.
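The monotone-equivariance argument in this proof is easy to verify numerically. The sketch below uses a hypothetical Gaussian stand-in for the conditional distribution of $Y_{t+1}$ (an assumption, not from the paper) and checks that the sample $\alpha$-quantile of $G_{t+1} = 1(Y_{t+1} > 0)$ agrees with $1(Q_{Y_{t+1}}(\alpha) > 0)$.

```python
# Numerical check of Proposition 3 under an assumed Gaussian distribution for Y_{t+1}.
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=0.5, scale=1.0, size=100_000)   # stand-in draws of Y_{t+1}
g = (y > 0).astype(float)                          # G_{t+1} = 1(Y_{t+1} > 0)

for alpha in (0.1, 0.3, 0.5, 0.7, 0.9):
    q_g = np.quantile(g, alpha)                    # alpha-quantile of the binary variable
    q_y = np.quantile(y, alpha)                    # alpha-quantile of Y_{t+1}
    print(alpha, q_g, float(q_y > 0))              # the last two entries coincide
```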

Remarks. Note that $Q^{\dagger}_{G_{t+1}}(\alpha|X_t) = \arg\min E_Y(\rho_\alpha(e_{t,1})|X_t)$ with $e_{t,1} \equiv G_{t+1} - Q_{G_{t+1}}(\alpha|X_t)$, while $Q^{\dagger}_{Y_{t+1}}(\alpha|X_t) = \arg\min E_Y(\rho_\alpha(u_{t,1})|X_t)$ with $u_{t,1} \equiv Y_{t+1} - Q_{Y_{t+1}}(\alpha|X_t)$. Proposition 2 indicates that the

binary prediction can be made from the binary quantile regression for Gt+1 in equation (8). Proposition 3

indicates that the binary prediction can also be made from a binary function of the quantile for Yt+1 in equa-

tion (11). In the next section where we consider bagging binary predictors, we use the latter. The reasons

why we choose to use Proposition 3 instead of Proposition 2 are as follows. First, Manski (1975, 1985) and

Kim and Pollard (1990) show that parameter estimators obtained from minimizing $\sum_t \rho_\alpha(e_{t,1})$ have a slower $n^{1/3}$ convergence rate and a non-normal limiting distribution. This is unattractive for our subsequent analysis of bagging in Section 4, where we will make assumptions of asymptotic normality and bootstrap consistency. Instead, we estimate the binary regression model by minimizing $\sum_t \rho_\alpha(u_{t,1})$, which gives $n^{1/2}$ consistency and asymptotic normality. See Kim and White (2001), Chernozhukov and Umantsev (2001),

and Komunjer (2003). Second, the asymptotic properties of quantile estimators are better established than those of the maximum score estimator. The properties of quantile regression for both linear and non-linear models, and for both cross-section and time-series data, have been well developed. However, only linear models have been considered inside the indicator function in the binary or maximum score literature. Third, and perhaps most importantly, the quantile-based binary regression model allows more structure than the maximum score regression model. We may have much more information than just an indicator. In the maximum score literature, $Y$ is usually latent and unobservable. In our case, however, $Y$ is not latent but observable. Hence, using the more informative variable $Y$ in the quantile regression tends to give better predictions than using only the binary variable $G$. Finally, the quantile itself is a very interesting topic, particularly given the recent


boom in the financial risk management industry. The quantile prediction is not only used to generate the binary prediction as shown in Proposition 3, but the quantile itself is often the object of interest in finance and economics, e.g., Value-at-Risk (VaR). For this reason, in addition to the bagging binary predictors considered in Section 3, we also consider bagging quantile predictors in Section 4.

3 Bagging Binary Prediction

To compute the binary predictor via Proposition 3, we consider a linear quantile regression function

$$Q_{Y_{t+1}}(\alpha|X_t) = \mathbf{X}_t'\beta(\alpha), \qquad (12)$$

where $\mathbf{X}_t \in \mathbb{R}^l$ contains known functions (e.g., polynomials) of $X_t \in \mathbb{R}^k$ and $\beta(\alpha) \in \Theta \subset \mathbb{R}^l$. We use a polynomial function (Chernozhukov and Umantsev, 2001) within a univariate model, with $X_t = Y_t$, $\mathbf{X}_t = (1\ Y_t\ Y_t^2)'$, $\beta(\alpha) = [\beta_0(\alpha)\ \beta_1(\alpha)\ \beta_2(\alpha)]'$, and $Q_{Y_{t+1}}(\alpha|X_t) = \beta_0(\alpha) + \beta_1(\alpha)Y_t + \beta_2(\alpha)Y_t^2$. Denote the optimal binary predictor as

$$G^{\dagger}_{t,1}(\mathbf{X}_t) \equiv 1(Q^{\dagger}_{Y_{t+1}}(\alpha|X_t) > 0) = 1(\mathbf{X}_t'\beta^{\dagger}(\alpha) > 0),$$

where $\beta^{\dagger}(\alpha) \equiv \arg\min_{\beta(\alpha)} E_Y(\rho_\alpha(u_{t,1})|X_t)$.

We estimate $\beta^{\dagger}(\alpha)$ recursively using a "rolling" training sample. Suppose there are $T+1$ ($\equiv R + P$) observations. We use the most recent $R$ observations available at time $t$, $R \le t < T+1$, as a training sample $\mathcal{D}_t \equiv \{(Y_s, \mathbf{X}_{s-1})\}_{s=t-R+1}^{t}$. We then generate $P$ one-step-ahead forecasts for the remaining validation sample. For each time $t$ in the prediction period, we use a rolling training sample $\mathcal{D}_t$ of size $R$ to estimate the model parameters

$$\beta_t(\alpha|\mathcal{D}_t) \equiv \arg\min_{\beta(\alpha)} \sum_{s=t-R+1}^{t} \rho_\alpha(Y_s - \mathbf{X}_{s-1}'\beta(\alpha)), \quad t = R, \dots, T.$$

We can then generate a sequence of one-step-ahead forecasts $\{Q_{Y_{t+1}}(\alpha|\mathbf{X}_t,\mathcal{D}_t) = \mathbf{X}_t'\beta_t(\alpha|\mathcal{D}_t)\}_{t=R}^{T}$ and

$$G_{t,1}(\mathbf{X}_t,\mathcal{D}_t) \equiv 1(Q_{Y_{t+1}}(\alpha|\mathbf{X}_t,\mathcal{D}_t) > 0) = 1(\mathbf{X}_t'\beta_t(\alpha|\mathcal{D}_t) > 0), \quad t = R, \dots, T. \qquad (13)$$
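As an illustration, the rolling forecasts in (13) might be computed as in the sketch below. This is not the authors' code: statsmodels' QuantReg is used here in place of the Portnoy-Koenker interior-point algorithm mentioned in Section 5, and the regressors follow the univariate polynomial model above.

```python
# Sketch of the rolling-window quantile regression forecast in equation (13) and the
# implied unbagged binary forecast G_{t,1} = 1(Q_{Y_{t+1}}(alpha | X_t, D_t) > 0).
import numpy as np
from statsmodels.regression.quantile_regression import QuantReg

def rolling_binary_forecasts(y, R, alpha):
    """y: array of returns Y_1..Y_{T+1}; returns the unbagged forecasts G_{t,1}, t = R..T."""
    y = np.asarray(y, dtype=float)
    forecasts = []
    for t in range(R, len(y)):                      # rolling training sample of size R
        y_win = y[t - R:t + 1]
        resp, lag = y_win[1:], y_win[:-1]           # pairs (Y_s, X_{s-1}) with X_t = Y_t
        X = np.column_stack([np.ones_like(lag), lag, lag**2])
        beta = QuantReg(resp, X).fit(q=alpha).params
        q_hat = beta @ np.array([1.0, y[t], y[t]**2])   # Q_{Y_{t+1}}(alpha | X_t, D_t)
        forecasts.append(int(q_hat > 0))                # G_{t,1} = 1(Q > 0)
    return np.array(forecasts)
```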

For more discussion on the rolling estimation scheme and other alternative schemes, see West and McCracken (1998, p. 819). Note that the notation $G_{t,1}(\mathbf{X}_t,\mathcal{D}_t)$ indicates the parameter estimation uncertainty in $\beta_t(\alpha|\mathcal{D}_t)$ due to the training of the unknown parameter $\beta(\alpha)$ on the training sample $\mathcal{D}_t$.

Although we have a single training set $\mathcal{D}_t$ at time $t$, an imitation of the training set can be constructed from repeated bootstrap samples $\mathcal{D}^{*}_t$, from which the predictor of the target variable (usually one-dimensional) is formed. If the target variable is numerical, as for the stock returns $Y$ or for its quantiles $Q_Y(\alpha|X)$, the bagging procedure is to take an average of the forecasts from the replicated training samples. If the target variable is a class, as for the binary variable $G$, then the method of aggregating the bootstrap-trained forecasts is voting. One of the most representative unstable predictors used in the bagging literature is the classifier, with the binary forecast as a special case with only two classes. The aim of this paper is to compare the two binary forecasts: the unbagged predictor $G_{t,1}$ and the bagging predictor $G^{B}_{t,1}$ (defined below).

The procedure of bagging for binary predictors can be conducted in the following steps (a code sketch follows the list):

1. Given a training set of data at time $t$, $\mathcal{D}_t \equiv \{(Y_s, \mathbf{X}_{s-1})\}_{s=t-R+1}^{t}$, where the $Y_s$ are the variable of interest and the $\mathbf{X}_s$ are vectors of predictor variables, construct the $j$th bootstrap sample $\mathcal{D}^{*(j)}_t \equiv \{(Y^{*(j)}_s, \mathbf{X}^{*(j)}_{s-1})\}_{s=t-R+1}^{t}$, $j = 1, \dots, J$, according to the empirical distribution of $\mathcal{D}_t$.

2. Estimate $\beta^{*(j)}_t(\alpha|\mathcal{D}^{*(j)}_t) \equiv \arg\min_{\beta(\alpha)} \sum_{s=t-R+1}^{t} \rho_\alpha(Y^{*(j)}_s - \mathbf{X}^{*(j)\prime}_{s-1}\beta(\alpha))$, $t = R, \dots, T$.

3. Compute the bootstrap binary predictor from the $j$th bootstrapped sample, that is,
$$G^{*(j)}_{t,1}(\mathbf{X}_t, \mathcal{D}^{*(j)}_t) \equiv 1(Q^{*(j)}_{Y_{t+1}}(\alpha|\mathbf{X}_t, \mathcal{D}^{*(j)}_t) > 0) = 1(\mathbf{X}_t'\beta^{*(j)}_t(\alpha|\mathcal{D}^{*(j)}_t) > 0). \qquad (14)$$
Note that here we use $\mathbf{X}_t$ instead of $\mathbf{X}^{*(j)}_t$, so that only the parameter $\beta^{*(j)}_t(\alpha|\mathcal{D}^{*(j)}_t)$ is trained on the bootstrap sample $\mathcal{D}^{*(j)}_t$, while the forecast is formed using the original predictor variables $\mathbf{X}_t$.

4. Form a (weighted) average over all bootstrap training samples drawn from the same distribution:
$$E_{\mathcal{D}^*} G^{*}_{t,1}(\mathbf{X}_t, \mathcal{D}^{*}_t) = \sum_{j=1}^{J} w_{j,t}\, G^{*(j)}_{t,1}(\mathbf{X}_t, \mathcal{D}^{*(j)}_t)$$
with $\sum_{j=1}^{J} w_{j,t} = 1$. The expectation $E_{\mathcal{D}^*}(\cdot)$ is the average over the bootstrap training samples $\mathcal{D}^{*}_t$. This is a key step in bagging to smooth out the instability of the predictor due to the parameter estimation (training or learning). The weights $w_{j,t}$ are typically set equal to $J^{-1}$, but can be computed via a Bayesian approach (see Section 3.2). $J$ is often chosen in the range of 50, depending on the sample size and on the computational cost of evaluating the predictor. See Breiman (1996a, Section 6.2). Our Monte Carlo results and empirical results reported in Sections 5 and 6 suggest that $J = 50$ is more than sufficient, and even $J = 20$ is often good enough.

5. Finally, the bagging binary predictor $G^{B}_{t,1}(\mathbf{X}_t, \mathcal{D}^{*}_t)$ is constructed by majority voting over the $J$ bootstrap predictors, i.e.,
$$G^{B}_{t,1}(\mathbf{X}_t, \mathcal{D}^{*}_t) \equiv 1(E_{\mathcal{D}^*} G^{*}_{t,1}(\mathbf{X}_t, \mathcal{D}^{*}_t) > 1/2).$$
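A compact sketch of the five steps for a single forecast date is given below. This is not the authors' code: the moving-block bootstrap, the block size, and the QuantReg solver are assumptions, and equal weights are used as in the simplest case of step 4.

```python
# Sketch of the voted-bagging binary predictor G^B_{t,1} built from J bootstrap
# quantile regressions on a rolling window of returns.
import numpy as np
from statsmodels.regression.quantile_regression import QuantReg

def bagged_binary_forecast(y_window, alpha, J=50, block_size=4, seed=0):
    """One-step voted-bagging binary forecast from a rolling window of returns."""
    y_window = np.asarray(y_window, dtype=float)
    rng = np.random.default_rng(seed)
    resp, lag = y_window[1:], y_window[:-1]                 # pairs (Y_s, X_{s-1}), X_t = Y_t
    X = np.column_stack([np.ones_like(lag), lag, lag**2])   # polynomial quantile regressors
    x_t = np.array([1.0, y_window[-1], y_window[-1]**2])    # original X_t used for the forecast
    n, n_blocks = len(resp), int(np.ceil(len(resp) / block_size))
    votes = np.empty(J)
    for j in range(J):
        starts = rng.integers(0, n - block_size + 1, size=n_blocks)   # step 1: block bootstrap
        idx = np.concatenate([np.arange(s, s + block_size) for s in starts])[:n]
        beta = QuantReg(resp[idx], X[idx]).fit(q=alpha).params        # step 2
        votes[j] = 1.0 * (x_t @ beta > 0.0)                           # step 3: G*_{t,1}^{(j)}
    return int(votes.mean() > 0.5)                                    # steps 4-5: majority vote
```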


The notation $G^{*(j)}_{t,1}(\mathbf{X}_t,\mathcal{D}^{*(j)}_t)$, $G^{*}_{t,1}(\mathbf{X}_t,\mathcal{D}^{*}_t)$, and $G^{B}_{t,1}(\mathbf{X}_t,\mathcal{D}^{*}_t)$ is to indicate the parameter estimation uncertainty in $\beta^{*(j)}_t(\alpha|\mathcal{D}^{*(j)}_t)$ due to the training of the unknown parameter $\beta(\alpha)$ using the bootstrap training sample $\mathcal{D}^{*(j)}_t$. For example, the binary predictor $G^{*(j)}_{t,1}(\mathbf{X}_t,\mathcal{D}^{*(j)}_t)$ that is constructed using the parameter value $\beta^{*(j)}_t(\alpha|\mathcal{D}^{*(j)}_t)$ trained on the sample $\mathcal{D}^{*(j)}_t$ is the output of $G^{\dagger}_{t,1}(\mathbf{X}_t)$ for each value of $\mathbf{X}_t$ in the predictor space. Bagging is to average out (smooth out) the effect of the different training samples.

3.1 How bagging works

We now examine how bagging can provide a predictor superior to an unbagged predictor for time series data.

We compare the unconditional risk of a bagging predictor and an unbagged predictor.

First, we need to find out the differences among the different binary predictors. The minimum possible expected cost is that of the optimal predictor $G^{\dagger}_{t,1}(\mathbf{X}_t)$. Plugging $G^{\dagger}_{t,1}(\mathbf{X}_t)$ into the conditional risk function in (10), we have

$$E_Y(c(e^{\dagger}_{t,1})|X_t) = \alpha\pi_t + (1 - \alpha - \pi_t)G^{\dagger}_{t,1}(\mathbf{X}_t), \qquad (15)$$

where $e^{\dagger}_{t,1} \equiv G_{t+1} - G^{\dagger}_{t,1}(\mathbf{X}_t)$. As defined in equation (1), the conditional expectation $E_Y(\cdot|X_t)$ is taken over $P_{Y_{t+1}}(y|X_t)$, the distribution conditional on the predictor variables $X_t$.

Once the unknown parameters are trained, the conditional risks of the unbagged and bagging predictors can be written as

$$E_Y(c(e_{t,1})|X_t) = \alpha\pi_t + (1 - \alpha - \pi_t)G_{t,1}(\mathbf{X}_t,\mathcal{D}_t), \qquad (16)$$
$$E_Y(c(e^{B}_{t,1})|X_t) = \alpha\pi_t + (1 - \alpha - \pi_t)G^{B}_{t,1}(\mathbf{X}_t,\mathcal{D}^{*}_t), \qquad (17)$$

where $e_{t,1} \equiv G_{t+1} - G_{t,1}(\mathbf{X}_t,\mathcal{D}_t)$ and $e^{B}_{t,1} \equiv G_{t+1} - G^{B}_{t,1}(\mathbf{X}_t,\mathcal{D}^{*}_t)$. Now, taking (weighted) averages of (16) and (17) over the training samples $\mathcal{D}_t$ and $\mathcal{D}^{*}_t$, respectively, yields

$$E_{Y,\mathcal{D}}(c(e_{t,1})|X_t) = \alpha\pi_t + (1 - \alpha - \pi_t)E_{\mathcal{D}}G_{t,1}(\mathbf{X}_t,\mathcal{D}_t) = \alpha\pi_t + (1 - \alpha - \pi_t)\Pr(G_{t,1}(\mathbf{X}_t,\mathcal{D}_t) = 1), \qquad (18)$$

and

$$E_{Y,\mathcal{D}^*}(c(e^{B}_{t,1})|X_t) = \alpha\pi_t + (1 - \alpha - \pi_t)E_{\mathcal{D}^*}G^{B}_{t,1}(\mathbf{X}_t,\mathcal{D}^{*}_t) = \alpha\pi_t + (1 - \alpha - \pi_t)\Pr(G^{B}_{t,1}(\mathbf{X}_t,\mathcal{D}^{*}_t) = 1). \qquad (19)$$

Then, from (15), (18), and (19), it is easy to see the following result.


Proposition 4. If the conditional risk of the unbagged predictor in (18) is close to the minimum possible conditional risk in (15), that is, if $E_{\mathcal{D}}G_{t,1}(\mathbf{X}_t,\mathcal{D}_t)$ or $\Pr(G_{t,1}(\mathbf{X}_t,\mathcal{D}_t) = 1)$ is very close to $G^{\dagger}_{t,1}$, then there is no room for improvement by bagging.

Comparing the predictability of the unbagged predictor and the bagging predictor can be done by comparing the conditional risks in (18) and (19). If

$$(1 - \alpha - \pi_t)\Pr(G_{t,1}(\mathbf{X}_t,\mathcal{D}_t) = 1) > (1 - \alpha - \pi_t)\Pr(G^{B}_{t,1}(\mathbf{X}_t,\mathcal{D}^{*}_t) = 1), \qquad (20)$$

we say bagging works. Hence, we need to explore the properties of $\Pr(G^{B}_{t,1}(\mathbf{X}_t,\mathcal{D}^{*}_t) = 1)$ and $\Pr(G_{t,1}(\mathbf{X}_t,\mathcal{D}_t) = 1)$. Recall from Proposition 3 that we utilize quantile predictors to generate the binary predictors, i.e., $G^{\dagger}_{t,1}(\mathbf{X}_t) = 1(Q^{\dagger}_{Y_{t+1}}(\alpha|X_t) > 0)$, $G_{t,1}(\mathbf{X}_t,\mathcal{D}_t) = 1(Q_{Y_{t+1}}(\alpha|\mathbf{X}_t,\mathcal{D}_t) > 0)$, and $G^{*}_{t,1}(\mathbf{X}_t,\mathcal{D}^{*}_t) = 1(Q^{*}_{Y_{t+1}}(\alpha|\mathbf{X}_t,\mathcal{D}^{*}_t) > 0)$. Let us further define some notation:

$$Z_R \equiv R^{1/2}[Q_{Y_{t+1}}(\alpha|\mathbf{X}_t,\mathcal{D}_t) - Q^{\dagger}_{Y_{t+1}}(\alpha|X_t)]/\sigma_Q(\alpha) \equiv -d_R + d^{\dagger}_R,$$
$$Z^{*}_R \equiv R^{1/2}[Q^{*}_{Y_{t+1}}(\alpha|\mathbf{X}_t,\mathcal{D}^{*}_t) - Q_{Y_{t+1}}(\alpha|\mathbf{X}_t,\mathcal{D}_t)]/\sigma_Q(\alpha) \equiv -d^{*}_R + d_R,$$

with $d^{\dagger}_R \equiv -\sqrt{R}\, Q^{\dagger}_{Y_{t+1}}(\alpha|X_t)/\sigma_Q(\alpha)$, $d^{*}_R \equiv -\sqrt{R}\, Q^{*}_{Y_{t+1}}(\alpha|\mathbf{X}_t,\mathcal{D}^{*}_t)/\sigma_Q(\alpha)$, $d_R \equiv -\sqrt{R}\, Q_{Y_{t+1}}(\alpha|\mathbf{X}_t,\mathcal{D}_t)/\sigma_Q(\alpha)$, and $\sigma^2_Q(\alpha) = \mathrm{var}\big(R^{1/2}[Q_{Y_{t+1}}(\alpha|\mathbf{X}_t,\mathcal{D}_t) - Q^{\dagger}_{Y_{t+1}}(\alpha|X_t)]\big)$.

Using Propositions 1 and 4, the condition in (20) under which bagging works can be written as follows:

Proposition 5. (a) When $1 - \alpha - \pi_t > 0$ (equivalently when $G^{\dagger}_{t,1} = 0$, $Q^{\dagger}_{Y_{t+1}}(\alpha|X_t) < 0$, and $d^{\dagger}_R > 0$), bagging works if $\Pr(G_{t,1}(\mathbf{X}_t,\mathcal{D}_t) = 1) > \Pr(G^{B}_{t,1}(\mathbf{X}_t,\mathcal{D}^{*}_t) = 1) > G^{\dagger}_{t,1} = 0$. (b) When $1 - \alpha - \pi_t < 0$ (equivalently when $G^{\dagger}_{t,1} = 1$, $Q^{\dagger}_{Y_{t+1}}(\alpha|X_t) > 0$, and $d^{\dagger}_R < 0$), bagging works if $\Pr(G_{t,1}(\mathbf{X}_t,\mathcal{D}_t) = 1) < \Pr(G^{B}_{t,1}(\mathbf{X}_t,\mathcal{D}^{*}_t) = 1) < G^{\dagger}_{t,1} = 1$.

We assume that $\{Q_{Y_{t+1}}(\alpha|\mathbf{X}_t,\mathcal{D}_t) = \mathbf{X}_t'\beta_t(\alpha|\mathcal{D}_t)\}$ satisfies the following asymptotic properties — asymptotic normality (Komunjer 2003) and bootstrap consistency (Fitzenberger 1997):

$$Z_R \to_d N(0,1),$$
$$\sup_{z\in\mathbb{R}} \big|\Pr(Z^{*}_R < z) - \Phi(z)\big| \to_p 0,$$

as $R \to \infty$ and $0 < \sigma^2_Q(\alpha) < \infty$, where $\Phi(\cdot)$ is the standard normal CDF. Therefore,

$$\Pr(G_{t,1}(\mathbf{X}_t,\mathcal{D}_t) = 1) = \Pr(Q_{Y_{t+1}}(\alpha|\mathbf{X}_t,\mathcal{D}_t) > 0)$$
$$= \Pr(d_R < 0)$$
$$= \Pr(Z_R > d^{\dagger}_R)$$
$$= 1 - \Pr(Z_R < d^{\dagger}_R) \qquad (21)$$
$$\to 1 - \Phi(d^{\dagger}_R) \text{ as } R \to \infty, \qquad (22)$$

where the second equality follows from the definition of $d_R$, and the third equality follows from the definition of $Z_R$. As noted in Proposition 4, if $\Pr(G_{t,1}(\mathbf{X}_t,\mathcal{D}_t) = 1)$ is very close to $G^{\dagger}_{t,1}$, there is no room for improvement by bagging. According to (22), when $d^{\dagger}_R$ is close to 0, $\Phi(d^{\dagger}_R)$ is farthest from 1 or 0. That means that when the pseudo-true conditional $\alpha$-quantile $Q^{\dagger}_{Y_{t+1}}(\alpha|X_t)$ is close to 0, there is the largest room for bagging to work. This is because when $Q^{\dagger}_{Y_{t+1}}(\alpha|X_t) = 0$, the unbagged predictor $G_{t,1}(\mathbf{X}_t,\mathcal{D}_t) = 1(Q_{Y_{t+1}}(\alpha|\mathbf{X}_t,\mathcal{D}_t) > 0)$ will be most "unstable". Hence, for economic agents or financial investors with the cost function parameter $\alpha$ that gives $Q^{\dagger}_{Y_{t+1}}(\alpha|X_t) = 0$, bagging may work better. Recall that $(1-\alpha)$ is the utility-gain ratio from a correct prediction when $G_{t+1} = 0$, or equivalently the opportunity cost of a wrong-alert when $G_{t+1} = 0$. Also, $\alpha$ is the utility-gain ratio from a correct prediction when $G_{t+1} = 1$ is realized, or the opportunity cost of a fail-to-alert from the wrong prediction when $G_{t+1} = 1$ is realized. For those investors with the conditional $\alpha$-quantile of the stock returns equal to zero, the bagging binary prediction could work most effectively.

Now let us see how bagging works when there is some room for the unbagged predictor to be improved. The bagging prediction is based on the bootstrap procedure, so

$$E_{\mathcal{D}^*}G^{*}_{t,1}(\mathbf{X}_t,\mathcal{D}^{*}_t) = \Pr(G^{*}_{t,1}(\mathbf{X}_t,\mathcal{D}^{*}_t) = 1)$$
$$= \Pr(Q^{*}_{Y_{t+1}}(\alpha|\mathbf{X}_t,\mathcal{D}^{*}_t) > 0)$$
$$= \Pr(Z^{*}_R > d_R)$$
$$\equiv 1 - \Phi^{*}(d_R)$$
$$\to 1 - \Phi(d^{\dagger}_R) \text{ as } R \to \infty,$$

where the fourth line defines the notation $\Phi^{*}(\cdot)$ for the bootstrap distribution of $Z^{*}_R$. Therefore, we can rewrite $\Pr(G^{B}_{t,1}(\mathbf{X}_t,\mathcal{D}^{*}_t) = 1)$ as

$$\Pr(G^{B}_{t,1}(\mathbf{X}_t,\mathcal{D}^{*}_t) = 1) = \Pr\big(E_{\mathcal{D}^*}G^{*}_{t,1}(\mathbf{X}_t,\mathcal{D}^{*}_t) > 1/2\big)$$
$$= \Pr\big(1 - \Phi^{*}(d_R) > 1/2\big)$$
$$= \Pr\big(\Phi^{*}(d_R) < 1/2\big)$$
$$= \Pr\big(d_R < \Phi^{*-1}(1/2)\big)$$
$$= \Pr\big(d^{\dagger}_R - Z_R < \Phi^{*-1}(1/2)\big)$$
$$= \Pr\big(Z_R > d^{\dagger}_R - \Phi^{*-1}(1/2)\big)$$
$$= 1 - \Pr\big(Z_R < d^{\dagger}_R - \Phi^{*-1}(1/2)\big) \qquad (23)$$
$$\to 1 - \Phi(d^{\dagger}_R) \text{ as } R \to \infty, \qquad (24)$$

where the last line is due to the asymptotic normality and the bootstrap consistency, i.e., $\Phi^{*-1}(1/2) \to \Phi^{-1}(1/2) = 0$. Hence, when $R$ is large, bagging would not work because (22) and (24) are the same in the limit. When $R$ is small, a difference between (21) and (23) is due to the median of the bootstrap distribution, $\Phi^{*-1}(1/2)$. So, bagging works either (a) if $Q^{\dagger}_{Y_{t+1}}(\alpha|X_t) < 0$ and $\Phi^{*-1}(1/2) < 0$, or (b) if $Q^{\dagger}_{Y_{t+1}}(\alpha|X_t) > 0$ and $\Phi^{*-1}(1/2) > 0$. Summarizing:

Proposition 6. (a) When $1 - \alpha - \pi_t > 0$ (equivalently when $G^{\dagger}_{t,1} = 0$, $Q^{\dagger}_{Y_{t+1}}(\alpha|X_t) < 0$, $d^{\dagger}_R > 0$) and $R$ is small, bagging works if $\Phi^{*-1}(1/2) < 0$. (b) When $1 - \alpha - \pi_t < 0$ (equivalently when $G^{\dagger}_{t,1} = 1$, $Q^{\dagger}_{Y_{t+1}}(\alpha|X_t) > 0$, $d^{\dagger}_R < 0$) and $R$ is small, bagging works if $\Phi^{*-1}(1/2) > 0$.

Proposition 7. When R is large, bagging does not work for the binary prediction.

Remark. The ability of the voted-bagging to stabilize classifier prediction has been proved to be very

successful in the machine learning literature (Bauer and Kohavi 1999, Kuncheva and Whitaker 2003, and

Evgeniou et al. 2004). However, it may not work, as also discussed by Kuncheva and Whitaker (2003)

and Dzeroski and Zenko (2004) in different but related contexts. Bagging predictors could be worse than

unbagged predictors. From the above analysis, we may see that bagging can push a predictor to its optimal

value under certain conditions, and also see that ability of bagging to improve predictability is limited. If

there is no predictability at all or the unbagged predictor is already very close to the optimal predictor, there

will be no room for improvement by bagging. We also note that bagging can be worse than the unbagged

predictors. Therefore, we cannot arbitrarily select a prediction model and expect bagging to work like a

magic and give a better prediction automatically. In case of classification (as in our binary predictors), the

bagging predictor is formed via voting, for which case Breiman (1996a, Section 4.2) shows that it is possible

that bagging could be worse than unbagged predictor. As we will see from the Monte Carlo experiment in

Section 5 and the empirical applications in Section 6, the bagging could be worse for the binary predictors

and the quantile predictions, although in general it works well particularly with a small R.

Remark. The "voted-bagging" does not transform the hard-thresholding decision into a soft-thresholding decision. As Buhlmann and Yu (2002) have shown, "averaged-bagging" transforms a hard-thresholding function (e.g., an indicator) into a soft-thresholding function (a smooth function) and thus decreases the instabilities of predictors. In the majority voting $G^{B}_{t,1}(\mathbf{X}_t,\mathcal{D}^{*}_t) = 1(E_{\mathcal{D}^*}G^{*}_{t,1}(\mathbf{X}_t,\mathcal{D}^{*}_t) > 1/2)$, the bagging predictor is still a hard-thresholding decision, or an indicator decision — e.g., the decision whether to take an umbrella (not a fraction of an umbrella) based on a weather forecast. Hence, the bagging binary prediction remains unstable (particularly around the thresholding values), and the explanation of Buhlmann and Yu (2002) does not apply to the voted-bagging.

3.2 BMA-weighted bagging for binary prediction

To form the bagging binary predictor via majority voting, we need to find a proper weight function $\{w_{j,t}\}$ to calculate $E_{\mathcal{D}^*}G^{*}_{t,1}(\mathbf{X}_t,\mathcal{D}^{*}_t) = \sum_{j=1}^{J} w_{j,t}\, G^{*(j)}_{t,1}(\mathbf{X}_t,\mathcal{D}^{*(j)}_t)$. Usually, the bagging predictor uses equal-weighted averaging ($w_{j,t} = J^{-1}$, $j = 1, \dots, J$) over the bootstrapped predictions. However, $w_{j,t}$ can be estimated depending on the performance of each bootstrapped predictor. There are several candidates that we can borrow from the forecast combination literature on estimated weights. One method is to estimate the weights by regression (minimizing the cost function), as initially suggested by Granger et al. (1994). This weighting scheme assumes that the predictors to be combined are independent and that the number of predictors is small compared to the sample size, so that the regular regression method can be applied. However, because the number of our bootstrapped predictors is large and the predictors are closely related, the regression weight is not applicable here. Another method for forecast combination is Bayesian averaging, which is what we use in this paper for the Monte Carlo experiments and the empirical work. The idea is to use the Bayesian model averaging (BMA) technique for bagging.¹

The posterior predictive density of $\mathcal{D}^{*}_t$ based on $\mathcal{D}_t$ can be decomposed into two parts — the prediction based on the estimated model and the parameter estimation given the data:

$$P(\mathcal{D}^{*}_t|\mathcal{D}_t) = \sum_{j=1}^{J} P\big[\mathcal{D}^{*}_t \mid \beta^{*(j)}_t(\alpha|\mathcal{D}^{*(j)}_t), \mathcal{D}_t\big]\, P\big[\beta^{*(j)}_t(\alpha|\mathcal{D}^{*(j)}_t) \mid \mathcal{D}^{*(j)}_t, \mathcal{D}_t\big]\, P(\mathcal{D}^{*(j)}_t|\mathcal{D}_t)$$
$$= \sum_{j=1}^{J} P\big[\mathcal{D}^{*}_t \mid \beta^{*(j)}_t(\alpha|\mathcal{D}^{*(j)}_t), \mathcal{D}_t\big]\, P\big[\beta^{*(j)}_t(\alpha|\mathcal{D}^{*(j)}_t) \mid \mathcal{D}_t\big].$$

We then compute the BMA-weighted bootstrap average $E_{\mathcal{D}^*}G^{*}_{t,1}(\mathbf{X}_t,\mathcal{D}^{*}_t)$ as follows:

$$E_{\mathcal{D}^*}G^{*}_{t,1}(\mathbf{X}_t,\mathcal{D}^{*}_t) = \int G^{*}_{t,1}(\mathbf{X}_t,\mathcal{D}^{*}_t)\, dP(\mathcal{D}^{*}_t|\mathcal{D}_t)$$
$$= \sum_{j=1}^{J} P\big[\beta^{*(j)}_t(\alpha|\mathcal{D}^{*(j)}_t) \mid \mathcal{D}_t\big] \int G^{*}_{t,1}(\mathbf{X}_t,\mathcal{D}^{*}_t)\, dP\big[\mathcal{D}^{*}_t \mid \beta^{*(j)}_t(\alpha|\mathcal{D}^{*(j)}_t), \mathcal{D}_t\big]$$
$$\equiv \sum_{j=1}^{J} w_{j,t}\, G^{*(j)}_{t,1}(\mathbf{X}_t,\mathcal{D}^{*(j)}_t),$$

where the last equality follows from $\int G^{*}_{t,1}(\mathbf{X}_t,\mathcal{D}^{*}_t)\, dP\big[\mathcal{D}^{*}_t \mid \beta^{*(j)}_t(\alpha|\mathcal{D}^{*(j)}_t), \mathcal{D}_t\big] = G^{*(j)}_{t,1}(\mathbf{X}_t,\mathcal{D}^{*(j)}_t)$ and by setting $w_{j,t} \equiv P\big[\beta^{*(j)}_t(\alpha|\mathcal{D}^{*(j)}_t) \mid \mathcal{D}_t\big]$.

¹ BMA has been proved to be useful in forecasting financial returns (Avramov 2002) and in macroeconomic forecasting for inflation and output growth (Garratt et al. 2003), to deal with the parameter and model uncertainties. The out-of-sample forecast performance is often shown to be superior to that of model selection criteria.

The posterior probability of $\beta^{*(j)}_t(\alpha|\mathcal{D}^{*(j)}_t)$ given $\mathcal{D}_t$ through the $j$th bootstrap data set $\mathcal{D}^{*(j)}_t$ is calculated by Bayes' rule:

$$w_{j,t} = P\big[\beta^{*(j)}_t(\alpha|\mathcal{D}^{*(j)}_t) \mid \mathcal{D}_t\big] = \frac{P\big[\mathcal{D}_t \mid \beta^{*(j)}_t(\alpha|\mathcal{D}^{*(j)}_t)\big]\, P\big(\beta^{*(j)}_t(\alpha|\mathcal{D}^{*(j)}_t)\big)}{\sum_{j=1}^{J} P\big[\mathcal{D}_t \mid \beta^{*(j)}_t(\alpha|\mathcal{D}^{*(j)}_t)\big]\, P\big(\beta^{*(j)}_t(\alpha|\mathcal{D}^{*(j)}_t)\big)},$$

and now the problem is to estimate $P\big[\mathcal{D}_t \mid \beta^{*(j)}_t(\alpha|\mathcal{D}^{*(j)}_t)\big]$ and $P\big(\beta^{*(j)}_t(\alpha|\mathcal{D}^{*(j)}_t)\big)$. When we do not have any information for the prior $P\big(\beta^{*(j)}_t(\alpha|\mathcal{D}^{*(j)}_t)\big)$, we may just use a non-informative prior, taking $P\big(\beta^{*(j)}_t(\alpha|\mathcal{D}^{*(j)}_t)\big)$ to be the same for all $j$, so that

$$w_{j,t} = P\big[\beta^{*(j)}_t(\alpha|\mathcal{D}^{*(j)}_t) \mid \mathcal{D}_t\big] = \frac{P\big[\mathcal{D}_t \mid \beta^{*(j)}_t(\alpha|\mathcal{D}^{*(j)}_t)\big]}{\sum_{j=1}^{J} P\big[\mathcal{D}_t \mid \beta^{*(j)}_t(\alpha|\mathcal{D}^{*(j)}_t)\big]}, \qquad (25)$$

where $P\big[\mathcal{D}_t \mid \beta^{*(j)}_t(\alpha|\mathcal{D}^{*(j)}_t)\big]$ is usually estimated by a likelihood function. According to Komunjer (2003), the exponential of the quantile cost function can be treated as a quasi-likelihood function, so $P\big[\mathcal{D}_t \mid \beta^{*(j)}_t(\alpha|\mathcal{D}^{*(j)}_t)\big]$ can be calculated by

$$P\big[\mathcal{D}_t \mid \beta^{*(j)}_t(\alpha|\mathcal{D}^{*(j)}_t)\big] \propto \exp\Big(-\sum_{i=1}^{k} c\big(e^{*(j)}_{t-i,1}\big)\Big), \qquad (26)$$

using the $k$ most recent in-sample observations, where $e^{*(j)}_{t,1} = G_{t+1} - G^{*(j)}_{t,1}(\mathbf{X}_t,\mathcal{D}^{*(j)}_t)$. We select $k = 1, 5,$ and $R$ in the simulations and empirical experiments. Intuitively, $w_{j,t}$ gives a large weight to the $j$th bootstrap predictor at period $t$ when it has forecasted well over the past $k$ periods, and gives it a small weight when it has forecasted poorly over the past $k$ periods.² In Tables 1-10 (to be discussed in Section 5), EW and Wk denote weighted-bagging with equal weights and with BMA-weights using the $k$ most recent in-sample observations.

² The weighting scheme proposed in Yang (2004) and used in Zou and Yang (2004) may be regarded as a special case of the above BMA-weighted forecasts. Zou and Yang (2004) have applied this method to choose ARIMA models and find that it has a clear stability advantage in forecasting over some existing popular model selection criteria.
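A minimal sketch of these weights (assuming the per-period costs of each bootstrap predictor have already been computed; the code is illustrative, not from the paper) is given below.

```python
# Sketch of the BMA weights in equations (25)-(26): each bootstrap predictor is weighted
# by the exponential of its negative cumulative cost over the k most recent observations.
import numpy as np

def bma_weights(recent_costs):
    """recent_costs: (J, k) array, cost of bootstrap predictor j at each of the k most
    recent in-sample periods; returns the normalized weights w_{j,t}."""
    log_like = -recent_costs.sum(axis=1)          # quasi-log-likelihood per predictor
    log_like -= log_like.max()                    # stabilize before exponentiating
    w = np.exp(log_like)
    return w / w.sum()                            # equation (25) with a flat prior
```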

4 Bagging Quantile Prediction

Another unstable predictor used in the bagging literature is the non-linear predictor. The quantile predictor is the minimizer of the cost function $\rho_\alpha(\cdot)$ defined in (7), and it is a non-linear function of sample moments. According to Friedman and Hall (2000) and Buja and Stuetzle (2002), bagging can increase the prediction accuracy

for non-linear predictors. Knight and Bassett (2002) show that under i.i.d. and some other assumptions,

bagged non-parametric quantile estimators and linear regression quantile estimators can outperform their

corresponding standard unbagged predictors. Therefore, we will discuss the effect of bagging on quantile predictions by comparing the unbagged predictor $Q_{Y_{t+1}}(\alpha|\mathbf{X}_t,\mathcal{D}_t)$ and the bagging predictor $Q^{B}_{Y_{t+1}}(\alpha|\mathbf{X}_t,\mathcal{D}^{*}_t)$ (defined below).

The procedure of bagging for quantile predictors can be conducted in the following steps (a code sketch follows the list):

1. Construct the $j$th bootstrap sample $\mathcal{D}^{*(j)}_t$, $j = 1, \dots, J$, according to the empirical distribution of $\mathcal{D}_t$.

2. Estimate $\beta^{*(j)}_t(\alpha|\mathcal{D}^{*(j)}_t) = \arg\min_{\beta(\alpha)} \sum_{s=t-R+1}^{t} \rho_\alpha(Y^{*(j)}_s - \mathbf{X}^{*(j)\prime}_{s-1}\beta(\alpha))$, $t = R, \dots, T$.

3. Compute the bootstrap quantile predictor from the $j$th bootstrapped sample, that is,
$$Q^{*(j)}_{Y_{t+1}}(\alpha|\mathbf{X}_t, \mathcal{D}^{*(j)}_t) \equiv \mathbf{X}_t'\beta^{*(j)}_t(\alpha|\mathcal{D}^{*(j)}_t). \qquad (27)$$

4. Finally, the bagging predictor $Q^{B}_{Y_{t+1}}(\alpha|\mathbf{X}_t, \mathcal{D}^{*}_t)$ can be constructed by averaging over the $J$ bootstrap predictors, i.e.,
$$Q^{B}_{Y_{t+1}}(\alpha|\mathbf{X}_t, \mathcal{D}^{*}_t) \equiv E_{\mathcal{D}^*} Q^{*}_{Y_{t+1}}(\alpha|\mathbf{X}_t, \mathcal{D}^{*}_t),$$
where $E_{\mathcal{D}^*} Q^{*}_{Y_{t+1}}(\alpha|\mathbf{X}_t, \mathcal{D}^{*}_t) = \sum_{j=1}^{J} w_{j,t}\, Q^{*(j)}_{Y_{t+1}}(\alpha|\mathbf{X}_t, \mathcal{D}^{*(j)}_t)$ with $\sum_{j=1}^{J} w_{j,t} = 1$.
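As referenced above, here is a sketch of the procedure for a single forecast date, with equal weights and the same assumed moving-block bootstrap and QuantReg solver used in the binary-prediction sketch of Section 3 (again, not the authors' code).

```python
# Sketch of the bagged quantile predictor Q^B_{Y_{t+1}} by averaging J bootstrap forecasts.
import numpy as np
from statsmodels.regression.quantile_regression import QuantReg

def bagged_quantile_forecast(y_window, alpha, J=50, block_size=4, seed=0):
    """One-step bagged quantile forecast from a rolling window of returns."""
    y_window = np.asarray(y_window, dtype=float)
    rng = np.random.default_rng(seed)
    resp, lag = y_window[1:], y_window[:-1]
    X = np.column_stack([np.ones_like(lag), lag, lag**2])
    x_t = np.array([1.0, y_window[-1], y_window[-1]**2])
    n, n_blocks = len(resp), int(np.ceil(len(resp) / block_size))
    forecasts = np.empty(J)
    for j in range(J):
        starts = rng.integers(0, n - block_size + 1, size=n_blocks)   # step 1
        idx = np.concatenate([np.arange(s, s + block_size) for s in starts])[:n]
        beta = QuantReg(resp[idx], X[idx]).fit(q=alpha).params        # step 2
        forecasts[j] = x_t @ beta                                     # step 3: eq. (27)
    return forecasts.mean()                                           # step 4: equal-weighted average
```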

As for the bagging binary prediction, the problem is how to find a proper weight function. Again, we use the equal weight ($w_{j,t} = J^{-1}$, $j = 1, \dots, J$) and the BMA-weight. We compute the BMA-weighted bootstrap average as follows:

$$E_{\mathcal{D}^*} Q^{*}_{Y_{t+1}}(\alpha|\mathbf{X}_t,\mathcal{D}^{*}_t) = \int Q^{*}_{Y_{t+1}}(\alpha|\mathbf{X}_t,\mathcal{D}^{*}_t)\, dP(\mathcal{D}^{*}_t|\mathcal{D}_t)$$
$$= \sum_{j=1}^{J} P\big[\beta^{*(j)}_t(\alpha|\mathcal{D}^{*(j)}_t) \mid \mathcal{D}_t\big] \int Q^{*}_{Y_{t+1}}(\alpha|\mathbf{X}_t,\mathcal{D}^{*}_t)\, dP\big[\mathcal{D}^{*}_t \mid \beta^{*(j)}_t(\alpha|\mathcal{D}^{*(j)}_t), \mathcal{D}_t\big]$$
$$\equiv \sum_{j=1}^{J} w_{j,t}\, Q^{*(j)}_{Y_{t+1}}(\alpha|\mathbf{X}_t,\mathcal{D}^{*(j)}_t),$$

where the last equality follows from $\int Q^{*}_{Y_{t+1}}(\alpha|\mathbf{X}_t,\mathcal{D}^{*}_t)\, dP\big[\mathcal{D}^{*}_t \mid \beta^{*(j)}_t(\alpha|\mathcal{D}^{*(j)}_t), \mathcal{D}_t\big] = Q^{*(j)}_{Y_{t+1}}(\alpha|\mathbf{X}_t,\mathcal{D}^{*(j)}_t)$ and by setting $w_{j,t} \equiv P\big[\beta^{*(j)}_t(\alpha|\mathcal{D}^{*(j)}_t) \mid \mathcal{D}_t\big]$. The BMA-weights $w_{j,t}$ have the formula in (25), computed using

$$P\big[\mathcal{D}_t \mid \beta^{*(j)}_t(\alpha|\mathcal{D}^{*(j)}_t)\big] \propto \exp\Big(-\sum_{i=1}^{k} \rho_\alpha\big(u^{*(j)}_{t-i,1}\big)\Big), \qquad (28)$$

using the $k$ most recent in-sample observations, where $u^{*(j)}_{t,1} = Y_{t+1} - Q^{*(j)}_{Y_{t+1}}(\alpha|\mathbf{X}_t,\mathcal{D}^{*(j)}_t)$. We again select $k = 1, 5,$ and $R$ in the simulations and empirical experiments in Sections 5 and 6.

Why does bagging work for quantile prediction? There have been several papers that are useful to answer this question. It is generally explained in two ways. First, bagging can drive an estimator towards its linear approximation, which usually has a lower variance. Second, the bagging process can work by smoothing the sample objective function, so the bagging predictor will converge to the global minimizer of the sample objective function.

Bagging for quantiles falls into the category of bagging regression, which involves a totally different mechanism from the classification problem discussed in the previous section. Friedman and Hall (2000), Buja and Stuetzle (2002), and Knight and Bassett (2002) all use a certain kind of Taylor expansion to rewrite the estimators of interest, and show that bagging predictors are first-order equivalent to the standard unbagged predictors but lower the non-linearities of the predictor. However, all of these works assume i.i.d. data. Friedman and Hall (2000) use a Taylor expansion to decompose statistical estimators into linear and higher-order parts, or decompose the objective function that they optimize into quadratic and higher-order terms. To apply a Taylor expansion, the estimator must be a smooth function of sample moments, or the cost function must be differentiable. Therefore, their approach cannot be used to analyze bagging quantile predictors. Buja and Stuetzle (2002) extend the application of bagging regression to more general circumstances by expressing predictors as statistical functionals (especially those that can be written as U-statistics). They show how the quantile estimators can be written as U-statistics. Since a Taylor expansion is no longer applicable, they use the von Mises expansion technique (based on a Taylor expansion) to prove that the leading effects of bagging on variance, squared bias, and MSE are of order $R^{-2}$, where $R$ is the estimation sample size. So, bagging may work for non-smooth target functions, such as median predictors and quantile predictors. Knight and Bassett (2002) illustrate that if the quantile estimator is asymptotically normal, we can decompose the quantile estimator into linear and non-linear parts using the Bahadur-Kiefer representation. They show that bagging quantile estimators are first-order equivalent to the standard unbagged quantile predictor; however, bagging can reduce the non-linearity of the sample quantile.

The second explanation of why bagging works for quantile predictions has to do with the instability of the quantile prediction. One source of instability is the non-differentiable features of the objective function (7). The sample objective function used in regression as an estimator of the objective function will be even worse behaved because of the limitation of the sample size. There may be several equivalent minimizers of the sample objective function, and the numerical search method may stop at any of these minimizers, or even at a local minimizer, depending on the starting point of the search. So the quantile estimator may have a very high volatility.

For the above reasons, we conjecture that bagging will also improve quantile prediction with time series

data. We show the performance of bagging for quantile predictions in the next two sections. These results

show that bagging would be very useful in improving the VaR forecasts in financial risk management.

5 Monte Carlo

For the regression forecast (that takes numerical values), Breiman (1996a, Section 4.1) shows that if we can

draw training samples from the true data generating process, the bagging predictor (based on averaging) will

always give a lower MSE than the unbagged predictor. He also shows that, in practice, as we bootstrap a

sample to generate replicated training samples, bagging can give improvement only if the model is unstable.

Although we have a different cost function than MSE, namely the check function in (7), and we allow time dependence in the data, bagging works in a similar way as in Breiman (1996a) according to our Monte Carlo and empirical experiments.

For the classification forecast (that takes a finite number of discrete values), Breiman (1996a, Section 4.2) shows that bagging may not always give a better prediction, and sometimes bagging may even give a worse prediction. Our numerical results in both this section and the next section give a similar conclusion.

In this section, we use a set of Monte Carlo simulations to gain further insights into the conditions under which bagging works. From the asymmetric cost function given in (9), we can obtain the out-of-sample average costs for the unbagged and bagging binary predictors:

$$S_1 \equiv P^{-1}\sum_{t=R}^{T} c(e_{t,1}) = P^{-1}\sum_{t=R}^{T} \rho_\alpha(e_{t,1}), \qquad S_2 \equiv P^{-1}\sum_{t=R}^{T} c(e^{B}_{t,1}) = P^{-1}\sum_{t=R}^{T} \rho_\alpha(e^{B}_{t,1}),$$

where $e_{t,1} \equiv G_{t+1} - G_{t,1}$ and $e^{B}_{t,1} \equiv G_{t+1} - G^{B}_{t,1}$. We consider the asymmetric cost parameter $\alpha = 0.1, 0.3, 0.5, 0.7,$ and $0.9$.

It will be said that bagging "works" if $S_1 > S_2$. To rule out the chance of pure luck by a certain criterion, we compute the following four summary performance statistics from 100 Monte Carlo replications ($r = 1, \dots, 100$):

$$T_1 \equiv \tfrac{1}{100}\sum_{r=1}^{100} S^r_a, \quad T_2 \equiv \Big(\tfrac{1}{100}\sum_{r=1}^{100}\big(S^r_a - T_1\big)^2\Big)^{1/2}, \quad T_3 \equiv \tfrac{1}{100}\sum_{r=1}^{100} 1(S^r_1 > S^r_2), \quad T_4 \equiv \tfrac{1}{100}\sum_{r=1}^{100} 1(S^r_1 = S^r_2),$$

where $a = 1$ for the unbagged predictor and $a = 2$ for the bagged predictor. $T_1$ measures the Monte Carlo mean of the out-of-sample cost, $T_2$ measures the Monte Carlo standard deviation of the out-of-sample cost, $T_3$ measures the Monte Carlo frequency that bagging works, and $(T_3 + T_4)$ measures the Monte Carlo frequency that bagging is no worse than the unbagged predictor.
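A small sketch (not from the paper) of how the out-of-sample costs and these Monte Carlo summaries could be computed:

```python
# Out-of-sample average check-function costs S1 (unbagged) and S2 (bagged), and the
# Monte Carlo summaries T1-T4 across replications.
import numpy as np

def check_loss(e, alpha):
    """rho_alpha(e) = [alpha - 1(e < 0)] * e, the check function in equation (7)."""
    e = np.asarray(e, dtype=float)
    return (alpha - (e < 0)) * e

def out_of_sample_cost(realized, predicted, alpha):
    """Average cost P^{-1} sum rho_alpha(realized - predicted) over the forecast period."""
    return check_loss(np.asarray(realized) - np.asarray(predicted), alpha).mean()

def monte_carlo_summaries(S1, S2):
    """S1, S2: arrays of per-replication costs; returns (T1, T2) per predictor, and T3, T4."""
    T1 = np.array([S1.mean(), S2.mean()])
    T2 = np.array([S1.std(), S2.std()])
    T3 = (S1 > S2).mean()      # frequency that bagging strictly works
    T4 = (S1 == S2).mean()     # frequency of ties
    return T1, T2, T3, T4
```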


For quantile predictors, we use the check function for $Y$ given in (7):

$$S_1 \equiv P^{-1}\sum_{t=R}^{T} \rho_\alpha(u_{t,1}), \qquad S_2 \equiv P^{-1}\sum_{t=R}^{T} \rho_\alpha(u^{B}_{t,1}),$$

where $u_{t,1} \equiv Y_{t+1} - Q_{Y_{t+1}}(\alpha|\mathbf{X}_t)$ and $u^{B}_{t,1} \equiv Y_{t+1} - Q^{B}_{Y_{t+1}}(\alpha|\mathbf{X}_t)$. The definitions of the four statistics $T_1$, $T_2$, $T_3$, and $T_4$ for the quantile prediction are the same as for the binary prediction.

We generate the data from

$$Y_t = \rho Y_{t-1} + \varepsilon_t,$$
$$\varepsilon_t = z_t\big[(1-\theta) + \theta\varepsilon^2_{t-1}\big]^{1/2}, \qquad z_t \sim \text{i.i.d. } MW_i,$$

where the i.i.d. innovation $z_t$ is generated from the first eight mixture normal distributions of Marron and Wand (1992, p. 717), each of which will be denoted as $MW_i$ ($i = 1, \dots, 8$). In Tables 1-2, we consider the data generating processes ARCH-MW1 with $\theta = 0.5, 0.9$ (and $\rho = 0$), while in Tables 3-10, we consider the data generating processes AR-MW$i$ ($i = 1, \dots, 8$) with $\rho = 0.6$ (and $\theta = 0$). Therefore, our data generating processes fall into two categories: the (mean-unpredictable) martingale-difference ARCH(1) processes without AR structure, and the mean-predictable AR(1) processes without ARCH structure.
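A sketch of this data generating process is given below. The Gaussian MW1 innovation is used as a placeholder (the other Marron-Wand mixtures would be drawn analogously), and the burn-in length follows the 100 discarded start-up observations mentioned below; the parameter defaults are illustrative.

```python
# Simulate Y_t = rho*Y_{t-1} + eps_t with eps_t = z_t*[(1-theta) + theta*eps_{t-1}^2]^{1/2}.
import numpy as np

def simulate_dgp(n, rho=0.6, theta=0.0, burn_in=100, seed=0):
    """AR(1)-ARCH(1) process with (placeholder) standard normal MW1 innovations."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n + burn_in)
    y = np.zeros(n + burn_in)
    eps_prev = 0.0
    for t in range(1, n + burn_in):
        eps = z[t] * np.sqrt((1.0 - theta) + theta * eps_prev**2)
        y[t] = rho * y[t - 1] + eps
        eps_prev = eps
    return y[burn_in:]                 # discard burn-in, as in the experiments
```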

To see some properties of these densities, we generated 10,000 pseudo-random numbers for which some

summary statistics are reported below. All the symmetric $MW_i$ innovations ($i = 1, 4, 5, 6, 7$) are standardized

so that they all have zero mean and unit variance. The asymmetric MWi innovations (i = 2, 3, 8) have non-

zero mean but still unit variance. The skewness of the innovation ranges from −0.7 (MW2) to 1.5 (MW3),

and the kurtosis ranges from 1.4 (MW7) to 25 (MW5).

Density                  Mean    Median   Max     Min      Std.dev.  Skewness  Kurtosis
MW1 Gaussian            -0.013   -0.011   4.335   -4.252   0.998     -0.009     3.078
MW2 Skewed unimodal      0.927    1.038   3.797   -4.084   1.007     -0.723     3.990
MW3 Strongly skewed     -1.925   -2.315   3.113   -3.307   1.027      1.471     4.813
MW4 Kurtotic unimodal    0.019    0.006   4.212   -4.619   0.998      0.038     4.514
MW5 Outlier              0.002   -0.006   9.471   -9.521   0.981      0.361    24.777
MW6 Bimodal             -0.002    0.004   3.125   -2.737   1.010      0.004     2.054
MW7 Separated bimodal    0.017    0.274   2.167   -2.015   1.002     -0.346     1.378
MW8 Skewed bimodal       0.326    0.370   3.328   -3.436   1.002     -0.330     2.405

For each series, 100 extra observations are generated and then discarded to alleviate the effect of the starting values

in random number generation. We consider one fixed out-of-sample size P = 100 and a range of estimation

sample sizes R = 20, 50, 100, 200. Our bagging predictors are generated by voting or by averaging over

J = 50 bootstrap predictors, while the results with J = 20 basically tell the same story.


Our binary predictions are based on quantile predictors as suggested in Proposition 3: $G_{t,1}(\mathbf{X}_t,\mathcal{D}_t) \equiv 1(Q_{Y_{t+1}}(\alpha|\mathbf{X}_t,\mathcal{D}_t) > 0)$, and we use a simple univariate linear quantile regression model, $Q_{Y_{t+1}}(\alpha|X_t) = \beta_0(\alpha) + \beta_1(\alpha)Y_t + \beta_2(\alpha)Y_t^2$, which is estimated using the interior-point algorithm of Portnoy and Koenker (1997).

To generate bootstrap samples, we use the block bootstrap for both the Monte Carlo experiments in this section and the empirical application in the next section. We choose the block size that minimizes the in-sample average cost recursively, and therefore we use a different block size at each forecast period and for each different cost function with different $\alpha$'s.
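A minimal sketch of this recursive block-size choice follows; the candidate grid is an assumption, and in_sample_cost stands for a hypothetical user-supplied routine that bags the predictor with a given block length and evaluates its in-sample average cost.

```python
# Choose the block length whose bagged predictor attains the lowest in-sample average cost.
import numpy as np

def choose_block_size(in_sample_cost, candidates=(2, 4, 6, 8, 12)):
    """in_sample_cost: callable mapping a block length to the in-sample average cost of the
    bagged predictor built with that block length (hypothetical helper).
    Returns the candidate block length with the lowest in-sample average cost."""
    costs = [in_sample_cost(b) for b in candidates]
    return candidates[int(np.argmin(costs))]
```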

All the Monte Carlo results are reported in Tables 1-10. According to all the tables, for both binary

predictions and quantile predictions, bagging works well when the sample size is small. The improvement

of bagging predictors over unbagged ones becomes less significant when the sample size increases. This is

true in terms of all three criteria, T1, T2, and T3, that is, bagging gives the largest reduction in the mean

cost, the largest reduction in the variance of the cost, and the highest frequency of out-performance, when

R = 20. Let us use Tables 1-2 as examples for now. The mean cost reduction (in terms of T1) for both binary

prediction and quantile prediction can be as much as about 20% for all α’s; the variance of cost (in terms of

T2) can be reduced by as much as about 20% for binary prediction and about 50% for quantile prediction;

and T3 is as high as 95% for the binary prediction and 100% for the quantile prediction. This result shows

that bagging can significantly mitigate the problem of the sample shortage.

This function of bagging can also be observed in its performance on the tails. The scarcity of observations in the tails usually leads to poor predictions; however, the degree of improvement of the bagging predictors is very significant for $\alpha = 0.1$ and $0.9$ according to our Monte Carlo results. We illustrate this fact using Tables 3 to 10 in the case of $R = 20$. For binary prediction, the average cost reduction (in terms of $T_1$) for $\alpha = 0.1$ and $0.9$ is about 15%; however, the average cost reduction for $\alpha = 0.5$ is only about 5%. The average cost variance reduction (in terms of $T_2$) for $\alpha = 0.1$ and $0.9$ is about 20%; however, the average cost variance reduction for $\alpha = 0.5$ is not significant. The frequency that bagging works (in terms of $T_3$) for $\alpha = 0.1$ and $0.9$ is about 80%; however, the frequency that bagging works for $\alpha = 0.5$ is only about 60%. For quantile prediction, the average cost reduction (in terms of $T_1$) for $\alpha = 0.1$ and $0.9$ is about 20%; however, the average cost reduction for $\alpha = 0.5$ is only about 5%. The average cost variance reduction (in terms of $T_2$) for $\alpha = 0.1$ and $0.9$ is about 40%; however, the average cost variance reduction for $\alpha = 0.5$ is about 20%. The frequency that bagging works (in terms of $T_3$) for $\alpha = 0.1$ and $0.9$ and for $\alpha = 0.5$ is similarly around 90%.


The advantage of bagging for quantile predictions (in terms of T1, T2, and T3) decreases gradually as the sample size increases, but it still exists for R = 200, although it is no longer very significant. For the binary prediction, however, this advantage disappears much faster, and it vanishes entirely once the sample size exceeds 100 for most data generating processes (DGPs). With a large sample size, the bagging predictor converges to the unbagged predictor, as predicted by Proposition 7.

Another interesting phenomenon we observe is that bagging works better (in terms of T1, T2, and T3) for both binary and quantile prediction when there is ARCH structure in the DGP. For the AR processes, the mean is predictable, so the binary predictors perform fairly well even without bagging; there is therefore not much room left for bagging to improve on (see Proposition 4). For the ARCH processes without the AR term, however, the conditional mean is unpredictable. Although there is some binary predictability through the time-varying conditional higher moments, this predictability is harder to exploit than in the AR models. As the tables show, the mean cost (T1) of the unbagged binary predictor in Tables 3-10 is much lower than the mean cost of the unbagged binary predictors in Tables 1-2. At the same time, the non-linear structure of the ARCH models makes parameter estimation via numerical optimization more difficult, which also leaves more room for bagging to work. We therefore see that bagging improves more (in all of T1, T2, and T3) in Tables 1-2 than in Tables 3-10, for both binary and quantile predictions.

One more observation from the Monte Carlo simulations is that bagging works asymmetrically (in terms of T1, T2, and T3) for asymmetric data. For the asymmetric MW densities, as in Tables 4 (MW2), 5 (MW3), and 10 (MW8), bagging works better, for both binary and quantile prediction, when $Q_{Y_{t+1}}(\alpha|X_t)$ lies on the flatter tail. For example, MW2 and MW8 have a flatter left tail, so bagging works better for smaller α in Tables 4 and 10, while MW3 has a flatter right tail, so bagging works better for larger α in Table 5.

In the previous section, we introduced several weights to form bagging predictors: the equal weight and the BMA-weights with lags k = 1, 5, and R, defined in (26) and (28). With fewer lags, we focus on the more recent performance of each bootstrap predictor. According to our simulation results, the BMA-weight with k = 1 performs best for our DGPs, although all weights often work quite similarly. An illustrative sketch of this type of weighting is given immediately below.
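The exact BMA-weights in (26) and (28) are not reproduced in this section. As a rough illustration of the idea, treating minus the in-sample check-function cost over the k most recent observations as a quasi log-likelihood, one could compute weights along the following lines (a sketch under our own assumptions, not the paper's formulas):

import numpy as np

def check_loss(e, alpha):
    # asymmetric check (tick) loss: alpha*e for e >= 0 and (alpha - 1)*e for e < 0
    return np.where(e >= 0, alpha * e, (alpha - 1) * e)

def bma_style_weights(in_sample_errors, alpha, k):
    # in_sample_errors: (J, R) array of in-sample forecast errors of the J bootstrap predictors
    # treat minus the summed check loss over the k most recent observations as a quasi log-likelihood
    quasi_loglik = -check_loss(in_sample_errors[:, -k:], alpha).sum(axis=1)
    w = np.exp(quasi_loglik - quasi_loglik.max())   # subtract the max for numerical stability
    return w / w.sum()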

6 Empirical Application

Our empirical application of the binary prediction is to time the market for the S&P500 and NASDAQ

indexes. Let $Y_t$ represent the log difference of a stock index at time t, and suppose we are interested in

predicting whether the stock index will rise or not in the next month.


The S&P 500 series, retrieved from finance.yahoo.com, is monthly data from October 1982 to February

2004 (T + 1 = 257). The NASDAQ series is also retrieved from finance.yahoo.com, monthly from October

1984 to February 2004 (T + 1 = 233). We split the series into two parts: one for in-sample estimation with

the size R = 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140 and 150, and another for out-of-sample

forecast validation with sample size P = 100 (fixed for all R’s). We choose the most recent P = 100 months

in the sample as the out-of-sample validation sample. We use a rolling-sample scheme: the first forecast is based on observations $T - P - R + 1$ through $T - P$, the second forecast is based on observations $T - P - R + 2$ through $T - P + 1$, and so on (an illustrative sketch of this scheme follows the summary tables below). The two series are summarized as follows:

          In-sample period                Out-of-sample period             T + 1    P
S&P 500   October 1982 ∼ October 1995     November 1995 ∼ February 2004    257      100
Nasdaq    October 1984 ∼ October 1995     November 1995 ∼ February 2004    233      100

          Mean     Median   Max      Min      Std.dev.  Skewness  Kurtosis
S&P 500   0.008    0.013    0.124    -0.245   0.045     -1.066    6.996
Nasdaq    0.009    0.019    0.199    -0.318   0.072     -1.017    5.921
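For reference, the rolling-sample indexing described above can be written out as a small Python sketch (our own illustration; the observation indices are 1-based as in the text, and the example values of T, P, and R are only for demonstration):

def rolling_windows(T, P, R):
    # the p-th forecast (p = 1,...,P) uses observations T - P - R + p through T - P + p - 1
    # and targets observation T - P + p
    return [(T - P - R + p, T - P + p - 1, T - P + p) for p in range(1, P + 1)]

# e.g. with T = 256, P = 100, R = 20 the first window uses observations 137-156 and targets 157
print(rolling_windows(256, 100, 20)[0])    # (137, 156, 157)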

We consider the asymmetric cost parameter α = 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, and 0.9 to represent

different possible preferences. With the cost function given in (9), if we predict the market to go down while

the market goes up, the cost is α; and if we predict the market to go up while the market goes down, the

cost is 1 − α. Therefore, an investor who only buys and holds the stock tends to have an α smaller than 0.5, because missing an earning opportunity is not as bad as losing money. An investor who sells short, on the other hand, tends to have an α larger than 0.5 because of the leverage effect. Since most investors belong to the first category, more predictability may be exploited for small α's.
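As a small numerical illustration of this asymmetry (our own example, using the cost structure just described):

def binary_cost(actual_up, predicted_up, alpha):
    # cost is alpha if we predict "down" but the market goes up (a missed gain),
    # 1 - alpha if we predict "up" but the market goes down, and 0 otherwise
    if actual_up and not predicted_up:
        return alpha
    if (not actual_up) and predicted_up:
        return 1.0 - alpha
    return 0.0

# a buy-and-hold investor with alpha = 0.3: missing a rally costs 0.3, a wrong "up" call costs 0.7
print([binary_cost(a, p, 0.3) for a, p in [(True, False), (False, True), (True, True)]])   # [0.3, 0.7, 0.0]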

Figures 1-2 present the out-of-sample average costs (S1 or S2) on the vertical axis against the training sample size R on the horizontal axis, for the nine values of α. For each α, the dark solid line shows the cost S1 of the unbagged predictor, and the other four lines show the cost S2 of the bagging predictors with the different weights (the equal weight and the BMA weights).

Figure 1(a) and Figure 2(a) (for S&P500 and NASDAQ respectively) tell a similar story. We expect

that the prediction with smaller α’s will be harder than with larger α’s, which is confirmed by our empirical

results. For the binary prediction, bagging works better with smaller α and with smaller R. The bagging

predictors with different weighting schemes seem to work similarly.

For all α's and for both binary and quantile predictions, the cost of the bagging predictors converges to the optimal cost much faster than that of the unbagged predictors. In most cases the bagging predictors reach the stable level of the cost for R as small as 20, for both binary and quantile predictions, whereas a larger R is needed for the unbagged predictors to converge to that level of the cost.


When the sample size is small, bagging can lower the cost to a larger extent, and the bagging predictors almost never do worse than the unbagged predictors. When R = 20, bagging can lower the cost by as much as 50% for both binary and quantile predictions. As R grows larger, the unbagged and bagging predictors converge to the same stable cost level.

7 Conclusions

We have examined how bagging works for binary prediction with an asymmetric cost function for time series.

We compute the binary predictor from the quantile predictor. The bagging binary predictors are constructed via majority voting over the binary predictors trained on the bootstrapped training samples. We have shown the conditions under which bagging works for the bagging binary predictors. Based on the asymmetric quantile check function, treating it as a quasi-likelihood, we have derived various BMA-weights to form the weighted bagging for both binary and quantile predictors. The simulation results and the empirical results, using monthly returns on two U.S. stock indexes, not only confirm but also clearly demonstrate the conditions under which bagging may work. The main finding of the paper is that bagging works mainly when the training sample is small and the predictor is unstable. Bagging does not work when the training sample size is large, so bagging seems chiefly to smooth out the parameter estimation uncertainty due to a small sample, thereby improving the forecasting performance.

The advantage of bagging discussed and examined in this paper gives bagging a very wide range of potential applications in areas where small samples are common. Bagging will be particularly relevant and useful in

practice when structural breaks are frequent so that simply using as many observations as possible is not

a wise choice for out-of-sample prediction, as emphasized in Pesaran and Timmermann (2001, 2002b) and

Paye and Timmermann (2003).

There is still much work to be done in this research. We have considered only a linear quantile model

in this paper. Bagging may be even more useful for the nonlinear quantile models considered by Komunjer (2003) and White (1992). Also, in this paper we have considered only one-step-ahead forecasts; bagging multi-step-ahead forecasts for binary variables and quantiles is an interesting topic on our research agenda. More empirical work should be conducted to evaluate the gain from bagging in forecasting many macroeconomic and financial time series at various frequencies.


References

Aiolfi, M. and A. Favero (2002), “Model Uncertainty, Thick Modelling, and the Predictability of Stock

Returns”, University of Bocconi.

Aiolfi, M. and A. Timmermann (2004), “Persistence in Forecasting Performance”, UCSD.

Avramov, K. (2002), “Stock Return Predictability and Model Uncertainty”, Journal of Financial Eco-

nomics, 64, 423-458.

Bates, J.M. and C.W.J. Granger (1969), “The Combination of Forecasts”, Operations Research Quarterly,

20, 451-468.

Bauer, E. and R. Kohavi (1999), “An Empirical Comparison of Voting Classification Algorithms: Bagging,

Boosting, and Variants”, Machine Learning, 36, 105-139.

Breiman, L. (1996a), “Bagging Predictors”, Machine Learning, 24, 123-140.

Breiman, L. (1996b), “Heuristics of Instability and Stabilization in Model Selection”, Annals of Statistics,

24(6), 2350-2383.

Buhlmann, P. and B. Yu (2002), “Analyzing Bagging”, Annals of Statistics, 30(4), 927-961.

Buja, A. and W. Stuetzle (2002), “Observations on Bagging”, University of Pennsylvania and Uni-

versity of Washington, Seattle.

Christoffersen, P.F. and F.X. Diebold (2003), “Financial Asset Returns, Market Timing, and Volatility

Dynamics”, McGill University and University of Pennsylvania.

Chernozhukov, V. and Len Umantsev (2001), “Conditional Value-at-Risk: Aspects of Modeling and Esti-

mation”, Empirical Economics, 26, 271-292.

Ding, Z., C.W.J. Granger and R. Engle (1993), “A Long Memory Property of Stock Market Returns and

a New Model”, Journal of Empirical Finance, 1, 83-106.

Dzeroski, S. and B. Zenko (2004), “Is Combining Classifiers with Stacking Better than Selecting the Best

One?”, Machine Learning, 54, 255-273.

Elliott, G., I. Komunjer, and A. Timmermann (2003a), “Biases in Macroeconomic Forecasts: Irrationality

or Asymmetric Loss?”, UCSD and Caltech.

Elliott, G., I. Komunjer, and A. Timmermann (2003b), “Estimating Loss Function Parameters”, UCSD

and Caltech.

Evgeniou, T., M. Pontil, and A. Elisseeff (2004), “Leave One Out Error, Stability, and Generalization of

Voting Combinations of Classifiers”, forthcoming, Machine Learning.

Fitzenberger, B. (1997), “The Moving Blocks Bootstrap and Robust Inference for Linear Least Squares and

Quantile Regressions”, Journal of Econometrics, 82, 235-287.

Friedman, J.H. and P. Hall (2000), “On Bagging and Nonlinear Estimation”, Stanford University and

Australian National University.


Garratt, A., K. Lee, H.M. Pesaran, and Y. Shin (2003), “Forecast Uncertainties in Macroeconometric

Modelling: An Application to the UK Economy”, Journal of the American Statistical Association, 98,

829-838.

Grandvalet, Y. (2004), “Bagging Equalizes Influence”, Machine Learning, in press.

Granger, C.W.J. (1999a), Empirical Modeling in Economics: Specification and Evaluation, Cambridge

University Press: London.

Granger, C.W.J. (1999b), “Outline of Forecast Theory Using Generalized Cost Functions”, Spanish Eco-

nomic Review 1, 161-173.

Granger, C.W.J. (2002), “Some Comments on Risk”, Journal of Applied Econometrics, 17, 447-456.

Granger, C.W.J., M. Deutsch, and T. Terasvirta (1994), “The Combination of Forecasts Using Changing

Weights”, International Journal of Forecasting, 10, 47-57.

Granger, C.W.J., Z. Ding and S. Spear (2000), “Stylized Facts on the Temporal and Distributional Prop-

erties of Absolute Returns: An Update”, Working paper, University of California, San Diego.

Granger, C.W.J. and Y. Jeon (2004), “Thick Modeling”, Economic Modelling, 21, 323-343.

Granger, C.W.J. and M.H. Pesaran (2000), “Economic and Statistical Measures of Forecast Accuracy”,

Journal of Forecasting, 19, 537-560.

Hahn, J. (1997), “Bayesian Bootstrap of the Quantile Regression Estimator: A Large Sample Study”,

International Economic Review, 38, 795-808.

Hardle, W., J. Horowitz and J.P. Kreiss (2003), “Bootstrap Methods for Time Series”, International Sta-

tistical Review, 71(2), 435-459.

Harter, H.L. (1977), “Nonuniqueness of Least Absolute Values Regression”, Communications in Statistics

- Theory and Methods, A6, 829-838.

Hong, Y. and J. Chung (2003), “Are the Directions of Stock Price Changes Predictable? Statistical Theory

and Evidence”, Cornell University.

Hong, Y. and T.-H. Lee (2003), “Inference on Predictability of Foreign Exchange Rates via Generalized

Spectrum and Nonlinear Time Series Models”, Review of Economics and Statistics, 85(4), November,

in press.

Horowitz, J.L. (1992), “A Smoothed Maximum Score Estimator for the Binary Response Model”, Econo-

metrica, 60(3), 505-531.

Horowitz, J.L. (1998), “Bootstrap Methods for Median Regression Models”, Econometrica, 66(6), 1327-1351.

Kim, J. and D. Pollard (1990), “Cube Root Asymptotics”, Annals of Statistics, 18, 191-219.

Kim, T.H. and H. White (2001), “Estimation, Inference, and Specification Testing for Possibly Misspecified

Quantile Regressions”, Advances in Econometrics, forthcoming.

Knight, K. and G.W. Bassett Jr. (2002), “Second Order Improvements of Sample Quantiles Using Sub-

samples”, University of Toronto and University of Illinois, Chicago.


Knox, T., J.H. Stock, and M.W. Watson (2000), “Empirical Bayes Forecasts of One Time Series Using

Many Predictors,” Working Paper, Department of Economics, Harvard University.

Koenker, R. and G. Bassett (1978), “Asymptotic Theory of Least Absolute Error Regression”, Journal of

the American Statistical Association, 73, 618-622.

Komunjer, I. (2003), “Quasi-Maximum Likelihood Estimation for Conditional Quantiles”, Caltech.

Kordas, G. (2002), “Smoothed Binary Regression Quantiles”, University of Pennsylvania.

Kuncheva, L.I. and C.J. Whitaker (2003), “Measures of Diversity in Classifier Ensembles and Their Rela-

tionship with the Ensemble Accuracy”, Machine Learning, 51, 181-207.

Luce, R.D. (1980), “Some Possible Measures of Risk”, Theory and Decision, 12, 217-228.

Luce, R.D. (1981), Corrections to “Some Possible Measures of Risk”, Theory and Decision, 13, 381.

Manski, C.F. (1975), “Maximum Score Estimation of the Stochastic Utility Model of Choice”, Journal of

Econometrics, 3(3), 205-228.

Manski, C.F. (1985), “Semiparametric Analysis of Discrete Response: Asymptotic Properties of the Maxi-

mum Score Estimator”, Journal of Econometrics, 27(3), 313-333.

Manski, C.F., and T. S. Thompson (1989), “Estimation of Best Predictors of Binary Response”, Journal

of Econometrics, 40, 97-123.

Marron, J.S. and M.P. Wand (1992), “Exact Mean Integrated Squared Error”, Annals of Statistics, 20,

712-736.

Money, A.H., J.F. Affleck-Graves, M.L. Hart, and G.D.I. Barr (1982), “The Linear Regression Model and

the Choice of p”, Communications in Statistics - Simulations and Computations, 11(1), 89-109.

Nyquist, H. (1983), “The Optimal Lp-norm Estimation in Linear Regression Models”, Communications in

Statistics - Theory and Methods, 12, 2511-2524.

Paye, B.S. and A. Timmermann (2003), “Instability of Return Predictability Models”, UCSD.

Pesaran, M.H. and A. Timmermann (2001), “How Costly Is It To Ignore Breaks When Forecasting the

Direction of a Time Series?”, forthcoming, International Journal of Forecasting.

Pesaran, M.H. and A. Timmermann (2002a), “Market Timing and Return Prediction Under Model Insta-

bility”, Journal of Empirical Finance, 9, 495-510.

Pesaran, M.H. and A. Timmermann (2002b), “Model Instability and Choice of Observation Window”,

UCSD and Cambridge.

Politis, D.N. (2003), “The Impact of Bootstrap Methods on Time Series Analysis”, forthcoming in Statistical

Science.

Powell, J.L. (1986), “Censored Regression Quantiles”, Journal of Econometrics, 32, 143-155.

Portnoy, S. and R. Koenker (1997), “The Gaussian Hare and the Laplacian Tortoise: Computability of l1 vs l2 Regression Estimators”, Statistical Science, 12, 279-300.


Rao, J.S. and R. Tibshirani (1997), “The Out-of-Bootstrap Method for Model Averaging and Selection”,

University of Toronto.

Skouras, S. (2002), “The Sign of a Mean Regression: Characterization, Estimation and Application”, Santa

Fe Institute.

Stock, J.H. and M.W. Watson (1999), “A Comparison of Linear and Nonlinear Univariate Models for

Forecasting Macroeconomic Time Series”, in Cointegration, Causality, and Forecasting, A Festschrift

in Honor of C.W.J. Granger, (eds.) R.F. Engle and H. White, Oxford University Press: London, pp.

1-44.

West, K.D. and M.W. McCracken (1998), “Regression-Based Tests of Predictive Ability”, International

Economic Review, 39(4), 817-840.

White, H. (1992), “Nonparametric Estimation of Conditional Quantiles Using Neural Networks”, Proceed-

ings of the Symposium on the Interface, New York: Springer-Verlag, 190-199.

Yang, Yuhong (2004), “Combining Forecasting Procedures: Some Theoretical Results”, Econometric The-

ory, 20, 176-222.

Zou, Hui and Yuhong Yang (2004), “Combining Time Series Models for Forecasting”, International Journal

of Forecasting, 20, 69-84.


Table 1. AR(0)-ARCH(1)-MW1 (Gaussian): $y_{t+1} = h_{t+1} u_{t+1}$, $h_{t+1} = 0.5 + 0.5 y_t^2$

R=20 R =50 R =100 R =200 J=1 J =50 J =1 J =50 J =1 J =50 J =1 J =50

Binary EW W1 W5 WR EW W1 W5 WR EW W1 W5 WR EW W1 W5 WR T1 6.89 5.71 5.56 5.60 5.63 5.41 5.11 5.08 5.09 5.09 5.10 5.05 5.05 5.05 5.05 5.04 5.04 5.04 5.04 5.04 T2 1.28 1.09 0.98 1.04 1.05 0.80 0.54 0.52 0.52 0.53 0.55 0.51 0.51 0.51 0.51 0.54 0.54 0.54 0.54 0.54 T3 0.80 0.80 0.81 0.81 0.35 0.36 0.36 0.34 0.08 0.08 0.08 0.08 0.00 0.00 0.00 0.00

α=.1

T4 0.08 0.09 0.07 0.07 0.55 0.56 0.56 0.56 0.90 0.90 0.90 0.90 1.00 1.00 1.00 1.00 T1 18.42 15.50 15.20 15.37 15.48 16.36 15.28 15.27 15.27 15.31 15.56 15.17 15.18 15.17 15.16 15.25 15.14 15.13 15.13 15.13 T2 2.44 2.28 2.13 2.23 2.19 1.94 1.69 1.62 1.65 1.71 1.74 1.50 1.50 1.51 1.51 1.66 1.65 1.64 1.64 1.65 T3 0.89 0.95 0.93 0.91 0.80 0.83 0.80 0.79 0.39 0.37 0.40 0.40 0.16 0.16 0.16 0.16

α=.3

T4 0.01 0.00 0.00 0.02 0.07 0.08 0.08 0.08 0.51 0.53 0.52 0.51 0.82 0.81 0.82 0.82 T1 25.56 21.26 21.24 21.38 21.48 25.17 22.63 22.67 22.65 22.69 25.04 23.22 23.22 23.20 23.27 24.93 23.23 23.33 23.17 23.24 T2 2.29 3.38 3.11 3.26 3.32 2.20 2.49 2.36 2.25 2.38 2.60 2.35 2.48 2.61 2.31 2.56 2.78 2.54 2.81 2.77 T3 0.93 0.97 0.94 0.93 0.81 0.82 0.82 0.82 0.80 0.80 0.77 0.80 0.70 0.72 0.72 0.65

α=.5

T4 0.04 0.00 0.02 0.03 0.09 0.06 0.08 0.09 0.07 0.03 0.09 0.06 0.07 0.03 0.06 0.14 T1 17.80 15.57 15.41 15.57 15.60 15.82 14.93 14.88 14.94 14.95 15.15 14.88 14.87 14.86 14.87 14.98 14.87 14.86 14.87 14.88 T2 2.19 2.08 2.09 2.07 2.11 1.96 1.63 1.61 1.68 1.63 1.68 1.60 1.58 1.58 1.60 1.70 1.63 1.63 1.63 1.62 T3 0.88 0.91 0.87 0.86 0.73 0.75 0.73 0.73 0.37 0.37 0.37 0.37 0.16 0.17 0.16 0.14

α=.7

T4 0.00 0.01 0.02 0.00 0.11 0.11 0.13 0.11 0.54 0.57 0.56 0.54 0.80 0.79 0.79 0.81 T1 6.94 5.89 5.83 5.86 5.92 5.22 4.96 4.97 4.97 4.96 5.03 4.95 4.95 4.95 4.96 4.96 4.96 4.96 4.96 4.96 T2 1.49 1.45 1.39 1.38 1.47 0.69 0.50 0.52 0.52 0.50 0.62 0.51 0.51 0.51 0.52 0.54 0.54 0.54 0.54 0.54 T3 0.77 0.78 0.79 0.77 0.29 0.28 0.30 0.29 0.09 0.09 0.09 0.09 0.00 0.00 0.00 0.00

α=.9

T4 0.07 0.09 0.05 0.07 0.62 0.64 0.62 0.62 0.89 0.89 0.89 0.89 0.99 0.99 0.99 0.99 Quantile

T1 25.46 19.20 18.30 18.60 18.93 19.27 17.13 17.07 17.10 17.13 17.83 16.77 16.75 16.77 16.78 16.73 16.28 16.29 16.28 16.28 T2 6.75 3.91 3.47 3.40 3.64 3.02 2.47 2.43 2.44 2.47 3.27 2.57 2.57 2.55 2.57 2.73 2.47 2.49 2.47 2.47 T3 0.96 0.98 0.98 0.97 0.97 0.97 0.97 0.97 0.93 0.92 0.93 0.93 0.88 0.88 0.88 0.88

α=.1

T4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 T1 40.75 33.69 33.57 33.55 33.67 35.74 32.83 32.91 32.86 32.86 34.28 32.86 32.84 32.86 32.87 32.86 32.21 32.21 32.21 32.21 T2 7.03 5.22 5.19 5.02 5.10 4.83 4.02 4.09 4.03 4.03 5.12 4.70 4.64 4.71 4.70 4.51 4.37 4.38 4.37 4.37 T3 0.97 0.99 1.00 0.99 0.99 0.99 1.00 0.99 0.96 0.97 0.97 0.96 0.90 0.89 0.90 0.90

α=.3

T4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 T1 45.31 38.12 38.25 38.02 38.15 40.62 37.32 37.56 37.38 37.35 38.87 37.23 37.29 37.24 37.24 37.53 36.80 36.81 36.80 36.80 T2 7.12 6.14 6.50 5.98 6.08 5.43 4.50 4.73 4.53 4.51 6.13 5.04 5.18 5.07 5.04 4.90 4.85 4.84 4.85 4.85 T3 1.00 1.00 1.00 1.00 0.99 0.98 0.99 0.99 0.99 0.99 0.99 0.99 0.93 0.92 0.94 0.93

α=.5

T4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 T1 40.10 33.98 33.87 33.67 33.92 35.56 32.39 32.52 32.40 32.41 33.86 32.44 32.48 32.46 32.45 32.70 32.09 32.13 32.10 32.10 T2 6.32 5.93 6.10 5.44 5.75 5.12 3.76 3.91 3.77 3.77 5.53 4.74 4.82 4.78 4.75 4.42 4.26 4.31 4.27 4.26 T3 0.97 0.98 0.98 0.98 0.98 0.99 0.99 0.98 0.96 0.96 0.96 0.96 0.90 0.90 0.92 0.90

α=.7

T4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 T1 24.64 19.88 18.96 19.10 19.45 19.19 16.74 16.72 16.72 16.74 17.58 16.58 16.57 16.57 16.58 16.86 16.34 16.36 16.35 16.34 T2 5.18 5.06 4.15 4.07 4.38 3.96 2.44 2.50 2.43 2.45 3.13 2.72 2.72 2.72 2.72 2.58 2.40 2.43 2.41 2.40 T3 0.91 0.97 0.97 0.96 0.98 0.98 0.98 0.98 0.92 0.91 0.92 0.92 0.88 0.87 0.88 0.88

α=.9

T4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 Note: EW and Wk denote the weighted-bagging with equal weights (EW) and with BMA-weights using k most recent in-sample observations (k = 1, 5, and R).


Table 2. AR(0)-ARCH(1)-MW1 (Gaussian): $y_{t+1} = h_{t+1} u_{t+1}$, $h_{t+1} = 0.1 + 0.9 y_t^2$

R=20 R =50 R =100 R =200 J=1 J =50 J =1 J =50 J =1 J =50 J =1 J =50

Binary EW W1 W5 WR EW W1 W5 WR EW W1 W5 WR EW W1 W5 WR T1 7.14 5.61 5.45 5.55 5.59 5.41 5.06 5.05 5.05 5.06 5.13 5.05 5.05 5.05 5.05 5.04 5.04 5.04 5.04 5.04 T2 1.39 1.21 1.11 1.12 1.18 0.80 0.48 0.47 0.47 0.49 0.63 0.51 0.51 0.51 0.51 0.55 0.54 0.54 0.54 0.54 T3 0.87 0.91 0.88 0.87 0.39 0.40 0.39 0.39 0.10 0.10 0.10 0.10 0.01 0.01 0.01 0.01

α=.1

T4 0.04 0.05 0.05 0.05 0.52 0.51 0.53 0.51 0.88 0.88 0.88 0.88 0.97 0.97 0.97 0.97 T1 18.43 15.64 15.49 15.51 15.64 16.75 15.45 15.45 15.47 15.48 15.71 15.18 15.19 15.20 15.19 15.39 15.13 15.11 15.11 15.13 T2 2.10 2.23 2.09 2.06 2.15 2.04 1.82 1.73 1.78 1.83 1.77 1.61 1.57 1.60 1.60 1.72 1.62 1.61 1.61 1.62 T3 0.86 0.91 0.90 0.87 0.84 0.83 0.85 0.84 0.48 0.50 0.47 0.46 0.27 0.26 0.27 0.27

α=.3

T4 0.01 0.02 0.02 0.01 0.08 0.08 0.07 0.08 0.45 0.44 0.46 0.45 0.69 0.69 0.69 0.69 T1 25.45 20.92 20.92 21.06 21.07 24.99 22.61 22.89 22.63 22.73 24.95 23.19 23.24 23.22 23.29 24.83 23.74 23.85 23.70 23.70 T2 2.43 3.65 3.29 3.42 3.60 2.15 2.36 2.33 2.40 2.43 2.48 2.60 2.53 2.59 2.63 2.51 2.63 2.81 2.77 2.50 T3 0.95 0.97 0.96 0.95 0.85 0.80 0.83 0.85 0.76 0.73 0.74 0.76 0.69 0.67 0.65 0.70

α=.5

T4 0.01 0.02 0.01 0.02 0.05 0.04 0.06 0.07 0.06 0.10 0.05 0.06 0.11 0.09 0.11 0.08 T1 18.13 15.58 15.31 15.53 15.64 16.24 14.93 14.94 14.95 14.99 15.33 14.88 14.92 14.91 14.90 15.16 14.99 15.01 15.02 15.00 T2 2.35 2.14 1.95 2.06 2.13 2.04 1.74 1.68 1.72 1.81 1.86 1.69 1.72 1.68 1.68 1.77 1.73 1.73 1.76 1.72 T3 0.92 0.93 0.92 0.92 0.81 0.81 0.80 0.81 0.47 0.44 0.45 0.45 0.25 0.24 0.23 0.25

α=.7

T4 0.01 0.01 0.00 0.01 0.12 0.14 0.12 0.12 0.45 0.46 0.47 0.47 0.69 0.69 0.70 0.69 T1 7.25 5.69 5.64 5.69 5.74 5.32 4.99 4.98 4.98 5.00 5.03 4.95 4.94 4.95 4.95 5.02 4.98 4.97 4.98 4.98 T2 1.38 1.30 1.24 1.34 1.31 0.71 0.57 0.54 0.57 0.58 0.60 0.51 0.51 0.51 0.51 0.63 0.55 0.54 0.55 0.55 T3 0.89 0.89 0.88 0.88 0.39 0.40 0.39 0.39 0.08 0.09 0.08 0.08 0.03 0.03 0.03 0.03

α=.9

T4 0.06 0.07 0.06 0.06 0.50 0.52 0.53 0.51 0.91 0.90 0.91 0.91 0.97 0.97 0.97 0.97 Quantile

T1 19.36 13.51 12.56 12.72 13.12 13.87 11.72 11.77 11.73 11.75 12.17 11.03 11.00 11.05 11.03 11.03 10.55 10.58 10.57 10.56 T2 9.78 5.45 4.79 4.55 5.05 6.32 3.81 4.22 4.01 3.95 4.47 3.36 3.18 3.36 3.37 3.83 3.20 3.27 3.24 3.23 T3 0.94 0.98 0.96 0.95 0.97 0.97 0.98 0.97 0.91 0.91 0.91 0.91 0.80 0.81 0.79 0.80

α=.1

T4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 T1 28.81 22.17 22.38 21.88 22.07 25.16 21.68 22.26 21.67 21.67 22.51 20.88 20.96 20.92 20.89 21.28 20.23 20.36 20.26 20.23 T2 11.08 7.32 8.73 6.75 7.00 9.95 6.29 8.58 6.31 6.23 7.49 5.88 6.02 5.96 5.90 6.92 5.52 5.83 5.62 5.54 T3 0.98 1.00 1.00 0.99 0.99 0.98 1.00 0.99 0.96 0.96 0.97 0.96 0.90 0.87 0.90 0.90

α=.3

T4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 T1 31.95 24.13 24.92 24.03 24.02 27.75 24.01 24.68 24.08 23.99 25.77 23.37 23.75 23.43 23.38 24.14 23.02 23.13 23.06 23.03 T2 12.33 6.78 10.48 6.97 6.55 10.24 6.44 8.33 6.64 6.38 9.75 6.13 7.23 6.31 6.14 7.93 6.16 6.60 6.28 6.20 T3 0.99 0.99 1.00 1.00 0.98 0.99 0.99 0.99 0.96 0.96 0.96 0.96 0.95 0.94 0.95 0.95

α=.5

T4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 T1 28.65 21.85 22.56 21.67 21.72 24.23 21.01 21.46 21.13 21.02 22.41 20.60 20.83 20.70 20.61 21.13 20.32 20.48 20.34 20.34 T2 9.96 6.62 9.87 6.63 6.39 8.51 5.52 7.12 5.88 5.53 7.97 5.85 6.51 6.08 5.87 6.87 5.76 6.16 5.80 5.84 T3 0.98 0.98 0.99 0.98 0.99 0.99 0.99 0.99 0.97 0.97 0.97 0.97 0.89 0.86 0.90 0.89

α=.7

T4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 T1 18.22 14.12 12.85 12.98 13.33 13.66 11.28 11.26 11.22 11.27 11.97 10.84 10.86 10.87 10.85 11.29 10.78 10.73 10.78 10.80 T2 8.16 6.70 4.93 4.86 5.23 6.53 3.61 3.67 3.48 3.58 4.57 3.54 3.48 3.58 3.54 4.18 3.60 3.41 3.56 3.68 T3 0.93 0.97 0.98 0.96 0.92 0.93 0.92 0.92 0.84 0.85 0.85 0.84 0.90 0.91 0.90 0.90

α=.9

T4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00


Table 3. AR(1)-ARCH(0)-MW1 (Gaussian): $y_{t+1} = 0.6 y_t + u_{t+1}$

R=20 R =50 R =100 R =200 J=1 J =50 J =1 J =50 J =1 J =50 J =1 J =50

Binary EW W1 W5 WR EW W1 W5 WR EW W1 W5 WR EW W1 W5 WR T1 6.52 5.66 5.36 5.46 5.55 4.97 4.78 4.78 4.78 4.78 5.09 4.95 4.93 4.95 4.94 4.91 4.94 4.94 4.94 4.94 T2 1.45 1.54 1.28 1.42 1.44 1.08 0.86 0.86 0.86 0.86 0.98 0.90 0.88 0.90 0.89 0.92 0.84 0.84 0.84 0.84 T3 0.72 0.78 0.77 0.76 0.43 0.43 0.43 0.43 0.31 0.34 0.31 0.32 0.18 0.19 0.18 0.18

α=.1

T4 0.01 0.03 0.02 0.02 0.15 0.15 0.15 0.15 0.28 0.24 0.28 0.28 0.21 0.21 0.21 0.21 T1 14.46 13.35 13.11 13.23 13.33 12.94 12.63 12.64 12.58 12.64 12.91 12.87 12.85 12.84 12.86 12.06 12.35 12.35 12.36 12.35 T2 2.41 2.24 2.14 2.13 2.22 2.57 2.24 2.28 2.21 2.25 1.90 2.00 1.97 1.99 1.99 2.28 2.21 2.22 2.20 2.20 T3 0.76 0.82 0.81 0.79 0.55 0.56 0.54 0.53 0.39 0.39 0.37 0.41 0.33 0.32 0.34 0.33

α=.3

T4 0.01 0.00 0.00 0.00 0.00 0.01 0.02 0.00 0.08 0.08 0.07 0.08 0.05 0.06 0.03 0.04 T1 17.44 16.26 15.91 16.23 16.32 15.32 15.24 15.05 15.20 15.22 15.79 15.80 15.71 15.76 15.80 14.76 14.75 14.76 14.76 14.73 T2 2.73 2.68 2.64 2.71 2.76 2.80 2.78 2.70 2.83 2.78 2.32 2.21 2.15 2.20 2.18 2.50 2.39 2.42 2.44 2.42 T3 0.69 0.73 0.68 0.66 0.46 0.57 0.50 0.51 0.36 0.40 0.40 0.35 0.36 0.30 0.35 0.36

α=.5

T4 0.10 0.13 0.11 0.13 0.18 0.16 0.17 0.16 0.27 0.25 0.23 0.29 0.33 0.38 0.35 0.36 T1 14.08 13.12 12.94 13.06 13.21 12.50 12.36 12.26 12.32 12.35 12.79 12.89 12.81 12.86 12.85 12.33 12.47 12.50 12.48 12.46 T2 2.46 2.40 2.25 2.41 2.45 2.38 2.14 2.24 2.25 2.26 2.47 2.36 2.28 2.29 2.31 2.05 2.06 2.10 2.07 2.05 T3 0.69 0.73 0.67 0.68 0.55 0.57 0.54 0.54 0.45 0.43 0.44 0.43 0.41 0.39 0.40 0.41

α=.7

T4 0.02 0.04 0.04 0.02 1.00 0.03 0.04 0.04 0.05 0.03 0.07 0.04 0.03 0.06 0.06 0.06 0.06 T1 6.22 5.18 5.00 5.11 5.21 5.12 4.92 4.90 4.91 4.92 4.96 4.87 4.88 4.87 4.88 5.01 4.91 4.90 4.91 4.91 T2 1.53 1.36 1.13 1.29 1.37 1.01 0.91 0.89 0.89 0.92 1.00 0.83 0.84 0.84 0.84 0.96 0.84 0.84 0.85 0.85 T3 0.73 0.77 0.74 0.73 0.45 0.48 0.47 0.44 0.36 0.34 0.35 0.34 0.32 0.32 0.32 0.32

α=.9

T4 0.02 0.03 0.02 0.02 0.14 0.12 0.12 0.15 0.11 0.13 0.11 0.12 0.20 0.22 0.20 0.22 Quantile

T1 26.34 22.24 19.98 20.76 21.29 20.01 18.74 18.50 18.61 18.70 18.52 18.02 17.94 17.98 18.02 17.90 17.84 17.83 17.84 17.84 T2 4.89 4.54 3.05 3.32 3.77 2.18 1.85 1.69 1.75 1.82 2.11 1.95 1.89 1.89 1.95 1.76 1.72 1.71 1.72 1.72 T3 0.86 0.99 0.94 0.92 0.86 0.93 0.90 0.87 0.75 0.78 0.76 0.75 0.50 0.50 0.50 0.50

α=.1

T4 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 T1 42.30 38.84 37.17 37.77 38.31 37.52 36.62 36.29 36.46 36.58 35.85 35.60 35.49 35.56 35.60 35.29 35.26 35.21 35.24 35.26 T2 4.41 4.19 3.59 3.67 3.87 3.22 3.11 2.97 3.03 3.10 3.36 3.28 3.22 3.26 3.28 2.92 3.00 2.97 2.99 3.00 T3 0.90 0.99 0.97 0.94 0.82 0.85 0.84 0.83 0.60 0.67 0.60 0.60 0.57 0.57 0.56 0.57

α=.3

T4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 T1 47.65 43.98 42.47 43.06 43.60 42.35 41.33 41.02 41.17 41.29 40.76 40.48 40.34 40.43 40.48 40.23 40.18 40.13 40.16 40.17 T2 4.56 4.54 3.82 4.00 4.22 3.27 3.23 3.06 3.13 3.21 3.52 3.48 3.45 3.47 3.47 3.11 3.15 3.13 3.15 3.15 T3 0.87 0.99 0.95 0.89 0.82 0.91 0.87 0.85 0.63 0.73 0.67 0.63 0.58 0.62 0.61 0.58

α=.5

T4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 T1 42.57 39.18 37.46 38.08 38.68 36.74 35.99 35.68 35.83 35.96 35.43 35.26 35.15 35.21 35.25 34.93 34.92 34.88 34.90 34.92 T2 4.10 4.48 3.79 3.86 4.12 2.85 2.76 2.63 2.66 2.74 3.24 3.11 3.08 3.09 3.11 2.65 2.63 2.62 2.63 2.63 T3 0.84 0.98 0.95 0.90 0.68 0.77 0.73 0.71 0.68 0.71 0.70 0.68 0.55 0.58 0.56 0.55

α=.7

T4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 T1 25.92 21.52 19.70 20.24 20.74 19.61 18.36 18.16 18.26 18.33 18.30 17.87 17.83 17.85 17.86 17.80 17.71 17.69 17.70 17.70 T2 4.29 4.07 2.83 3.04 3.42 2.26 1.89 1.77 1.83 1.87 2.11 1.88 1.88 1.88 1.88 1.60 1.43 1.42 1.43 1.43 T3 0.85 0.98 0.98 0.93 0.85 0.89 0.89 0.86 0.74 0.76 0.73 0.74 0.55 0.56 0.55 0.55

α=.9

T4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00


Table 4. AR(1)-ARCH(0)-MW2 (Skewed Unimodal): $y_{t+1} = 0.6 y_t + u_{t+1}$

R=20 R =50 R =100 R =200 J=1 J =50 J =1 J =50 J =1 J =50 J =1 J =50

Binary EW W1 W5 WR EW W1 W5 WR EW W1 W5 WR EW W1 W5 WR T1 7.36 6.00 5.71 5.82 5.88 5.57 5.14 5.13 5.13 5.14 5.51 5.29 5.29 5.29 5.29 5.28 5.27 5.27 5.27 5.27 T2 1.77 1.69 1.27 1.45 1.53 1.12 0.84 0.83 0.83 0.85 1.12 0.97 0.97 0.97 0.97 0.95 0.88 0.88 0.88 0.88 T3 0.78 0.83 0.82 0.81 0.60 0.60 0.60 0.60 0.42 0.42 0.42 0.41 0.24 0.24 0.24 0.24

α=.1

T4 0.04 0.04 0.03 0.03 0.09 0.08 0.08 0.09 0.25 0.23 0.25 0.24 0.33 0.32 0.32 0.32 T1 15.14 14.50 14.05 14.32 14.50 13.68 13.48 13.43 13.43 13.46 13.16 13.17 13.10 13.18 13.19 12.88 12.98 12.95 12.98 12.98 T2 2.57 2.71 2.53 2.52 2.70 2.41 2.20 2.25 2.18 2.18 2.34 2.25 2.30 2.24 2.24 2.34 2.46 2.47 2.45 2.47 T3 0.66 0.68 0.66 0.66 0.52 0.55 0.54 0.52 0.51 0.54 0.51 0.48 0.48 0.49 0.48 0.48

α=.3

T4 0.01 0.02 0.01 0.01 0.03 0.05 0.05 0.03 0.03 0.03 0.03 0.03 0.02 0.03 0.02 0.02 T1 16.61 16.01 15.50 15.79 15.98 15.33 15.17 15.07 15.21 15.21 14.62 14.71 14.65 14.66 14.70 14.24 14.23 14.22 14.22 14.22 T2 2.84 2.55 2.36 2.45 2.48 3.07 2.83 2.80 2.86 2.83 3.00 3.16 3.06 3.14 3.21 2.46 2.41 2.41 2.41 2.43 T3 0.56 0.65 0.61 0.60 0.48 0.51 0.46 0.44 0.43 0.41 0.40 0.38 0.31 0.30 0.34 0.32

α=.5

T4 0.08 0.10 0.07 0.04 0.15 0.19 0.17 0.20 0.17 0.22 0.20 0.19 0.40 0.44 0.38 0.40 T1 12.80 12.04 11.85 11.96 12.09 11.41 11.40 11.40 11.37 11.38 10.83 10.85 10.83 10.80 10.83 11.20 11.31 11.27 11.28 11.29 T2 2.43 2.24 2.17 2.15 2.27 2.21 2.14 2.18 2.16 2.16 2.33 2.11 2.08 2.08 2.09 2.20 2.23 2.22 2.22 2.21 T3 0.64 0.71 0.67 0.63 0.50 0.46 0.48 0.50 0.50 0.52 0.52 0.49 0.46 0.45 0.46 0.44

α=.7

T4 0.01 0.00 0.02 0.01 0.04 0.07 0.06 0.05 0.05 0.06 0.06 0.06 0.04 0.06 0.06 0.05 T1 5.46 4.79 4.69 4.76 4.85 4.50 4.48 4.47 4.47 4.48 4.29 4.41 4.40 4.40 4.40 4.35 4.46 4.43 4.44 4.44 T2 1.62 1.46 1.33 1.42 1.48 1.11 0.94 0.96 0.96 0.96 0.94 0.92 0.92 0.91 0.93 0.93 0.79 0.80 0.80 0.80 T3 0.64 0.70 0.65 0.63 0.39 0.37 0.38 0.37 0.26 0.25 0.26 0.25 0.23 0.25 0.23 0.24

α=.9

T4 0.05 0.02 0.03 0.04 0.05 0.07 0.07 0.07 0.07 0.07 0.07 0.07 0.09 0.08 0.09 0.09 Quantile

T1 30.66 24.13 21.78 22.73 23.25 23.46 21.41 21.13 21.29 21.39 21.57 20.80 20.74 20.76 20.80 20.87 20.59 20.57 20.57 20.59 T2 5.98 5.05 3.56 3.92 4.26 3.20 2.64 2.55 2.58 2.62 3.09 2.71 2.66 2.67 2.71 2.32 2.23 2.22 2.22 2.23 T3 0.87 1.00 0.97 0.95 0.90 0.97 0.92 0.91 0.77 0.79 0.77 0.77 0.67 0.68 0.69 0.67

α=.1

T4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 T1 43.96 40.60 38.92 39.29 39.95 38.52 37.42 37.11 37.25 37.40 36.49 36.21 36.07 36.14 36.20 36.01 35.92 35.87 35.89 35.91 T2 5.35 5.34 4.66 4.47 4.79 4.21 3.85 3.92 3.85 3.86 4.26 4.13 4.06 4.09 4.13 3.23 3.22 3.22 3.22 3.22 T3 0.82 0.97 0.91 0.89 0.79 0.89 0.84 0.79 0.62 0.67 0.64 0.62 0.64 0.67 0.67 0.64

α=.3

T4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 T1 45.58 42.49 41.09 41.54 42.09 40.89 40.09 39.73 39.91 40.05 39.02 39.04 38.90 38.98 39.03 38.36 38.31 38.27 38.29 38.31 T2 5.07 4.85 4.22 4.30 4.55 3.86 3.56 3.55 3.54 3.55 4.01 3.98 3.91 3.94 3.97 3.31 3.23 3.23 3.23 3.23 T3 0.79 0.93 0.89 0.87 0.75 0.91 0.81 0.75 0.53 0.65 0.58 0.54 0.52 0.60 0.55 0.54

α=.5

T4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 T1 38.32 35.59 34.23 34.63 35.17 33.77 33.11 32.77 32.96 33.08 32.35 32.25 32.16 32.21 32.25 31.69 31.74 31.70 31.73 31.74 T2 4.55 4.14 3.60 3.54 3.84 2.72 2.65 2.54 2.63 2.65 2.97 2.91 2.89 2.90 2.91 2.58 2.52 2.52 2.52 2.52 T3 0.82 0.96 0.91 0.84 0.70 0.83 0.78 0.70 0.56 0.62 0.61 0.56 0.50 0.51 0.49 0.50

α=.7

T4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 T1 22.62 19.42 17.79 18.10 18.62 16.93 16.17 15.93 16.05 16.14 15.78 15.66 15.63 15.65 15.66 15.32 15.46 15.45 15.46 15.46 T2 4.49 3.66 2.74 2.79 3.12 1.96 1.62 1.46 1.53 1.60 1.54 1.53 1.52 1.52 1.53 1.35 1.27 1.26 1.27 1.27 T3 0.77 0.93 0.91 0.87 0.71 0.82 0.76 0.71 0.54 0.54 0.55 0.54 0.45 0.45 0.45 0.45

α=.9

T4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00


Table 5. AR(1)-ARCH(0)-MW3 (Strongly Skewed): $y_{t+1} = 0.6 y_t + u_{t+1}$

R=20 R =50 R =100 R =200 J=1 J =50 J =1 J =50 J =1 J =50 J =1 J =50

Binary EW W1 W5 WR EW W1 W5 WR EW W1 W5 WR EW W1 W5 WR T1 3.32 3.38 3.30 3.32 3.33 2.97 3.26 3.28 3.26 3.26 3.12 3.43 3.45 3.43 3.43 3.00 3.23 3.25 3.23 3.23 T2 0.81 0.71 0.61 0.61 0.64 0.63 0.56 0.56 0.56 0.56 0.68 0.72 0.72 0.72 0.72 0.66 0.70 0.70 0.70 0.70 T3 0.34 0.36 0.37 0.37 0.17 0.18 0.17 0.17 0.13 0.13 0.13 0.13 0.06 0.06 0.06 0.06

α=.1

T4 0.06 0.06 0.06 0.07 0.08 0.05 0.08 0.08 0.08 0.06 0.08 0.08 0.11 0.10 0.11 0.11 T1 9.46 9.34 9.04 9.23 9.28 8.35 8.30 8.29 8.31 8.30 8.62 8.74 8.74 8.74 8.74 8.57 8.60 8.60 8.60 8.61 T2 2.03 2.11 1.83 1.94 1.99 1.55 1.58 1.56 1.58 1.58 1.59 1.60 1.61 1.59 1.61 1.70 1.78 1.78 1.78 1.79 T3 0.52 0.60 0.54 0.56 0.32 0.31 0.31 0.32 0.30 0.30 0.31 0.31 0.34 0.34 0.34 0.34

α=.3

T4 0.06 0.07 0.07 0.04 0.34 0.34 0.31 0.34 0.23 0.23 0.22 0.22 0.33 0.33 0.33 0.31 T1 14.59 14.09 13.64 13.85 14.00 12.85 12.72 12.61 12.66 12.70 12.99 13.06 13.02 13.04 13.02 13.13 13.01 13.04 13.01 13.00 T2 2.68 3.03 2.82 2.79 2.96 2.38 2.28 2.25 2.24 2.27 2.36 2.38 2.30 2.35 2.33 2.61 2.55 2.54 2.55 2.56 T3 0.57 0.67 0.66 0.57 0.45 0.52 0.51 0.45 0.32 0.32 0.30 0.31 0.39 0.41 0.39 0.39

α=.5

T4 0.12 0.14 0.06 0.14 0.26 0.17 0.20 0.26 0.34 0.38 0.35 0.37 0.38 0.35 0.36 0.37 T1 16.12 14.44 14.16 14.39 14.52 14.83 14.58 14.37 14.45 14.46 14.25 14.47 14.32 14.28 14.43 14.12 14.18 14.11 14.10 14.18 T2 2.62 2.61 2.48 2.53 2.56 2.65 2.37 2.28 2.26 2.36 2.32 2.28 2.22 2.20 2.21 2.38 2.43 2.35 2.38 2.42 T3 0.76 0.82 0.80 0.76 0.56 0.62 0.59 0.61 0.42 0.49 0.49 0.43 0.40 0.42 0.40 0.37

α=.7

T4 0.00 0.00 0.01 0.01 0.00 0.03 0.02 0.02 0.03 0.03 0.03 0.04 0.07 0.08 0.08 0.08 T1 8.80 6.88 6.86 6.90 6.99 6.45 5.98 5.98 5.98 5.99 5.90 5.72 5.72 5.71 5.72 5.83 5.79 5.78 5.78 5.78 T2 1.94 1.85 1.69 1.79 1.87 1.18 0.89 0.91 0.91 0.91 1.05 0.90 0.90 0.90 0.90 1.13 0.99 0.98 0.98 0.98 T3 0.85 0.84 0.85 0.84 0.51 0.52 0.52 0.52 0.30 0.29 0.30 0.30 0.14 0.14 0.14 0.14

α=.9

T4 0.01 0.04 0.02 0.02 0.16 0.18 0.19 0.17 0.40 0.40 0.40 0.40 0.66 0.66 0.66 0.66 Quantile

T1 12.96 13.34 11.83 12.13 12.30 9.52 10.31 10.23 10.26 10.28 9.39 10.16 10.12 10.14 10.15 9.19 9.74 9.72 9.73 9.74 T2 2.70 2.96 1.66 1.72 1.79 1.01 1.14 1.12 1.13 1.13 1.07 1.35 1.33 1.33 1.34 0.98 1.18 1.18 1.18 1.18 T3 0.46 0.66 0.62 0.61 0.05 0.08 0.06 0.06 0.07 0.08 0.08 0.08 0.03 0.03 0.03 0.03

α=.1

T4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 T1 30.27 29.96 28.27 28.65 29.10 25.85 25.89 25.71 25.80 25.86 25.50 25.65 25.58 25.62 25.64 25.07 25.16 25.14 25.15 25.15 T2 4.04 4.34 3.46 3.31 3.54 2.47 2.54 2.47 2.51 2.53 2.90 2.94 2.91 2.92 2.93 2.68 2.65 2.65 2.65 2.65 T3 0.61 0.76 0.72 0.69 0.49 0.56 0.51 0.51 0.41 0.45 0.44 0.42 0.41 0.46 0.42 0.42

α=.3

T4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 T1 43.05 40.51 39.27 39.58 40.01 37.70 37.19 36.97 37.06 37.15 37.07 36.96 36.88 36.92 36.96 36.25 36.29 36.25 36.27 36.29 T2 5.17 5.09 4.67 4.64 4.84 3.82 3.74 3.65 3.71 3.73 4.23 4.05 4.05 4.05 4.05 3.90 3.72 3.72 3.71 3.72 T3 0.79 0.89 0.84 0.82 0.64 0.76 0.68 0.65 0.56 0.64 0.58 0.57 0.49 0.51 0.49 0.49

α=.5

T4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 T1 47.22 42.05 40.95 41.22 41.61 40.95 39.29 39.18 39.17 39.27 39.69 39.29 39.15 39.22 39.29 38.56 38.37 38.34 38.35 38.37 T2 5.69 5.70 5.34 5.27 5.44 4.65 4.52 4.47 4.45 4.51 4.72 4.62 4.55 4.59 4.62 3.99 3.77 3.78 3.77 3.77 T3 0.91 0.98 0.96 0.95 0.91 0.94 0.93 0.92 0.68 0.75 0.71 0.68 0.63 0.63 0.63 0.64

α=.7

T4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 T1 35.58 27.37 25.20 25.70 26.31 26.28 23.70 23.47 23.59 23.70 24.36 23.09 22.98 23.00 23.08 23.36 22.87 22.84 22.87 22.87 T2 6.64 5.74 4.33 4.11 4.47 3.92 3.23 3.09 3.12 3.21 3.10 2.84 2.75 2.76 2.83 2.44 2.18 2.17 2.18 2.18 T3 0.91 0.98 0.97 0.96 0.94 0.99 0.97 0.96 0.89 0.94 0.90 0.89 0.80 0.81 0.80 0.80

α=.9

T4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00


Table 6. AR(1)-ARCH(0)-MW4 (Kurtotic Unimodal): $y_{t+1} = 0.6 y_t + u_{t+1}$

R=20 R =50 R =100 R =200 J=1 J =50 J =1 J =50 J =1 J =50 J =1 J =50

Binary EW W1 W5 WR EW W1 W5 WR EW W1 W5 WR EW W1 W5 WR T1 6.51 5.46 5.12 5.24 5.31 5.26 4.74 4.77 4.75 4.75 4.98 4.83 4.84 4.83 4.84 4.90 4.95 4.95 4.95 4.95 T2 1.58 1.41 1.14 1.19 1.26 1.37 0.97 0.99 0.97 0.97 1.10 0.92 0.90 0.92 0.92 1.08 0.96 0.96 0.96 0.96 T3 0.72 0.79 0.75 0.77 0.58 0.57 0.58 0.58 0.38 0.36 0.38 0.37 0.22 0.22 0.22 0.22

α=.1

T4 0.04 0.04 0.06 0.03 0.11 0.11 0.11 0.11 0.18 0.18 0.18 0.18 0.16 0.16 0.16 0.16 T1 13.02 12.27 12.09 12.18 12.32 11.55 11.53 11.48 11.51 11.55 11.50 11.58 11.52 11.56 11.59 11.34 11.31 11.31 11.31 11.30 T2 2.23 2.11 1.92 2.09 2.13 2.23 2.25 2.21 2.21 2.25 2.15 2.06 2.06 2.05 2.06 2.13 2.22 2.22 2.23 2.21 T3 0.65 0.71 0.71 0.65 0.49 0.46 0.46 0.46 0.47 0.45 0.46 0.46 0.48 0.48 0.48 0.47

α=.3

T4 0.01 0.01 0.01 0.00 0.00 0.03 0.01 0.00 0.02 0.02 0.02 0.02 0.04 0.04 0.05 0.04 T1 14.51 14.33 13.97 14.19 14.32 12.61 12.78 12.58 12.64 12.74 12.64 12.69 12.66 12.62 12.70 12.30 12.27 12.28 12.27 12.28 T2 2.51 2.64 2.55 2.62 2.66 2.07 2.18 2.06 2.12 2.17 2.36 2.37 2.33 2.37 2.37 2.45 2.46 2.48 2.47 2.48 T3 0.49 0.60 0.53 0.49 0.29 0.38 0.35 0.32 0.30 0.26 0.31 0.27 0.25 0.24 0.25 0.25

α=.5

T4 0.08 0.09 0.06 0.07 0.30 0.24 0.27 0.25 0.36 0.47 0.43 0.42 0.61 0.58 0.61 0.59 T1 12.91 11.97 11.68 11.88 11.96 11.54 11.64 11.54 11.55 11.56 11.47 11.56 11.55 11.55 11.56 11.22 11.22 11.24 11.23 11.22 T2 2.37 2.38 2.18 2.36 2.42 2.14 1.98 1.95 1.96 1.97 2.17 2.09 2.12 2.08 2.11 2.33 2.25 2.25 2.27 2.28 T3 0.69 0.73 0.69 0.69 0.50 0.49 0.53 0.56 0.49 0.48 0.46 0.46 0.43 0.40 0.41 0.41

α=.7

T4 0.01 0.00 0.01 0.00 0.00 0.01 0.00 0.00 0.02 0.03 0.03 0.02 0.04 0.04 0.05 0.06 T1 6.55 5.24 5.24 5.26 5.33 5.21 4.89 4.89 4.89 4.89 5.00 4.94 4.94 4.95 4.94 4.83 4.89 4.88 4.89 4.89 T2 1.65 1.42 1.43 1.44 1.47 1.20 1.01 1.03 1.02 1.02 1.09 0.89 0.90 0.90 0.90 1.01 0.97 0.97 0.97 0.97 T3 0.81 0.79 0.80 0.80 0.55 0.55 0.55 0.54 0.33 0.31 0.31 0.31 0.21 0.22 0.21 0.22

α=.9

T4 0.02 0.02 0.02 0.02 0.04 0.06 0.06 0.04 0.11 0.15 0.15 0.15 0.22 0.21 0.22 0.21 Quantile

T1 29.60 23.20 20.80 21.53 22.14 22.02 19.98 19.79 19.88 19.97 20.06 19.35 19.26 19.31 19.34 19.26 19.14 19.13 19.14 19.14 T2 6.16 5.30 3.57 3.87 4.28 2.91 2.50 2.35 2.42 2.47 2.60 2.36 2.22 2.29 2.35 2.37 2.19 2.18 2.19 2.19 T3 0.85 0.99 0.98 0.95 0.90 0.93 0.90 0.90 0.82 0.83 0.83 0.83 0.59 0.56 0.59 0.59

α=.1

T4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 T1 39.25 37.30 35.50 36.00 36.58 34.56 33.95 33.74 33.81 33.92 32.98 33.09 33.01 33.04 33.08 32.53 32.75 32.72 32.73 32.74 T2 4.76 5.00 4.43 4.39 4.63 4.46 4.11 4.04 4.06 4.11 3.93 3.68 3.66 3.68 3.68 3.80 3.74 3.72 3.73 3.74 T3 0.72 0.92 0.85 0.76 0.61 0.67 0.64 0.62 0.51 0.51 0.52 0.51 0.39 0.40 0.39 0.39

α=.3

T4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 T1 40.39 40.21 38.38 38.94 39.50 35.15 35.75 35.39 35.57 35.69 34.41 34.70 34.62 34.66 34.69 34.14 34.29 34.28 34.29 34.29 T2 4.58 5.14 4.57 4.62 4.88 3.93 4.08 3.91 3.97 4.05 3.87 3.93 3.91 3.92 3.93 3.66 3.72 3.71 3.72 3.72 T3 0.52 0.79 0.73 0.68 0.27 0.43 0.33 0.29 0.32 0.36 0.33 0.32 0.37 0.40 0.38 0.37

α=.5

T4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 T1 39.79 36.77 35.32 35.79 36.27 33.55 33.39 33.13 33.23 33.35 32.94 32.92 32.82 32.87 32.91 32.55 32.74 32.72 32.73 32.74 T2 5.66 5.01 4.58 4.66 4.82 4.16 4.15 4.04 4.07 4.13 4.14 3.91 3.87 3.90 3.91 3.56 3.42 3.42 3.42 3.42 T3 0.84 0.93 0.92 0.90 0.51 0.57 0.53 0.51 0.50 0.53 0.51 0.50 0.39 0.39 0.39 0.39

α=.7

T4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 T1 29.40 23.19 21.07 21.66 22.21 21.69 19.62 19.47 19.52 19.61 19.97 19.22 19.16 19.18 19.22 19.31 19.11 19.08 19.10 19.11 T2 5.55 5.68 3.84 4.14 4.56 3.45 2.69 2.60 2.62 2.67 2.63 2.51 2.43 2.47 2.51 2.29 2.05 2.01 2.05 2.05 T3 0.92 0.99 0.97 0.95 0.92 0.93 0.93 0.92 0.77 0.81 0.80 0.77 0.58 0.60 0.58 0.58

α=.9

T4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00


Table 7. AR(1)-ARCH(0)-MW5 (Outlier): $y_{t+1} = 0.6 y_t + u_{t+1}$

R=20 R =50 R =100 R =200 J=1 J =50 J =1 J =50 J =1 J =50 J =1 J =50

Binary EW W1 W5 WR EW W1 W5 WR EW W1 W5 WR EW W1 W5 WR T1 5.97 5.14 4.91 5.00 5.06 4.83 4.63 4.63 4.63 4.63 4.50 4.64 4.66 4.65 4.65 4.38 4.64 4.65 4.65 4.65 T2 1.43 1.43 1.23 1.34 1.41 1.28 0.92 0.92 0.92 0.93 1.05 0.91 0.90 0.91 0.91 1.09 0.93 0.94 0.94 0.94 T3 0.74 0.79 0.78 0.76 0.44 0.44 0.44 0.44 0.29 0.27 0.29 0.28 0.22 0.22 0.22 0.22

α=.1

T4 0.00 0.01 0.01 0.01 0.04 0.04 0.04 0.04 0.05 0.04 0.04 0.05 0.05 0.04 0.04 0.04 T1 12.29 11.85 11.53 11.72 11.85 10.93 10.83 10.78 10.82 10.81 10.61 10.55 10.53 10.51 10.56 10.43 10.50 10.48 10.48 10.51 T2 2.15 2.33 2.22 2.25 2.29 2.17 2.11 2.14 2.09 2.08 1.88 1.80 1.79 1.79 1.82 2.29 2.27 2.25 2.27 2.27 T3 0.60 0.63 0.65 0.60 0.54 0.52 0.52 0.54 0.46 0.47 0.48 0.45 0.41 0.42 0.41 0.41

α=.3

T4 0.03 0.04 0.01 0.05 0.01 0.05 0.03 0.01 0.05 0.06 0.05 0.05 0.05 0.06 0.05 0.04 T1 14.79 14.24 13.90 14.13 14.29 12.83 12.85 12.78 12.86 12.84 13.13 13.08 13.03 13.05 13.08 12.09 12.16 12.16 12.18 12.15 T2 2.78 2.71 2.65 2.75 2.76 2.72 2.62 2.63 2.62 2.61 2.56 2.57 2.54 2.49 2.55 2.65 2.70 2.68 2.70 2.68 T3 0.53 0.63 0.59 0.53 0.39 0.41 0.40 0.40 0.36 0.34 0.33 0.35 0.21 0.20 0.19 0.20

α=.5

T4 0.18 0.15 0.11 0.12 0.17 0.20 0.18 0.18 0.38 0.42 0.43 0.37 0.51 0.51 0.53 0.53 T1 12.40 11.98 11.62 11.84 11.99 10.48 10.56 10.49 10.55 10.57 10.82 10.77 10.80 10.75 10.80 10.17 10.12 10.11 10.12 10.12 T2 2.46 2.45 2.30 2.37 2.46 2.13 2.01 2.02 2.00 2.05 2.16 2.01 2.03 2.00 2.01 2.16 2.09 2.10 2.12 2.10 T3 0.61 0.67 0.62 0.59 0.43 0.45 0.42 0.42 0.42 0.41 0.45 0.41 0.43 0.44 0.43 0.43

α=.7

T4 0.02 0.03 0.03 0.01 0.04 0.05 0.06 0.05 0.05 0.08 0.07 0.05 0.11 0.11 0.11 0.13 T1 6.29 5.44 5.17 5.35 5.44 4.62 4.65 4.61 4.63 4.64 4.44 4.57 4.54 4.56 4.55 4.31 4.54 4.50 4.51 4.51 T2 1.61 1.72 1.42 1.63 1.68 1.13 0.97 0.94 0.95 0.96 0.90 0.90 0.90 0.90 0.90 0.95 0.91 0.90 0.90 0.90 T3 0.68 0.79 0.73 0.68 0.40 0.40 0.40 0.39 0.28 0.28 0.28 0.28 0.24 0.26 0.25 0.26

α=.9

T4 0.04 0.02 0.03 0.04 0.06 0.06 0.06 0.06 0.04 0.05 0.04 0.04 0.02 0.02 0.03 0.02 Quantile

T1 60.94 30.47 26.77 26.63 26.58 19.81 19.32 18.52 18.65 18.93 16.83 17.09 17.05 17.05 17.07 14.82 15.20 15.19 15.19 15.19 T2 197.22 19.90 39.66 23.42 15.44 7.80 7.04 6.28 6.34 6.65 6.24 6.02 6.05 6.01 6.01 4.98 4.72 4.73 4.72 4.72 T3 0.84 0.95 0.94 0.92 0.49 0.58 0.54 0.55 0.37 0.38 0.40 0.38 0.29 0.29 0.29 0.29

α=.1

T4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 T1 33.53 31.55 30.28 28.57 29.27 23.56 23.92 23.39 23.47 23.74 22.24 22.56 22.44 22.49 22.54 20.74 21.02 20.98 21.00 21.01 T2 16.09 11.27 12.83 8.54 8.81 4.98 5.39 4.92 4.96 5.22 5.17 5.18 5.15 5.16 5.18 4.30 4.38 4.38 4.38 4.38 T3 0.52 0.68 0.71 0.62 0.34 0.48 0.44 0.40 0.36 0.37 0.37 0.36 0.37 0.40 0.38 0.37

α=.3

T4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 T1 33.68 31.94 31.30 29.47 30.10 24.76 25.04 24.55 24.69 24.91 23.62 24.01 23.85 23.92 23.99 22.41 22.57 22.53 22.55 22.56 T2 13.75 11.28 13.54 9.06 9.31 4.80 4.93 4.70 4.71 4.82 4.76 4.94 4.85 4.90 4.93 3.94 3.99 3.98 3.98 3.98 T3 0.61 0.76 0.75 0.69 0.36 0.48 0.42 0.38 0.34 0.40 0.35 0.35 0.36 0.39 0.38 0.36

α=.5

T4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 T1 34.17 32.33 31.50 29.14 29.77 23.22 23.08 23.06 22.87 22.94 21.72 22.12 21.94 22.02 22.10 20.68 20.99 20.95 20.97 20.99 T2 14.56 12.73 18.06 9.18 9.59 6.52 5.29 6.15 5.22 5.19 4.81 4.99 4.86 4.92 4.98 4.06 4.12 4.10 4.11 4.11 T3 0.57 0.72 0.71 0.64 0.36 0.42 0.42 0.38 0.29 0.36 0.33 0.29 0.31 0.34 0.33 0.32

α=.7

T4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 T1 44.66 31.48 26.43 26.94 27.44 19.29 18.08 17.49 17.67 17.76 16.15 16.30 16.18 16.19 16.27 14.90 15.35 15.37 15.34 15.35 T2 31.82 16.88 13.36 12.67 12.49 8.66 7.00 6.46 6.59 6.76 5.89 5.25 5.16 5.15 5.24 4.60 4.33 4.40 4.34 4.33 T3 0.80 0.89 0.87 0.91 0.61 0.68 0.65 0.63 0.37 0.40 0.41 0.38 0.26 0.25 0.27 0.26

α=.9

T4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00


Table 8. AR(1)-ARCH(0)-MW6 (Bimodal): $y_{t+1} = 0.6 y_t + u_{t+1}$

R=20 R =50 R =100 R =200 J=1 J =50 J =1 J =50 J =1 J =50 J =1 J =50

Binary EW W1 W5 WR EW W1 W5 WR EW W1 W5 WR EW W1 W5 WR T1 6.09 5.41 5.20 5.22 5.36 5.11 4.80 4.82 4.80 4.80 4.78 4.72 4.73 4.72 4.73 4.91 4.89 4.89 4.89 4.89 T2 1.69 1.79 1.42 1.50 1.70 1.01 0.78 0.81 0.78 0.78 0.86 0.73 0.74 0.73 0.74 0.82 0.75 0.75 0.75 0.75 T3 0.66 0.72 0.72 0.69 0.48 0.46 0.47 0.46 0.32 0.32 0.32 0.32 0.26 0.26 0.26 0.26

α=.1

T4 0.08 0.09 0.08 0.06 0.15 0.14 0.15 0.16 0.17 0.17 0.17 0.17 0.17 0.17 0.17 0.17 T1 14.68 13.74 13.46 13.71 13.75 13.24 12.99 12.98 12.96 12.96 12.79 12.73 12.72 12.73 12.73 13.03 13.04 13.06 13.05 13.07 T2 3.05 2.69 2.53 2.63 2.68 2.25 2.14 2.11 2.11 2.12 2.31 2.21 2.20 2.20 2.22 2.33 2.15 2.14 2.17 2.15 T3 0.73 0.77 0.71 0.70 0.57 0.58 0.57 0.58 0.47 0.47 0.47 0.47 0.44 0.45 0.45 0.43

α=.3

T4 0.01 0.01 0.00 0.02 0.02 0.02 0.01 0.02 0.06 0.07 0.06 0.06 0.04 0.04 0.04 0.04 T1 18.53 17.42 16.98 17.25 17.36 17.36 17.02 16.83 16.95 17.03 16.85 16.81 16.75 16.82 16.76 16.57 16.56 16.56 16.57 16.57 T2 2.70 2.91 2.65 2.84 2.94 2.93 2.77 2.70 2.68 2.76 2.83 2.73 2.66 2.66 2.71 2.67 2.66 2.65 2.68 2.65 T3 0.62 0.79 0.68 0.67 0.51 0.57 0.50 0.49 0.41 0.49 0.40 0.44 0.35 0.36 0.34 0.35

α=.5

T4 0.12 0.08 0.06 0.09 0.13 0.13 0.18 0.14 0.26 0.21 0.27 0.23 0.31 0.28 0.30 0.29 T1 14.96 13.82 13.51 13.66 13.87 13.40 13.01 13.06 13.02 13.02 13.11 13.05 13.01 13.05 13.02 12.98 12.95 12.94 12.96 12.93 T2 2.69 2.61 2.49 2.55 2.67 2.61 2.17 2.28 2.23 2.23 1.94 1.90 1.86 1.92 1.88 2.11 1.99 1.98 1.98 1.98 T3 0.74 0.78 0.73 0.71 0.61 0.61 0.58 0.61 0.50 0.51 0.50 0.51 0.44 0.44 0.45 0.44

α=.7

T4 0.00 0.03 0.02 0.01 0.04 0.03 0.05 0.04 0.02 0.03 0.06 0.06 0.05 0.05 0.05 0.05 T1 6.12 5.38 5.16 5.26 5.40 5.09 4.93 4.91 4.92 4.92 5.11 4.99 4.99 4.99 4.99 4.85 4.89 4.89 4.90 4.89 T2 1.56 1.43 1.22 1.30 1.46 1.07 0.87 0.87 0.87 0.88 0.89 0.79 0.80 0.80 0.80 0.81 0.74 0.74 0.74 0.74 T3 0.63 0.70 0.66 0.62 0.41 0.41 0.41 0.41 0.38 0.37 0.37 0.37 0.20 0.19 0.19 0.19

α=.9

T4 0.01 0.04 0.05 0.03 0.11 0.12 0.11 0.11 0.17 0.19 0.19 0.19 0.21 0.22 0.22 0.22 Quantile

T1 23.06 20.36 18.34 19.08 19.64 17.69 17.09 16.83 16.95 17.05 16.71 16.55 16.51 16.53 16.55 16.42 16.49 16.48 16.49 16.49 T2 2.83 3.42 2.03 2.35 2.85 1.64 1.45 1.32 1.37 1.43 1.29 1.19 1.18 1.17 1.19 1.25 1.24 1.22 1.23 1.24 T3 0.81 1.00 0.97 0.91 0.65 0.78 0.72 0.68 0.60 0.63 0.60 0.60 0.46 0.47 0.47 0.46

α=.1

T4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 T1 43.08 39.30 38.02 38.60 39.07 37.76 37.10 36.62 36.88 37.06 36.88 36.75 36.60 36.69 36.74 36.37 36.29 36.24 36.27 36.29 T2 3.75 3.56 3.15 3.21 3.41 2.66 2.56 2.41 2.49 2.54 2.39 2.47 2.39 2.43 2.47 2.46 2.53 2.51 2.52 2.53 T3 0.88 0.96 0.94 0.89 0.65 0.88 0.76 0.68 0.62 0.69 0.64 0.62 0.57 0.62 0.60 0.57

α=.3

T4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 T1 50.29 45.75 44.69 45.18 45.63 46.06 44.71 44.27 44.53 44.69 45.01 44.49 44.30 44.43 44.48 44.13 43.72 43.66 43.70 43.72 T2 3.80 3.96 3.70 3.66 3.83 2.97 2.95 2.78 2.87 2.95 3.01 3.22 3.15 3.19 3.21 3.01 3.17 3.12 3.15 3.17 T3 0.95 1.00 0.98 0.97 0.86 0.96 0.89 0.86 0.72 0.82 0.73 0.72 0.67 0.70 0.67 0.67

α=.5

T4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 T1 42.71 38.98 37.61 38.17 38.67 38.04 37.31 36.92 37.12 37.27 37.15 36.82 36.65 36.76 36.81 36.02 35.92 35.86 35.90 35.92 T2 3.34 3.51 2.70 2.96 3.28 2.67 2.45 2.29 2.32 2.43 2.53 2.42 2.38 2.41 2.42 2.55 2.70 2.63 2.67 2.69 T3 0.86 0.99 0.96 0.89 0.72 0.87 0.78 0.74 0.68 0.77 0.71 0.69 0.61 0.60 0.61 0.61

α=.7

T4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 T1 23.23 20.68 18.49 19.18 19.82 17.86 17.19 16.89 17.01 17.15 16.93 16.75 16.69 16.72 16.74 16.22 16.26 16.24 16.25 16.26 T2 3.15 3.88 2.10 2.44 3.06 1.71 1.65 1.39 1.49 1.61 1.36 1.31 1.30 1.30 1.31 1.34 1.28 1.27 1.27 1.28 T3 0.75 0.97 0.91 0.80 0.70 0.90 0.86 0.75 0.67 0.71 0.67 0.67 0.46 0.47 0.46 0.46

α=.9

T4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00


Table 9. AR(1)-ARCH(0)-MW7 (Separated Bimodal): $y_{t+1} = 0.6 y_t + u_{t+1}$

R=20 R =50 R =100 R =200 J=1 J =50 J =1 J =50 J =1 J =50 J =1 J =50

Binary EW W1 W5 WR EW W1 W5 WR EW W1 W5 WR EW W1 W5 WR T1 5.51 5.15 5.00 5.06 5.12 4.97 4.78 4.77 4.77 4.78 4.76 4.78 4.78 4.78 4.77 4.84 4.83 4.83 4.84 4.84 T2 1.35 1.22 1.03 1.11 1.18 1.00 0.72 0.73 0.73 0.71 0.92 0.84 0.83 0.84 0.83 0.89 0.77 0.78 0.77 0.77 T3 0.57 0.61 0.58 0.56 0.39 0.39 0.39 0.39 0.25 0.25 0.25 0.26 0.25 0.26 0.25 0.25

α=.1

T4 0.07 0.08 0.08 0.07 0.12 0.11 0.12 0.11 0.20 0.19 0.20 0.20 0.19 0.19 0.19 0.19 T1 14.65 13.40 13.10 13.32 13.34 13.40 13.23 13.15 13.21 13.25 12.92 12.81 12.80 12.80 12.81 13.16 13.12 13.12 13.12 13.12 T2 2.66 2.65 2.37 2.51 2.55 2.29 2.05 2.04 2.04 2.06 2.27 2.18 2.18 2.14 2.15 2.26 2.26 2.25 2.24 2.25 T3 0.69 0.74 0.72 0.73 0.52 0.54 0.50 0.52 0.45 0.44 0.46 0.45 0.39 0.41 0.41 0.39

α=.3

T4 0.01 0.04 0.00 0.01 0.12 0.13 0.14 0.10 0.17 0.20 0.17 0.14 0.16 0.16 0.15 0.16 T1 20.69 18.36 17.94 18.39 18.50 20.03 18.92 18.83 19.00 18.97 18.49 18.03 17.96 17.94 18.01 18.74 18.64 18.57 18.58 18.62 T2 3.08 2.96 2.80 2.92 2.89 3.24 2.79 2.94 2.85 2.79 2.88 2.81 2.86 2.80 2.80 3.05 3.07 3.02 3.03 3.07 T3 0.82 0.83 0.81 0.80 0.72 0.76 0.69 0.69 0.55 0.59 0.55 0.57 0.50 0.45 0.46 0.49

α=.5

T4 0.04 0.09 0.05 0.04 0.08 0.09 0.11 0.12 0.21 0.20 0.24 0.19 0.19 0.21 0.25 0.16 T1 15.36 14.13 13.98 14.10 14.20 13.61 13.27 13.21 13.31 13.29 12.96 13.03 13.01 13.02 13.03 12.78 12.84 12.82 12.82 12.83 T2 2.49 2.46 2.29 2.38 2.46 1.98 1.77 1.76 1.86 1.83 2.26 2.14 2.12 2.12 2.14 2.23 2.22 2.24 2.23 2.22 T3 0.76 0.81 0.76 0.77 0.60 0.65 0.59 0.58 0.39 0.39 0.36 0.37 0.35 0.34 0.34 0.34

α=.7

T4 0.01 0.02 0.02 0.01 0.09 0.10 0.12 0.13 0.10 0.12 0.11 0.13 0.21 0.20 0.21 0.22 T1 5.73 5.32 5.27 5.26 5.35 4.92 4.82 4.84 4.84 4.84 4.89 4.88 4.86 4.87 4.87 4.75 4.78 4.77 4.77 4.77 T2 1.22 1.19 1.14 1.12 1.22 0.86 0.64 0.68 0.66 0.66 0.92 0.82 0.81 0.81 0.81 0.96 0.81 0.81 0.81 0.81 T3 0.59 0.61 0.60 0.58 0.33 0.31 0.31 0.31 0.26 0.26 0.26 0.26 0.20 0.20 0.20 0.20

α=.9

T4 0.04 0.06 0.05 0.05 0.15 0.17 0.16 0.16 0.18 0.19 0.18 0.18 0.19 0.19 0.19 0.19 Quantile

T1 18.15 17.67 15.95 16.59 17.01 14.96 15.16 14.89 15.01 15.12 14.20 14.39 14.35 14.37 14.38 14.04 14.17 14.16 14.17 14.17 T2 2.09 3.02 1.41 1.76 2.24 1.02 1.18 0.96 1.04 1.15 1.08 1.16 1.15 1.15 1.16 1.03 1.09 1.09 1.09 1.09 T3 0.64 0.91 0.78 0.70 0.43 0.52 0.48 0.46 0.32 0.38 0.32 0.33 0.39 0.40 0.39 0.39

α=.1

T4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 T1 41.60 38.29 37.03 37.68 38.02 36.87 36.69 36.14 36.41 36.63 35.23 35.43 35.25 35.35 35.41 34.88 35.01 34.94 34.98 35.00 T2 3.85 3.32 2.82 2.96 3.16 2.37 2.43 2.18 2.25 2.40 2.68 2.71 2.63 2.67 2.70 2.37 2.38 2.35 2.36 2.38 T3 0.83 0.94 0.86 0.85 0.58 0.79 0.63 0.58 0.42 0.48 0.42 0.42 0.48 0.54 0.49 0.48

α=.3

T4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 T1 53.50 46.87 46.05 46.62 46.89 50.09 47.14 46.80 47.05 47.17 48.55 47.13 46.95 47.06 47.13 48.11 47.32 47.25 47.30 47.33 T2 3.80 3.30 3.16 3.19 3.24 3.19 2.76 2.59 2.67 2.76 2.95 2.69 2.59 2.65 2.69 2.83 2.56 2.46 2.52 2.55 T3 0.99 1.00 1.00 0.99 0.93 1.00 0.99 0.94 0.87 0.88 0.86 0.87 0.70 0.72 0.70 0.70

α=.5

T4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 T1 41.79 38.95 37.62 38.27 38.70 36.76 36.55 36.00 36.32 36.49 35.40 35.45 35.28 35.37 35.43 34.82 34.95 34.90 34.93 34.94 T2 3.80 3.04 2.63 2.70 2.88 2.77 2.74 2.44 2.56 2.73 2.61 2.59 2.48 2.53 2.58 2.29 2.24 2.24 2.24 2.24 T3 0.85 0.95 0.88 0.84 0.59 0.74 0.66 0.62 0.53 0.59 0.54 0.53 0.47 0.48 0.47 0.47

α=.7

T4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 T1 18.57 18.04 16.35 17.06 17.50 14.78 15.01 14.72 14.86 14.96 14.24 14.46 14.42 14.44 14.46 14.06 14.24 14.23 14.23 14.24 T2 2.08 2.24 1.51 1.67 1.90 1.05 1.23 0.93 1.04 1.19 1.10 1.12 1.10 1.11 1.12 0.97 0.96 0.96 0.96 0.96 T3 0.61 0.91 0.73 0.70 0.42 0.56 0.50 0.46 0.37 0.42 0.39 0.38 0.31 0.33 0.31 0.31

α=.9

T4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
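The data-generating process named in the caption of Table 9 can be reproduced along the following lines. This is a minimal sketch in Python, not the authors' code: the Marron-Wand separated-bimodal error is taken here to be the mixture 0.5 N(-1.5, 0.5^2) + 0.5 N(1.5, 0.5^2) rescaled to unit variance, which is an assumption consistent with the summary statistics in the appendix but not stated in this section.

# Sketch of the Monte Carlo DGP y_{t+1} = 0.6*y_t + u_{t+1} with separated-bimodal innovations.
import numpy as np

def mw_separated_bimodal(n, rng):
    """Draw n innovations from a two-component separated-bimodal normal mixture, rescaled to unit variance."""
    comp = rng.integers(0, 2, size=n)                       # pick a component with probability 1/2 each
    u = rng.normal(loc=np.where(comp == 0, -1.5, 1.5), scale=0.5, size=n)
    return u / np.sqrt(1.5**2 + 0.5**2)                     # mixture variance = mu^2 + sigma^2 = 2.5

def simulate_ar1(n, rng, burn=200):
    """Simulate y_{t+1} = 0.6*y_t + u_{t+1}, dropping a burn-in so the series is near-stationary."""
    u = mw_separated_bimodal(n + burn, rng)
    y = np.zeros(n + burn)
    for t in range(1, n + burn):
        y[t] = 0.6 * y[t - 1] + u[t]
    return y[burn:]

rng = np.random.default_rng(0)
y = simulate_ar1(1000, rng)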


Table 10. AR(1)-ARCH(0)-MW8 (Skewed Bimodal): y_{t+1} = 0.6 y_t + u_{t+1}

(Column layout, repeated for R = 20, 50, 100, 200: the unbagged predictor (J = 1), followed by the bagged predictors (J = 50) combined with the EW, W1, W5, and WR weights. Rows T3 and T4 report entries for the J = 50 combinations only. Vertical bars separate the four R blocks.)

Binary predictions
α=.1  T1  6.90 5.79 5.57 5.63 5.71 | 5.19 4.96 4.97 4.96 4.97 | 4.93 4.91 4.91 4.91 4.92 | 5.17 5.06 5.06 5.06 5.06
      T2  1.55 1.58 1.27 1.39 1.50 | 0.98 0.75 0.76 0.75 0.75 | 1.00 0.87 0.86 0.87 0.87 | 1.06 0.79 0.79 0.79 0.79
      T3  0.81 0.86 0.85 0.84 | 0.39 0.39 0.39 0.39 | 0.24 0.23 0.24 0.23 | 0.25 0.24 0.25 0.25
      T4  0.01 0.03 0.01 0.01 | 0.16 0.16 0.16 0.15 | 0.20 0.21 0.20 0.21 | 0.31 0.31 0.31 0.31
α=.3  T1  15.55 14.17 13.86 14.12 14.23 | 13.34 13.07 13.07 13.06 13.11 | 13.02 13.06 13.01 13.05 13.09 | 13.26 13.28 13.29 13.28 13.30
      T2  2.36 2.41 2.27 2.32 2.44 | 2.19 2.04 2.01 1.99 2.03 | 2.29 2.34 2.27 2.31 2.35 | 2.06 1.96 1.95 1.98 1.97
      T3  0.69 0.79 0.75 0.71 | 0.55 0.56 0.57 0.54 | 0.47 0.50 0.49 0.47 | 0.42 0.40 0.41 0.41
      T4  0.00 0.00 0.01 0.00 | 0.05 0.08 0.06 0.03 | 0.08 0.09 0.08 0.08 | 0.02 0.03 0.02 0.02
α=.5  T1  17.82 16.73 16.24 16.57 16.79 | 16.50 16.30 16.14 16.21 16.27 | 16.42 16.35 16.24 16.19 16.29 | 15.87 15.84 15.80 15.83 15.81
      T2  2.80 2.97 2.79 2.85 2.91 | 2.56 2.52 2.56 2.48 2.51 | 2.50 2.49 2.47 2.43 2.50 | 2.66 2.58 2.57 2.57 2.60
      T3  0.68 0.75 0.72 0.66 | 0.45 0.56 0.50 0.47 | 0.44 0.48 0.48 0.43 | 0.36 0.39 0.38 0.37
      T4  0.04 0.07 0.09 0.07 | 0.16 0.18 0.18 0.18 | 0.20 0.24 0.20 0.23 | 0.31 0.28 0.26 0.30
α=.7  T1  13.50 12.79 12.53 12.69 12.79 | 12.80 12.39 12.39 12.37 12.42 | 12.98 12.89 12.87 12.89 12.87 | 11.99 11.99 11.98 11.97 11.98
      T2  2.49 2.43 2.35 2.33 2.46 | 2.09 1.90 1.94 1.92 1.96 | 2.19 2.11 2.13 2.13 2.08 | 2.07 1.99 2.00 1.98 1.99
      T3  0.68 0.74 0.66 0.65 | 0.62 0.64 0.66 0.63 | 0.51 0.52 0.49 0.50 | 0.43 0.43 0.43 0.43
      T4  0.02 0.02 0.04 0.03 | 0.02 0.04 0.02 0.02 | 0.04 0.06 0.04 0.04 | 0.06 0.06 0.06 0.06
α=.9  T1  5.56 4.89 4.71 4.85 4.93 | 4.69 4.61 4.58 4.59 4.59 | 4.84 4.91 4.88 4.89 4.88 | 4.56 4.63 4.63 4.62 4.62
      T2  1.40 1.30 1.12 1.26 1.36 | 0.96 0.76 0.76 0.76 0.76 | 0.92 0.83 0.82 0.82 0.82 | 0.80 0.71 0.71 0.71 0.71
      T3  0.70 0.74 0.67 0.67 | 0.33 0.35 0.34 0.35 | 0.24 0.24 0.24 0.24 | 0.27 0.26 0.27 0.27
      T4  0.02 0.03 0.03 0.02 | 0.19 0.20 0.20 0.19 | 0.08 0.08 0.08 0.08 | 0.07 0.07 0.07 0.07

Quantile predictions
α=.1  T1  26.33 21.96 20.11 20.78 21.27 | 20.14 19.02 18.76 18.90 19.00 | 18.95 18.60 18.50 18.54 18.59 | 18.63 18.55 18.53 18.53 18.55
      T2  3.70 3.88 2.57 2.85 3.27 | 2.56 2.38 2.06 2.22 2.36 | 1.75 1.64 1.56 1.59 1.64 | 1.53 1.46 1.45 1.45 1.46
      T3  0.90 1.00 0.96 0.92 | 0.82 0.90 0.87 0.83 | 0.68 0.73 0.71 0.68 | 0.56 0.58 0.58 0.56
      T4  0.00 0.00 0.00 0.00 | 0.00 0.00 0.00 0.00 | 0.00 0.00 0.00 0.00 | 0.00 0.00 0.00 0.00
α=.3  T1  44.52 40.20 38.70 39.28 39.78 | 38.81 37.83 37.48 37.64 37.80 | 37.69 37.40 37.24 37.34 37.39 | 37.15 37.17 37.10 37.14 37.16
      T2  4.26 4.30 3.58 3.66 3.96 | 3.63 3.56 3.41 3.47 3.55 | 2.72 2.72 2.62 2.68 2.71 | 2.58 2.64 2.60 2.63 2.64
      T3  0.89 0.99 0.97 0.91 | 0.80 0.95 0.90 0.82 | 0.70 0.76 0.74 0.71 | 0.54 0.56 0.56 0.55
      T4  0.00 0.00 0.00 0.00 | 0.00 0.00 0.00 0.00 | 0.00 0.00 0.00 0.00 | 0.00 0.00 0.00 0.00
α=.5  T1  49.39 44.93 43.65 44.15 44.66 | 44.52 43.23 42.90 43.07 43.21 | 43.18 42.66 42.50 42.59 42.66 | 42.49 42.44 42.36 42.41 42.44
      T2  4.65 3.80 3.61 3.62 3.75 | 3.71 3.71 3.59 3.65 3.70 | 2.91 2.98 2.87 2.92 2.97 | 2.99 2.97 2.93 2.96 2.97
      T3  0.93 0.99 0.97 0.95 | 0.83 0.95 0.90 0.85 | 0.75 0.80 0.78 0.75 | 0.58 0.62 0.60 0.58
      T4  0.00 0.00 0.00 0.00 | 0.00 0.00 0.00 0.00 | 0.00 0.00 0.00 0.00 | 0.00 0.00 0.00 0.00
α=.7  T1  41.08 37.18 35.79 36.27 36.82 | 36.46 35.47 35.15 35.31 35.44 | 35.28 34.98 34.82 34.91 34.97 | 34.52 34.41 34.36 34.39 34.41
      T2  4.40 4.01 3.51 3.46 3.78 | 2.68 2.72 2.59 2.65 2.70 | 2.56 2.41 2.34 2.38 2.41 | 2.56 2.56 2.50 2.53 2.56
      T3  0.90 0.98 0.96 0.92 | 0.80 0.92 0.84 0.81 | 0.60 0.76 0.68 0.61 | 0.62 0.62 0.61 0.62
      T4  0.00 0.00 0.00 0.00 | 0.00 0.00 0.00 0.00 | 0.00 0.00 0.00 0.00 | 0.00 0.00 0.00 0.00
α=.9  T1  21.14 19.14 17.05 17.75 18.38 | 16.17 15.77 15.59 15.67 15.74 | 15.34 15.42 15.35 15.39 15.41 | 14.85 15.02 15.01 15.01 15.02
      T2  3.31 3.72 2.27 2.53 3.08 | 1.56 1.42 1.34 1.37 1.41 | 1.28 1.34 1.31 1.32 1.34 | 1.20 1.20 1.19 1.20 1.20
      T3  0.68 0.93 0.87 0.82 | 0.65 0.76 0.70 0.68 | 0.51 0.53 0.51 0.51 | 0.31 0.31 0.31 0.31
      T4  0.00 0.00 0.00 0.00 | 0.00 0.00 0.00 0.00 | 0.00 0.00 0.00 0.00 | 0.00 0.00 0.00 0.00
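The bagged predictors behind the J = 50 columns of Tables 9-10 combine forecasts from bootstrap resamples. The sketch below illustrates only the equal-weighted (EW) combination, read here as a majority vote over bootstrap binary predictors; the base learner (an OLS AR(1) forecast turned into a sign prediction) and the i.i.d. pair bootstrap are illustrative assumptions, and the paper's own binary predictor, cutoff, and BMA-type weights W1, W5, and WR are defined in earlier sections and not reproduced here.

# Sketch of an equal-weighted (majority-voting) bagged sign forecast.
import numpy as np

def base_sign_predictor(y_lag, y_next):
    """Illustrative base learner: OLS fit of y_{t+1} on y_t, then predict 1{forecast > 0}."""
    X = np.column_stack([np.ones_like(y_lag), y_lag])
    beta, *_ = np.linalg.lstsq(X, y_next, rcond=None)
    return lambda y_now: float(beta[0] + beta[1] * y_now > 0.0)

def bagged_sign_forecast(y, J=50, seed=0):
    """Majority vote over J predictors, each trained on a bootstrap resample of (y_t, y_{t+1}) pairs."""
    rng = np.random.default_rng(seed)
    y_lag, y_next = y[:-1], y[1:]
    R = len(y_lag)
    votes = []
    for _ in range(J):
        idx = rng.integers(0, R, size=R)              # i.i.d. pair bootstrap; a block bootstrap is an alternative
        g = base_sign_predictor(y_lag[idx], y_next[idx])
        votes.append(g(y[-1]))                        # each bootstrap predictor votes on the next-period sign
    return float(np.mean(votes) > 0.5)                # equal-weighted combination read as a majority vote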


Figure 1(a): SP500 Binary Prediction
[Figure: nine panels, one for each α = 0.1, 0.2, ..., 0.9, plotting out-of-sample Cost (vertical axis) against the estimation window size R (horizontal axis) for the Unbagged predictor and the bagged combinations EW (k=R), BMA (k=1), BMA (k=5), and BMA (k=R).]

Figure 1(b): SP500 Quantile Prediction
[Figure: nine panels, one for each α = 0.1, 0.2, ..., 0.9, plotting out-of-sample Cost (vertical axis) against the estimation window size R (horizontal axis) for the Unbagged predictor and the bagged combinations EW (k=R), BMA (k=1), BMA (k=5), and BMA (k=R).]
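For the quantile panels, the "Cost" plotted on the vertical axes can be illustrated with the usual check (lin-lin) loss summed over the evaluation period. This is a minimal sketch under that assumption; the paper's exact cost definition is given in its earlier sections, and the example data below are placeholders rather than the actual S&P500 sample.

# Sketch of an out-of-sample check-loss "Cost" for a quantile forecast at level alpha.
import numpy as np

def check_loss(y, q, alpha):
    """Sum of the lin-lin loss rho_alpha(e) = (alpha - 1{e < 0}) * e, with e = y - q."""
    e = np.asarray(y, dtype=float) - np.asarray(q, dtype=float)
    return float(np.sum((alpha - (e < 0)) * e))

# Example: cost of a constant quantile forecast q = -0.04 at alpha = 0.1 over four placeholder returns.
y_out = np.array([0.02, -0.03, 0.01, -0.05])
print(check_loss(y_out, q=-0.04, alpha=0.1))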


Figure 2(a): NASDAQ Binary Prediction
[Figure: nine panels, one for each α = 0.1, 0.2, ..., 0.9, plotting out-of-sample Cost (vertical axis) against the estimation window size R (horizontal axis) for the Unbagged predictor and the bagged combinations EW (k=R), BMA (k=1), BMA (k=5), and BMA (k=R).]

Figure 2(b): NASDAQ Quantile Prediction
[Figure: nine panels, one for each α = 0.1, 0.2, ..., 0.9, plotting out-of-sample Cost (vertical axis) against the estimation window size R (horizontal axis) for the Unbagged predictor and the bagged combinations EW (k=R), BMA (k=1), BMA (k=5), and BMA (k=R).]


[Histograms of the eight Marron-Wand (MW) mixture normal densities, MW1-MW8, based on 10,000 pseudo-random draws each (Sample 1-10000); their summary statistics are tabulated below.]

Series  Mean       Median     Maximum    Minimum    Std. Dev.  Skewness   Kurtosis   Jarque-Bera  Probability
MW1     -0.012807  -0.011404   4.334614  -4.252738  0.998405   -0.008630   3.078202      2.672266  0.262860
MW2      0.927466   1.038380   3.796584  -4.084973  1.007481   -0.722575   3.989728   1278.342     0.000000
MW3     -1.925271  -2.315131   3.113319  -3.306660  1.026836    1.471052   4.812800   4975.924     0.000000
MW4      0.019129   0.005695   4.212732  -4.619784  0.997794    0.037861   4.514265    957.8048    0.000000
MW5      0.002387  -0.005512   9.471090  -9.521809  0.981310    0.361266  24.77680   197813.0      0.000000
MW6     -0.001610   0.003977   3.125011  -2.737267  1.009703    0.003909   2.054055    372.8636    0.000000
MW7      0.016674   0.274751   2.166670  -2.015000  1.002191   -0.034592   1.377536   1098.824     0.000000
MW8      0.325811   0.370494   3.327683  -3.435548  1.001731   -0.329832   2.405267    328.6929    0.000000

Note: This page is an appendix for the referees and is not intended for publication. The eight Marron-Wand (MW) mixture normal densities used in Section 5 are presented as histograms of 10,000 pseudo-randomly generated draws per density, together with the summary statistics above.
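Draws from densities of this form can be generated with a generic finite normal mixture sampler. The sketch below is illustrative only: the mixture has the Marron-Wand form sum_k w_k N(mu_k, sigma_k^2), but the two-component parameters shown are assumptions for demonstration, not the parameterizations used in the paper.

# Sketch of sampling from a finite normal mixture and checking a few moments.
import numpy as np

def sample_normal_mixture(n, weights, means, sds, rng):
    """Draw n observations from sum_k weights[k] * N(means[k], sds[k]^2)."""
    weights, means, sds = map(np.asarray, (weights, means, sds))
    comp = rng.choice(len(weights), size=n, p=weights)   # pick a mixture component for each draw
    return rng.normal(means[comp], sds[comp])

rng = np.random.default_rng(0)
u = sample_normal_mixture(10_000, weights=[0.75, 0.25], means=[0.3, -0.9], sds=[0.8, 1.2], rng=rng)
print(u.mean(), u.std(), ((u - u.mean())**3).mean() / u.std()**3)   # sample mean, std, skewness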


[Histograms of the S&P500 and NASDAQ monthly stock index returns; their summary statistics are tabulated below.]

Series         Sample           Obs.  Mean      Median    Maximum   Minimum    Std. Dev.  Skewness   Kurtosis  Jarque-Bera  Probability
SPRETURN       1982:11-2004:02  256   0.008413  0.011627  0.123780  -0.245428  0.044724   -1.066149  6.995984  218.8222     0.000000
NASDAQRETURN   1984:11-2004:02  232   0.009171  0.018550  0.198653  -0.317919  0.072081   -1.016935  5.920758  122.4521     0.000000

Note: This page is an appendix for the referees and is not intended for publication. The histograms of the S&P500 and NASDAQ monthly stock index returns used in Section 6 are presented, together with the summary statistics above.
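The summary statistics reported in these appendix tables can be computed as below. This is a minimal sketch using simple 1/n moment estimates and the Jarque-Bera statistic JB = (n/6)(S^2 + (K-3)^2/4); the series passed in is a placeholder, not the actual S&P500 or NASDAQ sample.

# Sketch of the appendix-style summary statistics for a return series.
import numpy as np

def summary_stats(x):
    x = np.asarray(x, dtype=float)
    n, m, s = len(x), x.mean(), x.std()                  # 1/n-based moments
    skew = np.mean((x - m)**3) / s**3
    kurt = np.mean((x - m)**4) / s**4
    jb = n / 6.0 * (skew**2 + (kurt - 3.0)**2 / 4.0)
    return {"mean": m, "median": float(np.median(x)), "std": s,
            "skewness": skew, "kurtosis": kurt, "jarque_bera": jb}

rng = np.random.default_rng(0)
returns = rng.normal(0.008, 0.045, size=256)             # placeholder series of 256 monthly returns
print(summary_stats(returns))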