
The Henderson Smoother in Reproducing Kernel Hilbert Space

Estela Bee Dagum∗ and Silvia Bianconcini
Department of Statistics, University of Bologna, Via Belle Arti 41, 40126 Bologna, Italy.

Abstract

The Henderson smoother (1916) has traditionally been applied for trend-cycle estimation in nonparametric seasonal adjustment software such as Census X11 and X11/X12ARIMA, officially adopted by statistical agencies. In particular, the 13-term filter has been shown to possess good properties for detecting an upcoming turning point, but it has the shortcoming of introducing large revisions of the most recent estimates when new observations are added to the series. This limitation has serious consequences for short-term trend analysis. In this study we introduce a Henderson third order kernel representation by means of the reproducing kernel Hilbert space (RKHS) methodology. Two density functions and corresponding orthonormal polynomials up to the second degree have been calculated. One is based on the Henderson weighting function applied in the least squares fitting minimization procedure. The other is the biweight density with the associated Jacobi polynomials. Both are shown to give excellent representations for short and medium length filters. The asymmetric weights are derived by adapting the third order kernel functions to the length of the various filters. A comparison of the Henderson third order kernel asymmetric filters is made with the classical ones developed by Musgrave (1964a and 1964b). The former are shown to be superior in terms of signal passing, noise suppression and speed of convergence to the symmetric filter.

∗ Corresponding author. E-mail address: [email protected].

Keywords. Symmetric and asymmetric weighting systems, biweight density function, higher order kernels, local weighted least squares, spectral properties.

1 Introduction

The linear smoother developed by Henderson (1916) is the most widely applied for estimating the trend-cycle latent component in nonparametric seasonal adjustment software such as the U.S. Bureau of the Census X11 method (Shiskin et al., 1967) and its variants X11ARIMA (Dagum 1980, 1988) and X12ARIMA (Findley et al., 1998). This smoother is based on three equivalent smoothing criteria: (1) minimization of the sum of squares of the third differences of the filter weights, (2) minimization of the variance of the third differences of the input series, and (3) fitting a cubic polynomial by weighted least squares, where the weighting function is chosen to satisfy criterion (1). In other words, fitted to exact cubic functions, the Henderson smoother will reproduce their values, and fitted to stochastic cubic polynomials it will give smoother results than those obtained by OLS. On the other hand, the asymmetric filters associated with the Henderson smoother were developed by Musgrave (1964a and 1964b) on the basis of minimizing the mean squared revision between final and preliminary estimates. Although the basic assumption is that of fitting a linear trend within the span of the filter, the asymmetric weights can only reproduce a constant, since the only imposed constraint is that the weights add to one (see e.g. Laniel, 1985, and Doherty, 1992). Following the established tradition, we shall refer to both the symmetric and asymmetric filters as Henderson filters.

The properties and limitations of the Henderson smoother have been studied in different contexts and have attracted the attention of a large number of authors, among them Cholette (1981 and 1982), Kenny and Durbin (1982), Castles (1987), Dagum and Laniel (1987), Cleveland et al. (1990), Dagum (1996), Gray and Thomson (1996a and 1996b), Dagum and Capitanio (1998), Loader (1999), Dalton and Keogh (2000), Ladiray and Quenneville (2001), Quenneville, Ladiray and Lefrancois (2003), and Dagum and Luati (2004). However, none of these studies has approached the Henderson smoother from a reproducing kernel Hilbert space (RKHS) perspective.

A RKHS is a Hilbert space characterized by a kernel that reproduces, via an inner product, every function of the space or, equivalently, a Hilbert space of real valued functions with the property that every point evaluation functional is a bounded linear functional. Parzen (1959) was the first to introduce a RKHS approach to time series, applying the famous Loeve (1948) theorem by which there is an isometric isomorphism between the closed linear span of a second order stationary stochastic process and the RKHS determined by its covariance function. Parzen demonstrated that the RKHS approach provides a unified framework for three fundamental problems: (1) least squares estimation, (2) minimum variance unbiased estimation of regression parameters, and (3) identification of unknown signals perturbed by noise. Parzen's approach is parametric, and basically consists of estimating the unknown signal by generalized least squares in terms of the inner product between the observations and the covariance function. A nonparametric approach to the RKHS methodology was developed by DeBoor and Lynch (1966) in the context of cubic spline approximation. Later, Kimeldorf and Wahba (1970a and 1970b) exploited both developments and treated the general spline smoothing problem from a RKHS stochastic equivalence perspective. These authors proved that minimum norm interpolation and smoothing problems with quadratic constraints imply an equivalent Gaussian stochastic process.

The RKHS approach followed in our study is strictly nonparametric. We make use of the fundamental theoretical result due to Berlinet (1993) according to which a kernel estimator of order $p \geq 2$ can always be decomposed into the product of a reproducing kernel $R_{p-1}(\cdot,\cdot)$ and a probability density function $f_0$ with finite moments up to order $2p$. The reproducing kernel belongs to the space of polynomials of degree at most $p - 1$. Hence, the weighted least squares estimation of the nonstationary mean uses weights derived from the density function $f_0$ from which the reproducing kernel is defined, and not from the covariance function.

The main purpose of this study is to introduce a RKHS representation of the Henderson smoother. This Henderson kernel representation enables the construction of a hierarchy of kernels with varying smoothing properties. Furthermore, for each kernel order, the asymmetric filters can be derived coherently with the corresponding symmetric weights or, if preferred, from a lower or higher order kernel within the hierarchy. In the particular case of the currently applied asymmetric Henderson filters, those obtained by means of the RKHS, coherent with the symmetric smoothers, are shown to have superior properties from the viewpoint of signal passing and noise suppression. Henderson kernel filters of any length can be constructed and analyzed, including infinite ones. Furthermore, although not discussed in the paper, the Henderson smoother represented as a second or third order kernel calculated for large spans can be used for trend smoothing instead of the trend-cycle smoothing done with the classical Henderson.

Section 2 briefly deals with the basic properties of Hilbert spaces and reproducing kernels. Section 3 discusses the classical Henderson symmetric smoother, and two density functions are derived to generate the corresponding third order kernel representations; the approximations are illustrated for spans of 9, 13 and 23 terms. Section 4 presents the asymmetric Henderson kernel filters and compares them with those currently in use by means of spectral analysis.

2 Linear Filters in Reproducing Kernel Hilbert Spaces

A basic assumption in time series analysis is that the input series $\{y_t, t = 1, 2, ..., N\}$ can be decomposed into the sum of a systematic component called the signal (or nonstationary mean) $g_t$, plus an erratic component called the noise $u_t$, such that

$$y_t = g_t + u_t. \qquad (1)$$

The noise component $u_t$ is assumed to be either white noise, $WN(0, \sigma_u^2)$, or, more generally, to follow a stationary and invertible Autoregressive Moving Average (ARMA) process. Assuming that the input series $\{y_t, t = 1, 2, ..., N\}$ is seasonally adjusted or without seasonality, the signal $g_t$ represents the trend and cyclical components, usually referred to as trend-cycle, for they are estimated jointly. The trend-cycle can be deterministic or stochastic, and can have a global or a local representation. It can be represented locally by a polynomial of degree $p$ of the time distance $j$ between $y_t$ and the neighboring observations $y_{t-j}$. Hence, given $u_t$ for some time point $t$, it is possible to find a local polynomial trend estimator

$$g_t(j) = a_0 + a_1 j + \cdots + a_p j^p + \varepsilon_t(j), \qquad (2)$$


where $a_0, a_1, ..., a_p \in \mathbb{R}$ and $\varepsilon_t$ is assumed to be purely random and mutually uncorrelated with $u_t$. The coefficients $a_0, a_1, ..., a_p$ can be estimated by ordinary or weighted least squares or by summation formulae (see e.g. Dagum, 1985). The solution for $a_0$ provides the trend-cycle estimate $g_t(0)$, which equivalently consists in a weighted moving average (Kendall, Stuart, and Ord, 1983), such that

$$g_t(0) = g_t = \sum_{j=-m}^{m} w_j y_{t-j}, \qquad (3)$$

where $w_j$, $j < N$, denotes the weights applied to the observations $y_{t-j}$ to get the estimate $g_t$ for each point in time $t = 1, 2, ..., N$. The weights depend on: (1) the degree of the fitted polynomial, (2) the amplitude of the neighborhood, and (3) the shape of the function used to average the observations in each neighborhood. Once a (symmetric) span $2m + 1$ of the neighborhood has been selected, the $w_j$'s for the observations corresponding to points falling outside the neighborhood of any target point are null or approximately null, such that the estimates of the $N - 2m$ central observations are obtained by applying $2m + 1$ symmetric weights to the observations neighboring the target point. The missing estimates for the first and last $m$ observations can be obtained by applying asymmetric moving averages of variable length to the first and last $m$ observations, respectively. The length of the moving average or time invariant symmetric linear filter is $2m + 1$, whereas the length of the asymmetric linear filters is time varying. Using the backshift operator $B$, such that $B^n y_t = y_{t-n}$, equation (3) can be written as

$$g_t = \sum_{j} w_j B^j y_t = W(B) y_t, \quad t = 1, 2, ..., N, \qquad (4)$$

where $W(B)$ is a linear nonparametric estimator. The nonparametric estimator $W(B)$ is said to be a second order kernel if it satisfies the conditions

$$\sum_{j=-m}^{m} w_j = 1, \qquad (5)$$

$$\sum_{j=-m}^{m} j\, w_j = 0, \qquad (6)$$


hence it preserves a constant and a linear trend. On the other hand, $W(B)$ is a higher order kernel if

$$\sum_{j=-m}^{m} w_j = 1, \qquad (7)$$

and

$$\sum_{j=-m}^{m} j^i w_j = 0, \qquad (8)$$

for $i = 1, 2, ..., p - 1$, with $p \geq 2$. In other words, it will reproduce a polynomial trend of degree $p - 1$ without distortion.
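Conditions (5)-(8) are easy to check numerically for any given weight diagram. The following minimal sketch (not part of the original paper; the function name filter_order and the numpy-based setup are our own) computes the discrete moments $\sum_j j^i w_j$ and reports the order of a symmetric filter: a plain 13-term average comes out as a second order kernel, while the equivalent weights of an unweighted local cubic fit reproduce cubics and therefore behave as a fourth order kernel.

```python
import numpy as np

def filter_order(w, m, tol=1e-10):
    """Largest p such that sum_j j^i w_j = 0 for i = 1,...,p-1 (with the weights
    summing to one), i.e. the kernel order in the sense of conditions (5)-(8)."""
    j = np.arange(-m, m + 1)
    assert abs(w @ j**0 - 1.0) < tol            # conditions (5)/(7): weights sum to one
    p = 1
    while abs(w @ j.astype(float)**p) < tol:    # conditions (6)/(8): vanishing moments
        p += 1
    return p

m = 6
j = np.arange(-m, m + 1)

# Uniform 13-term average: a second order kernel (reproduces a linear trend).
w_uniform = np.full(2 * m + 1, 1.0 / (2 * m + 1))
print(filter_order(w_uniform, m))   # -> 2

# Equivalent weights of an unweighted local cubic fit: reproduces cubics, so the
# moments of order 1, 2, 3 vanish and the filter is a fourth order kernel.
X = np.vander(j, 4, increasing=True)            # columns 1, j, j^2, j^3
w_cubic = (np.linalg.inv(X.T @ X) @ X.T)[0]     # row giving the fitted value at j = 0
print(filter_order(w_cubic, m))     # -> 4
```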

A different characterization of a $p$-th order nonparametric estimator can be provided by means of the RKHS methodology. A Hilbert space is a complete linear space with a norm given by an inner product. The finite $N$-dimensional space $\mathbb{R}^N$ and the space of square integrable functions $L^2$ are those used in this study.

Let us assume that the time series $\{y_t, t = 1, 2, ..., N\}$ is a finite realization of a family of square integrable random variables, i.e. $\int_{\mathbb{R}} |Y_t|^2\, dP < \infty$. Hence, the random process $\{Y_t\}_{t \in T}$ belongs to the space $L^2(\mathbb{R})$. The space $L^2(\mathbb{R})$ is a Hilbert space endowed with the inner product defined by

$$\langle U(t), V(t) \rangle = E(U(t)V(t)) = \int_{\mathbb{R}} U(t)V(t) f_0(t)\, dt, \qquad (9)$$

where $U, V \in L^2(\mathbb{R})$, and $f_0$ is a probability density function weighting each observation to take into account its position in time. In the following, $L^2(\mathbb{R})$ will be indicated as $L^2(f_0)$. The local polynomial trend $g_t(\cdot)$ belongs to the space $\mathbb{P}_p$ of polynomials of degree at most $p$, with $p$ a non-negative integer. $\mathbb{P}_p$ is a Hilbert subspace of $L^2(f_0)$, thus it inherits its inner product, that is,

$$\langle P(t), Q(t) \rangle = \int_{\mathbb{R}} P(t)Q(t) f_0(t)\, dt, \qquad (10)$$

where $P(t), Q(t) \in \mathbb{P}_p$. Berlinet (1993) showed that the space $\mathbb{P}_p$ is a reproducing kernel Hilbert space of polynomials on some domain $T$, that is, $\forall t \in T$ and $\forall P \in \mathbb{P}_p$, there exists an element $R_p(t, \cdot) \in \mathbb{P}_p$ such that

$$P(t) = \langle P(\cdot), R_p(t, \cdot) \rangle. \qquad (11)$$


$R_p(t, \cdot)$ is called the reproducing kernel, since

$$\langle R(t, \cdot), R(\cdot, s) \rangle = R(s, t). \qquad (12)$$

Formally, $R$ is a function

$$R : T \times T \to \mathbb{R}, \quad (s, t) \mapsto R(s, t)$$

satisfying the following properties:

1. $R(t, \cdot) \in H$, $\forall t \in T$;

2. $\langle g(\cdot), R(t, \cdot) \rangle = g(t)$, $\forall t \in T$ and $g \in H$.

This last condition is called the "reproducing property": the value of the function $g$ at the point $t$ is reproduced by the inner product of $g$ with $R(t, \cdot)$.

Suppose that $\{y_t, t = 1, 2, ..., N\} \in L^2(f_0)$ can be decomposed as in eq. (1). The estimate $g_t$ of eq. (2) can then be obtained by minimizing the distance between $y_{t+j}$ and $g_t(j)$, that is,

$$\|y_{t+j} - g_t(j)\|^2 = \int_{-m}^{m} (y_{t+j} - g_t(j))^2 f_0(j)\, dj, \qquad (13)$$

where the positive real number $m$ determines the neighborhood of $t$ on which the deviation between $y_{t+j}$ and $g_t(j)$ is taken into account in the $L^2$ sense. For this reason, $2m + 1$ is called the bandwidth. The weighting function $f_0$ depends on the distance between the target point $t$ and each observation in the $2m + 1$ point neighborhood (for $m + 1 \leq t \leq N - m$). Hence, $g_t(0)$ is given by

$$g_t(0) = \int_{-m}^{m} y_{t+j} R_p(j, 0) f_0(j)\, dj \qquad (14)$$

$$= \int_{-m}^{m} \Pi_{\mathbb{P}_p}[y_{t+j}] R_p(j, 0) f_0(j)\, dj, \qquad (15)$$

where $\Pi_{\mathbb{P}_p}[y_{t+j}]$ denotes the projection of the observations $y_{t+j}$, $j = -m, ..., m$, on $\mathbb{P}_p$, and $R_p$ is the reproducing kernel of the space $\mathbb{P}_p$.


In fact, each element $y_{t+j}$ of the Hilbert space $L^2(f_0)$ can be decomposed into the sum of its projection on a Hilbert subspace of $L^2(f_0)$, such as the space $\mathbb{P}_p$, plus its orthogonal complement, as follows

$$y_{t+j} = \Pi_{\mathbb{P}_p}[y_{t+j}] + \{y_{t+j} - \Pi_{\mathbb{P}_p}[y_{t+j}]\}, \qquad (16)$$

and, by orthogonality, for every $j \in T$,

$$\langle y_{t+j}, R_p(j, 0) \rangle = \langle \Pi_{\mathbb{P}_p}[y_{t+j}], R_p(j, 0) \rangle = \Pi_{\mathbb{P}_p}[y_t] = g_t. \qquad (17)$$

Hence, $g_t$ can be equivalently seen as the projection of $y_t$ on $\mathbb{P}_p$ and as a local weighted average of the observations in the discrete version of the filter given in eq. (4), where the weights $w_j$ are given by a kernel function $K$ of order $p + 1$,

$$K_{p+1}(t) = R_p(t, 0) f_0(t), \qquad (18)$$

where $p$ is the degree of the fitted polynomial. The following result, due to Berlinet (1993), is fundamental. Kernels of order $p + 1$, $p \geq 1$, can be written as products of the reproducing kernel $R_p(t, \cdot)$ of the space $\mathbb{P}_p \subseteq L^2(f_0)$ and a density function $f_0$ with finite moments up to order $2p$. That is,

$$K_{p+1}(t) = R_p(t, 0) f_0(t). \qquad (19)$$

For any sequence $(P_i)_{0 \leq i \leq p}$ of $p + 1$ orthonormal polynomials in $L^2(f_0)$,

$$R_p(t, 0) = \sum_{i=0}^{p} P_i(t) P_i(0). \qquad (20)$$

A set $\{P_n(t), t \in [a, b]\}_{n \in \mathbb{N}}$ of polynomials is orthogonal with respect to a weighting function $f_0$ if

$$\sum_{t=a}^{b} P_n(t) P_m(t) f_0(t) = 0, \quad \forall n \neq m, \qquad \sum_{t=a}^{b} P_n^2(t) f_0(t) \neq 0, \quad \forall n \in \mathbb{N},$$

and it is orthonormal if

$$\sum_{t=a}^{b} P_n(t)^2 f_0(t) = 1, \quad \forall n \in \mathbb{N}. \qquad (21)$$


Therefore, eq. (18) becomes

$$K_{p+1}(t) = \sum_{i=0}^{p} P_i(t) P_i(0) f_0(t). \qquad (22)$$

An important outcome of the RKHS theory is that linear filters can be grouped into hierarchies $\{K_p, p = 2, 3, 4, ...\}$ with the following property: each hierarchy is identified by a density $f_0$ and contains kernels of order 2, 3, 4, ..., which are products of orthonormal polynomials by $f_0$. The weight system of a hierarchy is completely determined by specifying: (a) the bandwidth or smoothing parameter, (b) the maximum order of the estimator in the family, and (c) the density $f_0$.

In this study, the smoothing parameter is not derived by data-dependent optimization criteria; rather, we fix it with the aim of obtaining a kernel representation of the most often applied Henderson smoothers. However, kernels of any length, including infinite ones, can be obtained with the above approach. Consequently, the results discussed can easily be extended to any filter length as long as the density function and its orthonormal polynomials are specified. The identification and specification of the density is one of the most crucial tasks for smoothers based on local polynomial fitting by weighted least squares, such as Loess and the Henderson smoother. The density is related to the weighting penalty function of the minimization problem.

3 The Symmetric Henderson Smoother and Its Kernel Representation

Recognition of the fact that the smoothness of the estimated trend-cycle curve depends directly on the smoothness of the weight diagram led Henderson (1916) to develop a formula which makes the sum of squares of the third differences of the smoothed series a minimum for any number of terms. Henderson's starting point was the requirement that the filter should reproduce a cubic polynomial trend without distortion. Henderson showed that three alternative smoothing criteria give the same formula, as shown explicitly by Kenny and Durbin (1982) and Gray and Thomson (1996a):

1. minimization of the variance of the third differences of the series defined by the application of the moving average;


2. minimization of the sum of squares of the third differences of the coefficients of the moving average formula;

3. fitting a cubic polynomial by weighted least squares, where the weights are chosen so as to minimize the sum of squares of their third differences.

The problem is one of fitting a cubic trend by weighted least squares to the observations $y_{t+j}$, $j = -m, ..., m$, the value of the fitted function at $j = 0$ being taken as the smoothed observation $g_t$. Representing the weight assigned to the residuals from the local polynomial regression by $W_j$, $j = -m, ..., m$, where $W_j = W_{-j}$, the problem is the minimization of

$$\sum_{j=-m}^{m} W_j \left[ y_{t+j} - a_0 - a_1 j - a_2 j^2 - a_3 j^3 \right]^2, \qquad (23)$$

where the solution for the constant term $a_0$ is the smoothed observation $g_t$. Henderson (1916) showed that $g_t$ is given by

$$g_t = \sum_{j=-m}^{m} \phi(j) W_j y_{t+j}, \qquad (24)$$

where $\phi(j)$ is a cubic polynomial whose coefficients have the property that the smoother reproduces the data if they follow a cubic. Henderson also proved the converse: if the coefficients of a cubic-reproducing summation formula $\{w_j\}$ do not change their sign more than three times within the filter span, then the formula can be represented as a local cubic smoother with weights $W_j > 0$ and a cubic polynomial $\phi(j)$ such that $\phi(j) W_j = w_j$. To obtain $\{W_j\}$ from $\{w_j\}$ one simply divides by a cubic polynomial whose roots match those of $\{w_j\}$. Henderson (1916) measured the amount of smoothing of the input series by $\sum (\Delta^3 y_t)^2$ or, equivalently, by the sum of squares of the third differences of the weight diagram, $\sum (\Delta^3 w_j)^2$. The solution is the one resulting from the weighted least squares fitting of a cubic polynomial with

$$W_j \propto \{(m + 1)^2 - j^2\}\{(m + 2)^2 - j^2\}\{(m + 3)^2 - j^2\}, \qquad (25)$$

where $W_j$ is the weighting penalty function of criterion (3) above. Following Henderson's theorem stated before, the weight diagram $\{w_j\}$ corresponding to (25), known as Henderson's ideal formula, is obtained, for a filter length


equal to $2m - 3$, by

$$w_j = \frac{315\,[(m - 1)^2 - j^2](m^2 - j^2)[(m + 1)^2 - j^2](3m^2 - 16 - 11j^2)}{8m(m^2 - 1)(4m^2 - 1)(4m^2 - 9)(4m^2 - 25)}. \qquad (26)$$
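As a numerical illustration (ours, not part of the paper), the next sketch computes the 13-term weight diagram in the two equivalent ways described above: directly from Henderson's ideal formula (26), in which a 13-term filter corresponds to m = 8, and as the equivalent weights of a local cubic fit by weighted least squares with the penalty (25), written with half-length m = 6. The central weight is approximately 0.240, as in the first row of Table 2 below.

```python
import numpy as np

def henderson_ideal(n_terms):
    """Weight diagram from Henderson's ideal formula, eq. (26); filter length is 2m - 3."""
    m = (n_terms + 3) // 2
    half = (n_terms - 1) // 2
    j = np.arange(-half, half + 1, dtype=float)
    num = 315 * ((m - 1)**2 - j**2) * (m**2 - j**2) * ((m + 1)**2 - j**2) \
          * (3 * m**2 - 16 - 11 * j**2)
    den = 8 * m * (m**2 - 1) * (4 * m**2 - 1) * (4 * m**2 - 9) * (4 * m**2 - 25)
    return num / den

def henderson_wls(n_terms):
    """Criterion (3): equivalent weights of a local cubic fit by weighted least
    squares with the penalty W_j of eq. (25), with half-length m."""
    m = (n_terms - 1) // 2
    j = np.arange(-m, m + 1, dtype=float)
    W = ((m + 1)**2 - j**2) * ((m + 2)**2 - j**2) * ((m + 3)**2 - j**2)
    X = np.vander(j, 4, increasing=True)                    # columns 1, j, j^2, j^3
    H = np.linalg.solve(X.T @ (W[:, None] * X), X.T * W)    # coefficient matrix
    return H[0]                                             # row for a0 = fit at j = 0

w26 = henderson_ideal(13)
w3 = henderson_wls(13)
print(np.round(w26, 3))       # central weight 0.240, cf. the first row of Table 2
print(np.allclose(w26, w3))   # the two criteria give the same weight diagram
```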

This optimality result has been rediscovered several times in modern literature, usually for asymptotic variants. Loader (1999) showed that Henderson's ideal formula (26) is a finite sample variant of a density kernel with second order vanishing moments which minimizes the third derivative of the function, given by Muller (1984). In particular, Loader showed that, for large $m$, the weights of Henderson's ideal penalty function $W_j$ are approximately $m^6 W(j/m)$, where $W(j/m)$ is the triweight function. He concluded that, for very large $m$, the weight diagram is approximately $(315/512)\, W(j/m)\,(3 - 11(j/m)^2)$, equivalent to the kernel given by Muller (1984).

To derive the Henderson kernel hierarchy by means of the RKHS methodology, the density corresponding to $W_j$ and its orthonormal polynomials have to be determined. The triweight density function gives very poor results when the Henderson smoother spans are of short or medium length, as in most application cases, ranging from 5 to 23 terms. The density $f_{0H}(t)$ corresponding to the weighting penalty function $W_j$ in the weighted least squares fitting is obtained as follows. The function $W_j$ is

• nonnegative in the intervals $(-m - 1, m + 1)$, $(-m - 3, -m - 2)$, $(m + 2, m + 3)$ and negative otherwise;

• $W_j = 0$ if $j = \pm(m + 1), \pm(m + 2), \pm(m + 3)$;

• $W_j$, $j \in [-m - 1, m + 1]$, is increasing on $[-m - 1, 0)$, decreasing on $(0, m + 1]$, and reaches its maximum at $j = 0$.

Choosing the support $[-m - 1, m + 1]$, the integral $k = \int_{-m-1}^{m+1} W_j\, dj$ is different from 1 and represents the integration constant on this support. It follows that the density corresponding to $W_j$ on the interval $[-m - 1, m + 1]$ is given by

$$f_{0H}(j) = W_j / k.$$

For $m = 6$, the filter is the classical 13-term Henderson,

$$W_j \propto \{49 - j^2\}\{64 - j^2\}\{81 - j^2\},$$


$$k = \int_{-7}^{7} W_j\, dj = \frac{27225968}{15},$$

and the density defined on $[-7, 7]$ is given by

$$f_{0H}(j) = \frac{1}{k}\, \{49 - j^2\}\{64 - j^2\}\{81 - j^2\}.$$

To eliminate the dependence of the support on the bandwidth parameter $m$, a new variable ranging on $[-1, 1]$, $t = j/(m + 1)$, is considered. Applying the change of variables method,

$$f_{0H}(t) = f(t^{-1}(j)) \left| \frac{\partial t^{-1}(j)}{\partial t} \right|,$$

where $t(j) = \frac{j}{m+1}$ and $t^{-1}(j) = (m + 1)t$, it follows that $\frac{\partial t^{-1}(j)}{\partial t} = m + 1$ and

$$f_{0H}(t) = \frac{W((m + 1)t)}{k}\,(m + 1) = \frac{W(j)}{k}\,(m + 1). \qquad (27)$$

The density $f_{0H}(t)$ is

• symmetric, i.e. $f_{0H}(-t) = f_{0H}(t)$;

• nonnegative on $[-1, 1]$, with $f_{0H}(t) = 0$ when $t = -1$ or $t = 1$;

• increasing on $[-1, 0)$ and decreasing on $(0, 1]$, with a maximum at $t = 0$.

For $m = 6$,

$$f_{0H}(t) = \frac{15}{79376}\left(5184 - 12289t^2 + 9506t^4 - 2401t^6\right), \quad t \in [-1, 1]. \qquad (28)$$

To obtain higher order kernels, the corresponding orthonormal polynomials have to be computed for the density (27). The polynomials can be derived by the Gram-Schmidt orthonormalization procedure or by solving the Hankel system based on the moments of the density $f_{0H}$. The latter is the approach followed in this study, implemented using a MATLAB routine (see Bianconcini, 2006). The hierarchy corresponding to the 13-term Henderson kernel has been obtained, and for $p = 3$ it gives the classical Henderson filter.
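The MATLAB routine itself is not reproduced in the paper. As a rough illustration of the same computation, the sketch below (ours, in Python) builds the Hankel matrix of moments of the density (28), obtains the orthonormal polynomials of degree up to two from its Cholesky factor (equivalent to Gram-Schmidt on the monomials), and evaluates $R_2(t, 0)$; the resulting second degree factor agrees with the $p = 3$ entry listed below, $2175/1274 - (1372/265)t^2 \approx 1.7072 - 5.1774\,t^2$.

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Exact Henderson density for the 13-term filter, eq. (28), in increasing powers of t.
f0H = (15.0 / 79376.0) * np.array([5184.0, 0, -12289.0, 0, 9506.0, 0, -2401.0])

def moment(k):
    """k-th moment of f0H on [-1, 1], by exact polynomial integration."""
    tk = np.zeros(k + 1); tk[k] = 1.0
    anti = P.polyint(P.polymul(tk, f0H))
    return P.polyval(1.0, anti) - P.polyval(-1.0, anti)

# Hankel (moment) matrix of the monomials 1, t, t^2 under f0H.
H = np.array([[moment(i + j) for j in range(3)] for i in range(3)])

# Gram-Schmidt via the Cholesky factor: the rows of inv(L) hold the coefficients
# (monomial basis) of the orthonormal polynomials P_0, P_1, P_2.
L = np.linalg.cholesky(H)
C = np.linalg.inv(L)

# Reproducing kernel R_2(t, 0) = sum_i P_i(t) P_i(0), again in the monomial basis.
P0 = C @ np.array([1.0, 0.0, 0.0])   # values P_i(0)
R20 = C.T @ P0                       # coefficients of R_2(., 0): [const, t, t^2]
print(np.round(R20, 4))              # approx [ 1.7072  0.0  -5.1774 ]
# Compare with 2175/1274 = 1.7072 and -1372/265 = -5.1774 in the table below.
```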


Henderson kernels and their orders:

$p = 2$: $\frac{15}{79376}\left(5184 - 12289t^2 + 9506t^4 - 2401t^6\right)$

$p = 3$: $\frac{15}{79376}\left(5184 - 12289t^2 + 9506t^4 - 2401t^6\right)\left(\frac{2175}{1274} - \frac{1372}{265}t^2\right)$

Since the triweight density function gives a poor approximation of the Henderson weights for small $m$ (5 to 23 terms), we searched for another density function with well known theoretical properties. The main reason is that the exact density (27) is a function of the bandwidth and needs to be recalculated, together with its corresponding orthonormal polynomials, every time $m$ changes. We found the biweight to give almost equivalent results to those obtained with the exact density function, without the need to be recalculated every time the Henderson smoother length changes. Another important advantage is that the biweight density function belongs to the well-known beta distribution family, that is,

$$f(t) = \frac{r}{2 B\!\left(s + 1, \frac{1}{r}\right)}\, (1 - |t|^r)^s\, I_{[-1,1]}(t), \qquad (29)$$

where $B(a, b) = \int_0^1 t^{a-1}(1 - t)^{b-1}\, dt$, with $a, b > 0$, is the beta function. The orthonormal polynomials needed for the reproducing kernel associated with the biweight function are the Jacobi polynomials, for which explicit expressions for computation are available and whose properties have been widely studied in the literature. We obtain another Henderson kernel hierarchy using the biweight density

$$f_{0B}(t) = \frac{15}{16}(1 - t^2)^2\, I_{[-1,1]}(t) \qquad (30)$$

combined with the Jacobi orthonormal polynomials. The latter are characterized by the following explicit expression (Abramowitz and Stegun, 1972):

$$P_n^{\alpha,\beta}(t) = \frac{1}{2^n} \sum_{m=0}^{n} \binom{n + \alpha}{m} \binom{n + \beta}{n - m} (t - 1)^{n-m}(t + 1)^m, \qquad (31)$$

where $\alpha = 2$ and $\beta = 2$.


The Henderson second order kernel is given by the density function $f_{0B}$, since the reproducing kernel $R_1(j, 0)$ of the space of polynomials of degree at most one is always equal to one. The third order kernel is given by

$$\frac{15}{16}(1 - t^2)^2 \left( \frac{7}{4} - \frac{21}{4} t^2 \right). \qquad (32)$$
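As a cross-check (ours, not in the paper), the following sketch evaluates the Jacobi polynomials (31) with $\alpha = \beta = 2$, normalizes them under the biweight density (30), and sums $P_i(t)P_i(0)$; the result is exactly the factor $7/4 - (21/4)t^2$ of the third order kernel (32).

```python
import math
import numpy as np
from numpy.polynomial import polynomial as P

ALPHA = BETA = 2  # Jacobi parameters matching the biweight density

def jacobi(n):
    """Coefficients (increasing powers) of P_n^{(2,2)}(t) from the explicit sum (31)."""
    coeffs = np.zeros(n + 1)
    for m in range(n + 1):
        c = math.comb(n + ALPHA, m) * math.comb(n + BETA, n - m) / 2.0**n
        term = P.polymul(P.polypow([-1.0, 1.0], n - m), P.polypow([1.0, 1.0], m))
        coeffs = P.polyadd(coeffs, c * term)
    return coeffs

def biweight_inner(p, q):
    """Inner product <p, q> under the biweight density (30) on [-1, 1]."""
    f0B = np.array([15.0 / 16, 0.0, -30.0 / 16, 0.0, 15.0 / 16])  # (15/16)(1 - t^2)^2
    anti = P.polyint(P.polymul(P.polymul(p, q), f0B))
    return P.polyval(1.0, anti) - P.polyval(-1.0, anti)

# Normalize P_0, P_1, P_2 (already orthogonal under the biweight) and build R_2(t, 0).
kernel = np.zeros(3)
for n in range(3):
    p = jacobi(n)
    p = p / math.sqrt(biweight_inner(p, p))            # orthonormal polynomial
    kernel = P.polyadd(kernel, P.polyval(0.0, p) * p)  # add P_n(t) * P_n(0)

print(kernel)   # [ 1.75  0.   -5.25 ], i.e. 7/4 - (21/4) t^2 as in (32)
```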

Figures 1, 2 and 3 show both the classical Henderson smoother and the Henderson kernel symmetric weights for spans of 9, 13 and 23 terms.

Figure 1: Classical Henderson smoothers and third order Henderson kernels of 9 terms

The small discrepancies of the two kernel functions relative to the classical Henderson smoother are due to the fact that the exact density is obtained by interpolation from a finite, small number of points of the weighting penalty function $W_j$. On the other hand, the biweight is already a density function, which is made discrete by choosing selected points to produce the weights. We also calculated $\sum (\Delta^3 w_j)^2$ for each filter span, as shown in Table 1.


Figure 2: Classical Henderson smoothers and third order Henderson kernels of 13 terms

Figure 3: Classical Henderson smoothers and third order Henderson kernels of 23 terms


Filters                         9-term   13-term   23-term
Classical Henderson smoother     0.053     0.006     0.000
Exact Henderson kernel           0.048     0.006     0.000
Biweight Henderson kernel        0.052     0.006     0.000

Table 1. Sum of squares of the third differences of the weights for the classical Henderson smoother and its kernel representations of 9, 13 and 23 terms.

The smoothing powers of the filters are very close to one another, except for the 9-term exact Henderson kernel, which gives the smoothest curve. Given the equivalence of the symmetric weights, we use the RKHS methodology to generate the corresponding asymmetric filters for the last six points.

4 Asymmetric Henderson Smoothers and Their Kernel Representations

The asymmetric Henderson smoothers currently in use were developed by Musgrave (1964a and 1964b). They are based on the minimization of the mean squared revision between the final estimates (obtained by the application of the symmetric filter) and the preliminary estimates (obtained by the application of an asymmetric filter), subject to the constraint that the sum of the weights is equal to one (see e.g. Laniel, 1985; Doherty, 1992). The assumption made is that at the end of the series the seasonally adjusted values follow a linear trend-cycle plus a purely random irregular $\varepsilon_t$, such that $\varepsilon_t \sim IID(0, \sigma^2)$. The equation used is

$$E\left[r_t^{(i,m)}\right]^2 = c_1^2 \left( t - \sum_{j=-i}^{m} h_{ij}(t - j) \right)^2 + \sigma^2 \sum_{j=-m}^{m} (h_{mj} - h_{ij})^2, \qquad (33)$$

where $h_{mj}$ and $h_{ij}$ are the weights of the symmetric (central) filter and the asymmetric filters, respectively; $h_{ij} = 0$ for $j = -m, ..., -i - 1$; $c_1$ is the slope of the line; and $\sigma^2$ denotes the noise variance. There is a relation between $c_1$ and $\sigma^2$ such that the noise to signal ratio $I/C$ is given by

$$I/C = (4\sigma^2/\pi)^{1/2}/|c_1| \quad \text{or} \quad \frac{c_1^2}{\sigma^2} = \frac{4}{\pi(I/C)^2}. \qquad (34)$$


The $I/C$ ratio is given by the absolute mean of the first differences of the irregulars over the absolute mean of the first differences of the trend-cycle estimates. The $I/C$ ratio (34) determines the length of the Henderson trend-cycle filter to be applied. Thus, setting $t = 0$ and $m = 6$ for the end weights of the 13-term Henderson, we have

$$\frac{E\left[r_0^{(i,6)}\right]^2}{\sigma^2} = \frac{4}{\pi(I/C)^2} \left( \sum_{j=-i}^{6} j\, h_{ij} \right)^2 + \sum_{j=-6}^{6} (h_{6j} - h_{ij})^2. \qquad (35)$$

Setting $I/C = 3.5$ (the noisiest situation in which the 13-term Henderson is applied), eq. (35) gives the same set of end weights as the Census X11 variant (Shiskin, Young and Musgrave, 1967). The end weights for the remaining monthly Henderson filters are calculated using $I/C = 0.99$ for the 9-term filter and $I/C = 4.5$ for the 23-term filter.
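A compact way to see how the Musgrave criterion produces these end weights is to minimize (35) directly, subject to the weights summing to one, by a Lagrange-multiplier argument. The sketch below is our own reading of that construction (it assumes the slope term $(\sum_j j\, h_{ij})^2$ as written in (35) and uses the exact symmetric weights from (26)); with $I/C = 3.5$ it returns the familiar X11 end weights for the last point, matching the last row of Table 2 below.

```python
import numpy as np

def musgrave_end_weights(w_sym, n_avail, ic_ratio):
    """Asymmetric end weights: minimize eq. (35) over the available weights,
    subject to their sum being one (Lagrange multiplier solution)."""
    m = (len(w_sym) - 1) // 2
    j = np.arange(-m, m + 1, dtype=float)[:n_avail]    # available offsets
    w = np.asarray(w_sym, dtype=float)[:n_avail]       # symmetric weights retained
    D = 4.0 / (np.pi * ic_ratio**2)                    # slope-to-noise factor, eq. (34)
    A = np.eye(n_avail) + D * np.outer(j, j)           # quadratic form of the criterion
    Ainv = np.linalg.inv(A)
    one = np.ones(n_avail)
    mu = (1.0 - one @ Ainv @ w) / (one @ Ainv @ one)   # multiplier enforcing sum = 1
    return Ainv @ (w + mu * one)

# Exact 13-term Henderson symmetric weights from the ideal formula (26), m = 8.
m8, jj = 8, np.arange(-6.0, 7.0)
num = 315 * ((m8 - 1)**2 - jj**2) * (m8**2 - jj**2) * ((m8 + 1)**2 - jj**2) \
      * (3 * m8**2 - 16 - 11 * jj**2)
w13 = num / (8 * m8 * (m8**2 - 1) * (4 * m8**2 - 1) * (4 * m8**2 - 9) * (4 * m8**2 - 25))

# Last-point filter: 7 available observations, I/C = 3.5 as in the text.
print(np.round(musgrave_end_weights(w13, 7, 3.5), 3))
# approx [-0.092 -0.058  0.012  0.120  0.244  0.353  0.421], cf. the last row of Table 2
```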

In the RKHS approach, the Henderson kernel asymmetric smoothers are given by

$$w_j = \frac{K(j/b)}{\sum_{i=-m}^{q} K(i/b)}, \quad j = -m, ..., q, \qquad (36)$$

where $j$ is the distance to the target point $t$, $b$ is the bandwidth parameter selected for the symmetric filter, equal to $m + 1$, and $m + q + 1$ is the asymmetric filter length. For example, the asymmetric weights of the 13-term Henderson kernel for the last point are given by

$$w_j = \frac{K(j/7)}{\sum_{i=-6}^{0} K(i/7)}, \quad j = -6, ..., 0. \qquad (37)$$
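To make (36)-(37) concrete, the following sketch (ours) evaluates the third order biweight kernel (32) at the points $j/7$ and normalizes the values, producing both the symmetric 13-term weights and the last-point asymmetric weights; rounded to three digits they coincide with the first and last rows of Table 4.

```python
import numpy as np

def K3_biweight(t):
    """Third order biweight Henderson kernel, eq. (32)."""
    return (15.0 / 16) * (1 - t**2)**2 * (7.0 / 4 - 21.0 / 4 * t**2)

def kernel_weights(m=6, q=6):
    """Filter weights from eq. (36): evaluate the kernel at j/b, b = m + 1,
    over the available offsets j = -m,...,q and normalize to sum to one."""
    b = m + 1
    j = np.arange(-m, q + 1)
    k = K3_biweight(j / b)
    return k / k.sum()

print(np.round(kernel_weights(q=6), 3))  # symmetric 13-term weights, first row of Table 4
print(np.round(kernel_weights(q=0), 3))  # last-point weights, eq. (37), last row of Table 4
```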

The symmetric and asymmetric weights of the classical Henderson smoother and of the Henderson third order kernels are shown in Tables 2, 3 and 4, respectively (the weight corresponding to the target point is highlighted in bold in the original tables).

-0.019 -0.028 0.000 0.066 0.147 0.214 0.240 0.214 0.147 0.066 0.000 -0.028 -0.019
0.000 -0.017 -0.025 0.001 0.066 0.147 0.213 0.238 0.212 0.144 0.061 -0.006 -0.034
0.000 0.000 -0.011 -0.022 0.003 0.067 0.145 0.210 0.235 0.205 0.136 0.050 -0.018
0.000 0.000 0.000 -0.009 -0.022 0.004 0.066 0.145 0.208 0.230 0.201 0.131 0.046
0.000 0.000 0.000 0.000 -0.016 -0.025 0.003 0.068 0.149 0.216 0.241 0.216 0.148
0.000 0.000 0.000 0.000 0.000 -0.043 -0.038 0.002 0.080 0.174 0.254 0.292 0.279
0.000 0.000 0.000 0.000 0.000 0.000 -0.092 -0.058 0.012 0.120 0.244 0.353 0.421

Table 2. Symmetric and asymmetric weights of 13-term classical Henderson smoother.


-0.019 -0.027 0.001 0.066 0.147 0.213 0.238 0.213 0.147 0.066 0.001 -0.027 -0.019
0.000 -0.019 -0.026 0.001 0.065 0.144 0.209 0.234 0.209 0.144 0.065 0.001 -0.026
0.000 0.000 -0.019 -0.026 0.001 0.063 0.140 0.204 0.228 0.204 0.140 0.063 0.001
0.000 0.000 0.000 -0.019 -0.026 0.001 0.063 0.141 0.204 0.228 0.204 0.141 0.063
0.000 0.000 0.000 0.000 -0.019 -0.027 0.001 0.067 0.150 0.218 0.244 0.218 0.150
0.000 0.000 0.000 0.000 0.000 -0.022 -0.032 0.001 0.079 0.176 0.256 0.286 0.256
0.000 0.000 0.000 0.000 0.000 0.000 -0.030 -0.043 0.002 0.106 0.237 0.344 0.385

Table 3. Symmetric and asymmetric weights of 13-term exact Henderson kernel.

-0.020 -0.030 0.002 0.070 0.149 0.211 0.234 0.211 0.149 0.070 0.002 -0.030 -0.020

0.000 -0.019 -0.029 0.002 0.069 0.146 0.207 0.230 0.207 0.146 0.069 0.002 -0.029

0.000 0.000 -0.019 -0.028 0.002 0.067 0.142 0.201 0.223 0.201 0.142 0.067 0.002

0.000 0.000 0.000 -0.019 -0.028 0.002 0.067 0.142 0.201 0.224 0.201 0.142 0.067

0.000 0.000 0.000 0.000 -0.020 -0.031 0.002 0.072 0.153 0.216 0.240 0.216 0.153

0.000 0.000 0.000 0.000 0.000 -0.024 -0.036 0.003 0.085 0.180 0.255 0.283 0.255

0.000 0.000 0.000 0.000 0.000 0.000 -0.032 -0.048 0.004 0.114 0.242 0.342 0.380

Table 4. Symmetric and asymmetric weights of 13-term biweight Henderson kernel.

The convergence to the symmetric filter is faster for the Henderson kernel smoothers, particularly for the ones corresponding to the last point and the point previous to the last, as confirmed by the gain functions of both sets of asymmetric filters shown in Figures 4 and 5. Moreover, the gain function of the last point Henderson kernel smoother does not amplify the signal as the classical H13 does. Since the weights of the biweight and exact Henderson kernels are equal up to the third digit, no visible differences are seen in the corresponding gain and phase shift functions. Hence, we only show those corresponding to the exact Henderson kernels in Figures 4 and 6. There is an increase of the phase shift for the low frequencies relative to that of the classical H13, but both are less than a month, as exhibited in Figure 6.

5 Conclusions

We introduced a new representation of the Henderson smoother by means of the reproducing kernel Hilbert space (RKHS) methodology. The linear estimator is first transformed into a kernel of second order, and from it a hierarchy is constructed that includes higher order kernels.

We made use of the theoretical result due to Berlinet (1993) according to which a $p$-th order kernel $K_p$, $p \geq 2$, can always be decomposed into the product of the reproducing kernel of the space $\mathbb{P}_{p-1}$ of polynomials of degree at most $p - 1$ and a probability density function $f_0$ with finite moments up to order $2p$. We showed that the biweight density function is very close to the exact density calculated on the basis of the Henderson weighting penalty function.

Figure 4: Gain functions of the asymmetric weights of the exact Henderson kernels

Figure 5: Gain functions of the asymmetric weights of the classical 13-term Henderson filter

Figure 6: Phase shifts of the asymmetric (end point) weights of the Henderson kernel and of the classical H13 filter

The biweight has a computing advantage over the exact density, in the sense that it does not need to be recalculated every time the Henderson smoother length changes, as happens with the exact one. Another advantage is that the biweight density function belongs to the well-known beta distribution family, the associated orthonormal polynomials being the Jacobi ones, for which explicit expressions for computation are available and whose properties have been widely studied in the literature. We used both densities and the associated orthonormal polynomials to generate Henderson kernels of order two and three. The symmetric Henderson kernel weights for the two generating density functions are very close for short spans; we illustrated this for the spans applied to monthly data, that is, 9, 13 and 23 terms. The asymmetric weights of the Henderson kernels have been derived by adapting the third order kernel functions to the lengths of the last six asymmetric filters. Compared to those of the classical 13-term Henderson, they gave better results in terms of gain and revisions. It should be noted that both the Henderson kernel and the classical asymmetric filters do not converge monotonically to the central one because they are higher order kernels. This is not the case for second order kernels, which are density functions.

Kernel representations of any smoother length can be obtained by means of the RKHS approach. Moreover, the family of nonsymmetric smoothers can be derived either coherently with the corresponding symmetric ones or, if preferred, from a lower or higher order kernel within the hierarchy. Although not discussed here, the Henderson kernels of order two and three calculated for very large filter spans can be used to estimate only the trend instead of the trend and cycle components.

References

[1] Abramowitz, M. and Stegun, I. (1972), Handbook of Mathematical Functions with Formulas, Graphs and Mathematical Tables, US Government Printing Office, Washington D.C.

[2] Berlinet, A. (1993), Hierarchies of Higher Order Kernels, Probability Theory and Related Fields, 94, pp. 489-504.

[3] Bianconcini, S. (2006), Trend-Cycle Estimation in Reproducing Kernel Hilbert Spaces, Ph.D. Thesis, Department of Statistics, University of Bologna.

[4] Castles, I. (1987), A Guide to Smoothing Time Series Estimates of Trend, Catalogue No. 1316, Australian Bureau of Statistics.

[5] Cholette, P.A. (1981), A Comparison of Various Trend-Cycle Estimators, in Anderson, O.D. and Perryman, M.R. (eds), "Time Series Analysis", North Holland, Amsterdam, pp. 77-87.

[6] Cholette, P.A. (1982), Comparaison de Deux Estimateurs des Cycles Economiques, Research Paper No. 82-09-001F, Time Series Research and Analysis Centre, Statistics Canada.

[7] Cleveland, R., Cleveland, W., McRae, J. and Terpenning, I. (1990), STL: A Seasonal-Trend Decomposition Procedure Based on LOESS, Journal of Official Statistics, 6, pp. 3-33.

[8] Dagum, E.B. (1980), The X11ARIMA Seasonal Adjustment Method, Statistics Canada Publication, Catalogue No. 12-564E, Ottawa.

[9] Dagum, E.B. (1985), Moving Averages, in Encyclopedia of Statistical Sciences, S. Kotz and N. Johnson, editors, John Wiley and Sons, 5, pp. 631-634.

[10] Dagum, E.B. (1988), The X11ARIMA/88 Seasonal Adjustment Method - Foundation and User's Manual, Research Paper, Time Series Research and Analysis Division, Statistics Canada, Ottawa.

[11] Dagum, E.B. (1996), A New Method to Reduce Unwanted Ripples and Revisions in Trend-Cycle Estimates from X11ARIMA, Survey Methodology, 22, pp. 77-83.

[12] Dagum, E.B. and Capitanio, A. (1998), Smoothing Methods for Short-Term Trend Analysis: Cubic Splines and Henderson Filters, Statistica, LVIII, 1, pp. 5-24.

[13] Dagum, E.B. and Laniel, N. (1987), Revisions of Trend-Cycle Estimators of Moving Average Seasonal Adjustment Methods, Journal of Business and Economic Statistics, 5, pp. 177-189.

[14] Dagum, E.B. and Luati, A. (2004), Relationship Between Local and Global Nonparametric Estimators Measures of Fitting and Smoothing, Studies in Nonlinear Dynamics and Econometrics, 8, article 17.

[15] Dalton, P. and Keogh, G. (2000), An Experimental Indicator to Forecast Turning Points in the Irish Business Cycle, Journal of the Statistical and Social Inquiry Society of Ireland, Vol. XXIX, pp. 117-176.

[16] DeBoor, C. and Lynch, R. (1966), On Splines and Their Minimum Properties, Journal of Mathematics and Mechanics, 15, pp. 953-969.

[17] Doherty, M. (1992), The Surrogate Henderson Filters in X11, Working Paper, Statistics New Zealand, Wellington, New Zealand.

[18] Findley, D., Monsell, B., Bell, W., Otto, M. and Chen, B. (1998), New Capabilities and Methods of the X12ARIMA Seasonal Adjustment Program, Journal of Business and Economic Statistics, 16, pp. 127-152.

[19] Gray, A. and Thomson, P. (1996a), On a Family of Moving-Average Trend Filters for the Ends of Series, Proceedings of the Business and Economic Statistics Section, American Statistical Association Annual Meeting, Chicago.

[20] Gray, A. and Thomson, P. (1996b), Design of Moving-Average Trend Filters Using Fidelity and Smoothness Criteria, in Robinson, P.M. and Rosenblatt, M. (eds), "Time Series Analysis" (in memory of E.J. Hannan), Vol. II, Springer Lecture Notes in Statistics, 115, New York, pp. 205-219.

[21] Henderson, R. (1916), Note on Graduation by Adjusted Average, Transactions of the Actuarial Society of America, 17, pp. 43-48.

[22] Kendall, M.G., Stuart, A. and Ord, J. (1983), The Advanced Theory of Statistics, Vol. 3, C. Griffin.

[23] Kenny, P. and Durbin, J. (1982), Local Trend Estimation and Seasonal Adjustment of Economic and Social Time Series, Journal of the Royal Statistical Society A, 145, pp. 1-41.

[24] Kimeldorf, G. and Wahba, G. (1970a), A Correspondence Between Bayesian Estimation on Stochastic Processes and Smoothing by Splines, Annals of Mathematical Statistics, 41, pp. 495-502.

[25] Kimeldorf, G. and Wahba, G. (1970b), Spline Functions and Stochastic Processes, Sankhya, Series A, 32, pp. 173-180.

[26] Laniel, N. (1985), Design Criteria for the 13-term Henderson End Weights, Working Paper TSRA 1986-011, Statistics Canada.

[27] Ladiray, D. and Quenneville, B. (2001), Seasonal Adjustment with the X11 Method, Springer Lecture Notes in Statistics, 158, New York.

[28] Loader, C. (1999), Local Regression and Likelihood, Springer, New York.

[29] Loeve, M. (1948), Fonctions aléatoires du second ordre, Appendix to Lévy, P., "Stochastic Processes and Brownian Motion", Gauthier-Villars, Paris.

[30] Muller, H.G. (1984), Smooth Optimum Kernel Estimators of Regression Curves, Densities and Modes, Annals of Statistics, 12, pp. 766-774.

[31] Musgrave, J. (1964a), A Set of End Weights to End All End Weights, Working Paper, US Bureau of the Census, Washington.

[32] Musgrave, J. (1964b), Alternative Sets of Weights for Proposed X-11 Seasonal Factor Curve Moving Averages, Working Paper, US Bureau of the Census, Washington.

[33] Parzen, E. (1959), Statistical Inference on Time Series by Hilbert Space Methods, Technical Report No. 53, Statistics Department, Stanford University, Stanford, CA.

[34] Quenneville, B., Ladiray, D. and Lefrancois, B. (2003), A Note on Musgrave Asymmetrical Trend-Cycle Filters, International Journal of Forecasting, 19, 4, pp. 727-734.

[35] Shiskin, J., Young, A. and Musgrave, J. (1967), The X-11 Variant of the Census Method II Seasonal Adjustment Program, Technical Paper 15, US Department of Commerce, Bureau of the Census, Washington DC.
