[Smith, Kohn, 2000] Nonparametric Seemingly Unrelated Regression

Embed Size (px)

Citation preview

  • 8/13/2019 [Smith, Kohn, 2000] Nonparametric Seemingly Unrelated Regression

    1/25

    *Corresponding author. Tel.:#612-9351-2787; fax:#612-9351-6409.

    E-mail address:[email protected] (M. Smith).

    Journal of Econometrics 98 (2000) 257}281

    Nonparametric seemingly unrelated regression

    Michael Smith*, Robert Kohn

    Econometrics and Business Statistics, University of Sydney, Sydney, Australia

    Australian Graduate School of Management, University of New South Wales, Kensington, New SouthWales, Australia

    Received 19 March 1998; received in revised form 9 February 2000; accepted 13 March 2000

    Abstract

    A method is presented for simultaneously estimating a system of nonparametric

    regressions which may seem unrelated, but where the errors are potentially correlatedbetween equations. We show that the advantage of estimating such a &seemingly unre-

    lated' system of nonparametric regressions is that less observations can be required to

    obtain reliable function estimates than if each of the regression equations is estimated

    separately and the correlation ignored. This increase in e$ciency is investigated empiric-

    ally using both simulated and real data. The method uses a Bayesian hierarchical

    framework where each regression function is represented as a linear combination of

    a large number of basis terms. All the regression coe$cients, and the variance matrix of

    the errors, are estimated simultaneously by their posterior means. The computation is

    carried out using a Markov chain Monte Carlo sampling scheme that employs a &focused

    sampling' step to combat the high-dimensional representation of the unknown regression

    functions. The methodology extends easily to other nonparametric multivariate regres-

    sion models. 2000 Elsevier Science S.A. All rights reserved.

    JEL classixcation: C11; C14; C15; C31

    Keywords: Nonparametric multivariate regression; Bayesian hierarchical SUR model;

    Multivariate subset selection; Markov Chain Monte Carlo

    0304-4076/00/$ - see front matter 2000 Elsevier Science S.A. All rights reserved.

    PII: S 0 3 0 4 - 4 0 7 6 ( 0 0 ) 0 0 0 1 8 - X

  • 8/13/2019 [Smith, Kohn, 2000] Nonparametric Seemingly Unrelated Regression

    2/25

    1. Introduction

    The aim of nonparametric regression is to estimate regression functions

    without assuming a priori knowledge of their functional forms. The price for this#exibility is that appreciably larger sample sizes are required to obtain reliable

    nonparametric estimators than for parametric estimators. In this paper, we

    consider a system of regression equations that can seem unrelated, but actually

    are because their errors are correlated. Such a system of equations is called a set

    of&seemingly unrelated'regressions, or a SUR model (Zellner, 1962). This paper

    provides a Bayesian framework for reliably estimating the regression functions

    in a nonparametric manner, even for moderate sample sizes, by taking advant-

    age of the correlation structure in the errors. The most important consequenceof this work is to show that if the errors are correlated, better nonparametric

    estimators are obtained by taking advantage of this correlation structure com-

    pared to ignoring the correlation and estimating the equations one at a time.

    Speci"cally, we consider the system ofm regression equations

    y"f(x)#e, for i"1,2,2,m. (1.1)

    Here, the superscript denotes that this is the ith ofmpossible regressions, y is

    the dependent variable,xis a vector of independent variables and f

    ,2,f

    arefunctions that require estimating in a nonparametric manner. As in the linear

    Gaussian SUR model, the regressions are related through the correlation

    structure of the Gaussian errors e. That is,

    e&N(0, I

    ), (1.2)

    wheree"(e, e,2, e), eis the vector of errors for thenobservations of theith regression and is a positive-de"nite (mm) matrix that also requires

    estimation. This paper provides a data-driven procedure for estimating theunknown functions f (for i"1,2,m) and covariance matrix in this model.

    Such systems of regressions are frequently used in econometric, "nancial and

    sociological modeling because taking into account the correlation structure in

    the errors results in more e$cient estimates than ignoring the correlation and

    estimating the equations one at a time. Most of the literature on estimating

    a system of equations assumes that the f are linear functions. For recent

    examples, see Bartels et al. (1996), Min and Zellner (1993) and Mandy and

    Martins-Filho (1993). However, in practice the functional forms of the f in

    many regression applications are unknown a priori, so that an approach that

    estimates their form is preferable. We examine two such cases here. The "rst

    concerns print advertisements in a women's magazine and estimates the rela-

    tionship between three measures of advertising exposure and the physical

    positioning of advertisements in the magazine. The second involves estimating

    an intra-day model for average electricity load in two adjacent Australian states.

    In this example, we estimate the daily and weekly periodic components of load,

    258 M. Smith, R. Kohn/Journal of Econometrics 98 (2000) 257}281

  • 8/13/2019 [Smith, Kohn, 2000] Nonparametric Seemingly Unrelated Regression

    3/25

    along with a temperature e!ect. In both examples, signi"cant nonlinear relation-

    ships are identi"ed that are di$cult to discern using a parametric SUR ap-

    proach. In addition, substantial correlation is estimated between the regressions

    and the function estimates di!er substantially from those obtained by estimatingeach of the nonparametric regressions separately and ignoring the correlation

    between the equations.

    Our approach for estimating the system of equations de"ned at (1.1) and (1.2)

    models each of the functions f as a linear combination of basis terms. We

    develop a Bayesian hierarchical model to explicitly parameterize the possibility

    that these terms may be super#uous and have corresponding coe$cient values

    that are exactly zero. A wide variety of bases can be used, including many with

    a desired structure, such as periodicity or additivity, a point which is demon-strated in the empirical examples. The unknown regression functions are esti-

    mated by their posterior means which attach the proper posterior probability to

    each subset of the basis elements, providing a nonparametric estimate that is

    both#exible and smooth. We develop a Markov chain Monte Carlo (MCMC)

    sampling scheme to calculate the posterior means because direct evaluation is

    intractable. This sampling scheme is a correction of the &focused sampler'

    discussed in Wong et al. (1997) and our empirical work shows it to be reliable

    and much more e$cient than the Gibbs sampling alternative. We prove that theiterates of the focused sampler converge to the correct posterior distribution.

    The performance of the new estimator is investigated empirically with a set of

    simulation experiments that cover a range of potential regression curves. These

    demonstrate the improvement that can be obtained by exploiting the correla-

    tion structure in a system of regressions. We note that the solution to the

    nonparametric SUR model discussed in this paper is easily extended to other

    nonparametric multivariate (or vector) regression models.

    Zellner (1962, 1963) provides the seminal analysis of a system of regressionswhen the unknown functionsfare assumed linear in the coe$cients. Srivastava

    and Giles (1987) summarize much of the literature dealing with this linear SUR

    model. However, recent advances in Markov chain Monte Carlo methods

    enable Bayesian analyses of more complex variations of the SUR model. For

    example, Chib and Greenberg (1995a) develop sampling schemes that estimate

    a hierarchical linear SUR model with"rst-order vector autoregressive or vector

    moving average errors and extend the analysis to a time varying parameter

    model. Markov chain Monte Carlo methods also provide a solution to estima-

    ting reliably nonparametric regressions in a variety of hitherto di$cult situ-

    ations. For example, Smith and Kohn (1996) develop nonparametric regression

    estimators for regression models where a data transformation may be required

    and/or outliers may exist in the data. Yee and Wild (1996) use smoothing

    splines to estimate a system of equations in a nonparametric manner, but they

    do not have data-driven estimators for the smoothing parameters. In the

    example in their paper they use values of the smoothing parameters based on the

    M. Smith, R. Kohn/Journal of Econometrics 98 (2000) 257}281 259

  • 8/13/2019 [Smith, Kohn, 2000] Nonparametric Seemingly Unrelated Regression

    4/25

    independent variables, but not the dependent variable. Such an approach is an

    unsatisfactory way of estimating the smoothing parameters because it does not

    take into account the curvature exhibited by the dependent variable. Nor is it

    fully automatic because the e!ective degrees of freedom is required as an inputfrom the user.

    The paper is organized as follows. Section 2.1 discusses how to model the

    unknown functions and why they are estimated using a hierarchical model. The

    rest of Section 2 introduces the Bayesian hierarchical SUR model and develops

    an e$cient MCMC sampling scheme to enable its estimation. Section 3 uses the

    methodology to"t the print advertising and electricity load datasets. Section 4

    contains simulation examples which investigate the improvements that can be

    made using this estimation procedure over a series of separate nonparametricregressions. Appendix A provides the conditional posterior distributions em-

    ployed in the sampling scheme, while Appendix B proves that the focused

    sampling step provides an iterate from the correct invariant distribution.

    2. Methodology

    2.1. Basis representation of functions

    Each regression function is modeled as a linear combination of basis func-

    tions, so that for a function f,

    f(x)"

    b(x). (2.1)

    Here, B"b

    ,2, b is a basis of p functions, while the

    's are regression

    parameters.A large number of authors use such linear decompositions with a variety of

    univariate and higher dimensional bases in the single equation case. For

    example, Friedman (1991), Smith and Kohn (1996) and Denison et al. (1998) use

    regression splines, Luo and Wahba (1997) use several reproducing kernel bases

    and Wahba (1990) uses natural splines. In particular, orthonormal bases, such as

    wavelet (Donoho and Johnstone, 1994) or Fourier bases have been used.

    However, the computational advantage provided by such orthonormal bases

    does not easily extend to the case where the errors are correlated, such as in theSUR model. In the case of multiple regressors in an equation, additive models of

    univariate bases or radial bases (Powell, 1987; Holmes and Mallick, 1998) can be

    used.

    Given a choice of a particular basis for the approximation at (2.1), the ith

    regression at (1.1) can be written as the linear model

    y"X#e. (2.2)

    260 M. Smith, R. Kohn/Journal of Econometrics 98 (2000) 257}281

  • 8/13/2019 [Smith, Kohn, 2000] Nonparametric Seemingly Unrelated Regression

    5/25

    Here,yis the vector of then observations of the dependent variable, the design

    matrix X"[bb2b], b is a vector of the values of the basis function

    b

    evaluated at the n observations and are the regression coe$cients. The

    errors eare correlated with those from the other regressions, as speci"ed in (1.2),and we denote the number of basis terms in the ith equation as p. Note that

    many basis expansions employ p*n basis terms and it is inappropriate to

    estimate the regression coe$cients using existing SUR methodology because the

    function estimates fK, i"1,2, m, would interpolate the data (rather than pro-

    duce smooth estimates that account for the existence of noise in the regression).

    Therefore, we estimate the regression parameters using a Bayesian hierarchical

    SUR model described below that explicitly accounts for the possibility that

    many of these terms may be redundant.

    2.2. A Bayesian hierarchical SUR model

    Consider theith regression of a linear SUR model given at Eq. (2.2), where the

    design matrixXis (np) and the coe$cient vector is of lengthp. To accountexplicitly for the notion that variables in this regression can be redundant, we

    introduce a vector of binary indicator variables "(

    ,

    ,2, ). Here,

    corresponds to thekth element of the coe$cient vector of theith regression,say

    , with

    "0 if

    "0 and

    "1 if

    O0. By dropping the redundant

    terms with zero coe$cients, the ith regression can be rewritten, conditional on ,

    as

    y"X#e. (2.3)

    Ifq"

    , then the design matrixX is of size (nq) and is a vector of

    q elements.

    By stacking the linear models for them

    regressions, the SUR model can bewritten, conditional on "(, ,2, ), as

    y"X#e. (2.4)

    Here,y"(y, y,2, y), X"diag(X , X ,2,X) and"( ,2, ).Ifq"q , thenX is an (nq ) matrix and a vector ofq elements. Tocomplete this Bayesian hierarchical model, we introduce the following priors on

    the parameters.

    (i) Following O'Hagan (1995) we construct a conditional prior for by setting

    p( ,)Jp(y ,,)

    so that , &N(( , n(XAX )), where A"I and( "(XAX)XAy. This data-based fractional prior contains much lessinformation about than the likelihood. The results obtained using thisprior are similar to those obtained using the prior , &N(0, nI), which

    M. Smith, R. Kohn/Journal of Econometrics 98 (2000) 257}281 261

  • 8/13/2019 [Smith, Kohn, 2000] Nonparametric Seemingly Unrelated Regression

    6/25

    does not depend on the data. However, we prefer the data-based prior

    because the conditional posterior mean of is unbiased.(ii) The prior for is taken as independent of and is the commonly used

    non-informative prior discussed in Zellner (1971), wherep()J.(iii) The indicator variables

    , k"1,2, p, i"1,2, m, are taken a priori

    independently distributed, with p("1)".

    (iv) The hyperparameters, i"1,2, m, are taken as independent and givena non-informative uniform prior on (0, 1).

    We integrate the hyperparameters"(,2,) out of our analysis, so that

    p()"p()p() d"

    (

    )

    (1!)

    d"

    B(q

    #1, p

    !q

    #1), where B is the beta function. This is a di!erent prior on than that

    suggested in Smith and Kohn (1996). Note that the model here is a hierarchical

    SUR model as, conditional on , it is simply a linear SUR model.

    2.3. Markov chain Monte Carlo sampling

    To estimate this model we use the following Markov chain Monte Carlo

    sampling scheme with the hyperparameter integrated out.

    (1) Generate from , , y(2) Generate from, , y" , , y(3) For i"1, 2,2, m

    For j"1, 2,2, p

    Generate from,

    , yusing the sampling step described below.

    In this sampling scheme is generated from a multivariate normal. Generationof the matrix directly from the posterior at step (2) is di$cult because the

    fractional prior , is centered at ( , which is a function of. Conse-quently, we use a Metropolis}Hastings step where the proposal Wishart density

    is the posterior under a #at conditional prior for . This works well withbetween 60% and 90% of those iterates that are generated being accepted.

    Details of how to generate from the distributions at steps (1) and (2) are given in

    Appendix A. It is important to note that care has been taken to generate

    without conditioning on

    at step (3), otherwise the sampling scheme would

    be reducible because

    is known exactly given.

    Step (3) generates an iterate of one element at a time. As discussed in Wong

    et al. (1997), using a Gibbs sampler is computationally demanding because has

    p elements, with generation from each conditional posterior density

    a computationally intensive exercise. To speed up the generation we use the

    following &sampling' step, which is an application of the Metropolis}Hasting

    262 M. Smith, R. Kohn/Journal of Econometrics 98 (2000) 257}281

  • 8/13/2019 [Smith, Kohn, 2000] Nonparametric Seemingly Unrelated Regression

    7/25

    algorithm. Let()"p(

    ,

    , y) be the conditional posterior density of

    ands(

    )"p(

    ) be its conditional prior density, with integrated out in

    both cases. Then, if is the previous value of, a new value can be

    generated as follows.

    Sampling Step

    (a) Proposal

    If"1 then generate from the proposal density

    Q("1P"0)"s("0)min

    1,(

    "0)

    s("0)

    If"0 then generate from the proposal density

    Q("0P"1)"s("1)min1,

    ("1)

    s("1)

    (b) Metropolis}Hastings acceptance probabilities

    If "0 and "1, then accept with probability"min1, s(

    "0)/(

    "0); otherwise set"0.

    If "1 and "0, then accept with probability"min1, s(

    "1)/(

    "1); otherwise set"1.

    Appendix A calculates the posterior and prior s, with the latter beinga trivial calculation. Generating from the proposal in part (a) can be undertakene$ciently as follows.

    First generate u from a uniform distribution on (0, 1), then

    (i) if "1 and u(s("1), set"1.

    (ii) if "1 and u's("1), generate from the density

    p("0)"min(1,("0)/s(

    "0)).

    (iii) if"0 and u(s("0), then set"0.

    (iv) if "0 and u's("0), generate from the density

    p("1)"min(1,("1)/s(

    "1)).

    Generation is e$cient because in the nonparametric regression problem most

    of the indicators will be zero ands("0) will be close to one. Hence, most of the

    time"0 and u(s("0), so that case (iii) is undertaken most frequently

    and is calculated infrequently.

    M. Smith, R. Kohn/Journal of Econometrics 98 (2000) 257}281 263

  • 8/13/2019 [Smith, Kohn, 2000] Nonparametric Seemingly Unrelated Regression

    8/25

    Appendix B proves that the sampling method described above is a correct

    application of the Metropolis}Hastings method and therefore that

    ()"p(

    ,

    , y) is the invariant distribution of each step. The appendix

    also includes a lemma demonstrating that the Metropolis}Hastings ratios atpart (b) of the sampling step will usually be one, or close to one. Because steps

    (1)}(3) of the sampling scheme either generate directly from the conditional

    posterior distributions, or use a Metropolis}Hastings step, the scheme con-

    verges to its invariant distribution, which is the posterior distribution

    , , y (Tierney, 1994).The sampling scheme is much faster because step (3) involves a much smaller

    number of complex calculations than the full Gibbs sampler. Moreover, focus-

    ing the generations is especially important in nonparametric SUR modelscompared to a single equation regression, because there are more basis terms.

    We have found this sampler to have strong empirical convergence properties,

    a point that is demonstrated in the examples in Sections 3 and 4. These sections

    also compare the sampler to the Gibbs alternative and demonstrate that it is

    more computationally e$cient.

    A sampler that generates solely from the parameter space of is not con-

    sidered as it is di$cult to calculate the posterior distribution

    , y. Similarly,

    samplers that generate from either the parameter space of (,

    ) or (, ) arenot considered because it is di$cult to recognize the conditional posterior

    distribution, y, or calculate

    , , y.

    2.4. Parameter estimation

    Given an initial state for the Markov chain and a &warmup period', after

    which the sampler is assumed to have converged to the joint posterior distribu-

    tion, we can collect iterates (

    ,

    ,

    ),2, (

    ,

    ,

    ) which forma Monte Carlo sample from the joint posterior distribution. It is this sample that

    is used for inference.

    The posterior mean E[y] is estimated by the histogram estimate K"((1/J)

    ). We do not use a mixture estimate because the distribution

    of , , y is di$cult to identify (which is also the reason a Metropolis}Hastings step is used at step (2) of the sampler).

    The posterior mean of the regression parameters, E[ y], is estimated using

    the mixture estimate

    K"1

    J

    E[, , y] (2.5)

    Each of the conditional expectations in the sum is simple to calculate because

    E[ , , y]"( , while elements ofthat are not common to are set tozero.

    264 M. Smith, R. Kohn/Journal of Econometrics 98 (2000) 257}281

  • 8/13/2019 [Smith, Kohn, 2000] Nonparametric Seemingly Unrelated Regression

    9/25

  • 8/13/2019 [Smith, Kohn, 2000] Nonparametric Seemingly Unrelated Regression

    10/25

    3.1. Print advertising data

    We demonstrate our procedure using n"457 observations of data from six

    issues of an Australian monthly women's magazine collected by Starch INRAHooper. Each observation corresponds to an advertisement placed in the

    magazine and the following three advertisement exposure scores, which are

    recorded from an experimental audience, are used as measures of the various

    levels of e!ectiveness of the print advertisement.

    y (noted score): Proportion of respondents who claim to recognize the ad as

    having been seen by them in that issue.

    y

    (associated score): Proportion of the respondents who claim to havenoticed the advertiser's brand or company name or logo.

    y (read-most score): Proportion of respondents who claim to have read

    half or more of the copy.

    These scores are considered to measure advertisement exposure at increasing

    levels of depth. It has long been thought that the positioning of an advertisement

    within an issue a!ects its exposure to an audience (Hanssens and Weitz, 1980).

    To quantify this we constructed the variable P"(page number)/(number ofpages in issue) to represent the position in the issue in which each advertisement

    appeared.

    To estimate how the exposure of a print advertisement is a!ected by its

    position in the magazine, we considered the three nonparametric regressions

    y"f(P)#e for i"1, 2, 3. (3.1)

    To model each unknown function we use a thin plate spline basis (Powell, 1987),with basis terms b(x)"x!k

    log(x!k

    ), j"1,2, n, where k ,2, k are

    the so-called &knots'which we set equal to the nobservations of the independent

    variable. Following previous authors (Wahba, 1990) we also augment these

    basis terms with an intercept and linear term, so that p"p"p"459.

    Expected features in the functionsfinclude high casual attention to advertise-

    ments placed in the front (and to a lesser extent back) of the magazine, while the

    pre-editorial slots (where P is about 0.7) are thought to attract more indepth

    attention. The three scoresy, yandyare highly positively correlated and it is

    likely that the errors are correlated, so a SUR model appears appropriate.

    In a parametric SUR model with the same independent variables the general-

    ized and ordinary least-squares estimate of the regression coe$cients are the

    same. However, this result does not extend to the case where variable selection is

    undertaken on the terms in the regressions. In particular, this applies to the

    nonparametric function estimators used in this paper as they are based on

    variable selection and model averaging applied to linear basis decompositions.

    266 M. Smith, R. Kohn/Journal of Econometrics 98 (2000) 257}281

  • 8/13/2019 [Smith, Kohn, 2000] Nonparametric Seemingly Unrelated Regression

    11/25

    Fig. 1. (a)}(c) Function estimates of f, f and f, respectively. Bold lines are the nonparametric

    SUR estimates, while the dashed lines are the single equation estimates. Panels (d)}(f) contain scatterplots ofPversus the standardized uncorrelated residuals resulting from the nonparametric SUR "t.

    The equations at (3.1) were estimated both as a SUR system and individually

    with the analogous single equation estimator that ignores any correlation

    between the equations. The resulting function estimates are plotted in

    Figs. 1(a)}(c) and it can be seen that the SUR and equation-by-equationestimates di!er substantially. The estimate K from the SUR model is given

    below (correlations in italics)

    K"2.16610 2.10310 1.205100.956 2.23710 1.275100.822 0.856 0.99210

    con"rming the existence of high correlation in the errors. The SUR function

    estimates suggest that the front (and to a lesser extent) back of the magazine are

    areas in which advertisements achieve higher average exposure; though this is

    more prominent for the noted and associated scores, y, y, than for the read-

    most score y. The pre-editorial slots also result in increased exposure, with

    a particularly positive e!ect on indepth exposure, as measured by y. The

    relationships are distinctly nonlinear and would be hard to discern a priori using

    parametric SUR estimation. To help con"rm that the nonparameteric SUR

    M. Smith, R. Kohn/Journal of Econometrics 98 (2000) 257}281 267

  • 8/13/2019 [Smith, Kohn, 2000] Nonparametric Seemingly Unrelated Regression

    12/25

    Table 1

    Ratio of variances for the proposed sampler over the Gibbs sampler for the function estimates off,

    f and f calculated at four points on the domain ofP

    Eq. (1) Eq. (2) Eq. (3)

    P"0.2 2.028 3.027 2.195

    P"0.4 6.564 4.684 6.484

    P"0.6 2.546 3.407 7.102

    P"0.8 4.174 7.634 6.660

    (NSUR) estimates correctly capture the nonlinear relationships between

    y, y, y and P, we calculated the standardized uncorrelated residuals

    r( , i"1, 2, 3. Figs. 1(d)}(f) plot these residuals againstP, and these con"rm that

    there is no signi"cant nonlinear relationship between the residuals and P.

    We also estimated the NSUR model using the sampling scheme outlined in

    Section 2.3, but using a Gibbs sampler where the elementswere generated at

    step (3) directly from their posteriors. With both sampling schemes, we used

    a warmup period of 2000 iterations and a further 1000 iterates for inference,resulting in 45933000 generations from the conditional posterior at (A.1). Incomparison, the proposed sampler took only 91,511 computationally equivalent

    generations, 2.2% of the number required by the Gibbs sampler. Because the

    Gibbs sampler involves more generations, it would be expected to result in more

    random iterates than the proposed sampler. To measure this, we calculated the

    variances of the function estimates. Table 1 provides the ratio of the variance for

    the proposed sampler over that of the Gibbs sampler. It does so for function

    estimates calculated atP"

    0.2, 0.4, 0.6, 0.8 for all three regression equations. Itreveals that the proposed sampler needs to be run for two to seven times as

    many iterations as the Gibbs sampler to obtain function estimates with the same

    variances. However, this increased number of iterations still takes only 4}14%

    of the time required by the Gibbs sampler and is therefore substantially more

    e$cient overall. We note that function estimates are averages of terms in a

    stationary sequence whose variances are computed as in Box et al. (1994, p. 34).

    3.2. Intra-day electricity load model

    Short-term forecasting of regional electricity load based on intra-day data is

    an important problem for electricity utilities and electricity regulatory authori-

    ties. Load forecasts are required both to determine electricity dispatch schedules

    and to price electricity in wholesale markets. Harvey and Koopman (1993) and

    Smith (2000) consider nonparametric regressions with daily, weekly and possible

    temperature e!ects as models for electricity load. We consider estimating such

    268 M. Smith, R. Kohn/Journal of Econometrics 98 (2000) 257}281

  • 8/13/2019 [Smith, Kohn, 2000] Nonparametric Seemingly Unrelated Regression

    13/25

    a model for three weeks of half-hourly electricity load data from the adjacent

    Australian states of New South Wales (NSW) and Victoria (VIC). Let and

    be the half-hourly average electricity load of NSW and VIC, respectively,

    OD and

    O=

    be the time of day and week of the half-hourly observationsnormalized to [0, 1) and be the temperature in the NSW state capital of

    Sydney. We estimate the additive nonparametric SUR:

    "f

    (OD)#f

    (O=)#f

    ()#e,

    "f

    (OD)#f

    (O=)#e, (3.2)

    where the periodic pro"le of electricity load has been decomposed into periodicdaily (f

    ) and weekly (f

    ) e!ects for both states. An additive temperature e!ect is

    considered for NSW, but we do not include one for VIC because half-hourly

    temperature data were not available to us for this state.

    We model both f

    and f

    with a periodic quadratic reproducing basis (Luo

    and Wahba, 1997) where b(x)"((x!k

    !

    )!1/12)/2. We place the knots

    k

    at all the possible observations, so that for the time of day e!ect

    k"i/48, i"1,2, 48, and for the weekly e!ectk

    "i/336, i"1,2, 336. This

    basis was chosen because it ensures that the functions f , f , f and f are allperiodic on [0, 1). The temperature e!ect was modeled using a thin plate spline

    with knots at all observations.

    The estimated covariance is (correlation in italics)

    K"29088.13 26093.55

    0.5208 86299.88,

    where var(e)(var(e) because we control for the variation in temperature in

    the NSW equation. The positive correlation between the states is likely to be

    due, in part, to common weather e!ects and television programming. Figs. 2(a)

    and (b) plot the sum of the estimated daily and weekly e!ects for both states

    againstO=, while the estimated additive Sydney temperature e!ect is given in

    Fig. 3. Because the regressions are additive, we present the functions normalized

    so thatf(0)"0. The data were collected during March 1998 (late Summer) and

    higher temperatures result in increased load due to increased usage of air

    conditioning. Estimates of the periodic pro"le are more detailed for NSW as the

    model controls for temperature, while the periodic pro"les are more smooth for

    VIC. The functions are highly nonlinear and, as Harvey and Koopman (1993)

    highlight, are extremely di$cult to model using parametric nonlinear models.

    The two equations at (3.2) were also estimated using the analogous single

    equation estimator. Fig. 4 plots the di!erence between the SUR and single

    equation estimates of the sum of the daily and weekly periodic e!ects. The range

    M. Smith, R. Kohn/Journal of Econometrics 98 (2000) 257}281 269

  • 8/13/2019 [Smith, Kohn, 2000] Nonparametric Seemingly Unrelated Regression

    14/25

    Fig. 2. Sum of the estimated daily and weekly periodic e!ects for VIC (panel (a)) and NSW (panel

    (b)). The dashed curves are those resulting from estimation using the proposed sampler and the bold

    curves are those resulting from estimation using a full Gibbs sampler. Both estimates are plotted

    together for comparison, although they are almost identical and are hard to visually distinguish.

    of the di!erence is about 250 and 400 megawatts for VIC and NSW, respectively,

    which is substantial compared to the estimates of

    var(e) and

    var(e).

    Because the load pro"les devolve slowly during the year and are e!ectively only

    static over periods of around two to three weeks, larger sample sizes cannot be

    used and it is useful to exploit the correlation between equations.

    To investigate the performance of the sampling scheme in Section 2.3 we also

    estimated the nonparametric SUR model for the electricity load data using a full

    Gibbs sampler which generated directly from each of the

    p posterior

    distributions of the binary indicatorsat step (3) of the sampling scheme. The

    270 M. Smith, R. Kohn/Journal of Econometrics 98 (2000) 257}281

  • 8/13/2019 [Smith, Kohn, 2000] Nonparametric Seemingly Unrelated Regression

    15/25

    Fig. 3. Same as in Fig. 2, but for the estimated Sydney additive temperature e!ect fK

    .

    full Gibbs sampler was run using the same number of iterations for the warmup

    and sampling periods as the proposed sampler. Figs. 2 and 3 compare theresulting curve estimates for the Gibbs sampler (bold lines) and proposed

    sampler (dashed lines). The curve estimates are virtually identical and sometimes

    the two curve estimates are so similar that the dashed lines cannot be made out.

    It is di$cult to see how this could occur if the proposed sampler was not

    converging to the same distribution as the Gibbs sampler and the sampling

    scheme was not mixing well with respect to the indicator variables . The results

    are identical regardless of initial starting state for either sampler.

    4. Simulation experiments

    4.1. Positively correlated univariate regressions

    This simulation is concerned with the case where the errors are highly

    correlated between regressions. It highlights the improvement in the quality of

    M. Smith, R. Kohn/Journal of Econometrics 98 (2000) 257}281 271

  • 8/13/2019 [Smith, Kohn, 2000] Nonparametric Seemingly Unrelated Regression

    16/25

    Fig. 4. The di!erence between the SUR and single equation estimates of the sum of the daily and

    weekly periodic e!ects. Panel (a) is for the Victorian equation and panel (b) is for the New South

    Wales equation.

    the estimates that can be obtained when such correlation is modeled rather thanignored. There are m"4 univariate regressions, with the covariance matrix

    speci"ed below at (4.1). Note that the standard deviation of the errors

    (var(e))"1 is high compared to the range of the functions.

    Four true functions were carefully chosen to represent a wide variety of

    possible relationships. These are f(x)"sin(8x) (which is highly oscillatory),f(x)"((x, 0.2, 0.25)#(x, 0.6, 0.2))/4, with (x, a, b) being a normal density

    of meanaand standard deviation b, (which requires a locally adaptive estimator

    as there are di!erent degrees of smoothness on the left and right of the function),

    f(x)"1.5x (which was chosen as many relationships are often thought to

    be linear) and f(x)"cos(2x) (which is a smooth nonlinear function). Theindependent variables for the four univariate regressions were x&U(0, 1),

    x&U(0, 1) and

    x

    x&N0.5

    0.5, 0.31 0.6

    0.6 1

    272 M. Smith, R. Kohn/Journal of Econometrics 98 (2000) 257}281

  • 8/13/2019 [Smith, Kohn, 2000] Nonparametric Seemingly Unrelated Regression

    17/25

    We generated n"100 data points from this true SUR model and applied the

    nonparametric SUR estimator to this data. We use a quartic reproducing kernel

    basis (Luo and Wahba, 1997) with basis terms

    b(x)"

    1

    24x!x !1

    2

    !1

    2x!x !1

    2#

    7

    240 for i"1,2, n,augmented with an intercept and linear term.

    To assess the quality of the resulting estimates of the four functions, we

    calculated the mean squared di!erence between the function estimates and thetrue functions. This measure of distance between the two is de"ned as

    MSD"

    1

    200

    (fK(z

    )!f(z

    )),

    where min(x)"z(z

    (2(z

    "max(x) is an evenly spaced grid over

    the domain ofx. For the same data we also "t an analogous single equation

    univariate nonparametric estimator to the four regressions. The mean squareddi!erence was also calculated for each of these four function estimates.

    The entire process was repeated one hundred times. Figs. 5(a)}(d) give box-

    plots of the one hundred resulting values of log(MSD) for each of the four

    functions (i"1, 2, 3, 4) and for both the nonparametric SUR estimator

    (NSUR) and individual nonparametric estimators (NR). Fig. 5 shows that

    taking into account the correlation between the errors has substantially and

    consistently improved the resulting estimates of all the regression functions.

    To examine the qualitative improvement that occurs, we focus on the singledata set corresponding to the 50th sorted value of

    MSD

    for the non-

    parametric SUR estimator. This data set can be regarded as providing a &typical'

    example of the procedure and is plotted as four scatter plots in Figs. 5(e)}(h) and

    again in Figs. 5(i)}(l). The nonparametric SUR estimates of the four functions for

    this data appear in Figs. 5(e)}(h) and the estimates for the separate nonparamet-

    ric regressions appear in Figs. 5(i)}(l). These "gures show that the nonparametric

    SUR estimator signi"cantly outperforms the separate nonparametric estimators

    which ignore the correlation between the separate regressions. The variance ofthe errors and its estimate K for this data set are given below.

    "1 0.96 0.64 0.93

    1 0.98 0.90

    1 0.85

    1 K"0.824 0.707 0.489 0.741

    0.845 0.849 0.674 0.755

    0.570 0.773 0.895 0.656

    0.850 0.854 0.722 0.922 (4.1)

    M. Smith, R. Kohn/Journal of Econometrics 98 (2000) 257}281 273

  • 8/13/2019 [Smith, Kohn, 2000] Nonparametric Seemingly Unrelated Regression

    18/25

    Fig. 5. (a)}(d) Boxplots of the log(MSD) fori"1, 2, 3 and 4, respectively. The left-hand boxplot is

    for the NSUR estimator, while the right-hand boxplot is for the NR estimation procedure. Panels

    (e)}(h) contain scatter plots ofxagainsty, along with the function estimatesfK(bold line) and true

    functions f (dotted line) for i"1, 2, 3, 4 that result from applying the NSUR estimator. Panels

    (i)}(l) plot the function estimatesfK(bold line) and true functionsf(dotted line) fori"1, 2, 3, 4 that

    result from applying the NR procedure to the same data.

    274 M. Smith, R. Kohn/Journal of Econometrics 98 (2000) 257}281

  • 8/13/2019 [Smith, Kohn, 2000] Nonparametric Seemingly Unrelated Regression

    19/25

    Table 2

    Time taken by the NSUR procedure to complete a "t to data generated form the model in Section

    4.1 for four di!erent sample sizes. In the sampling procedure 2000 iterates were used for the warmup

    and the Monte Carlo sample consisted of a subsequent 1000 iterates. The machine used was a low

    end workstation

    Sample size n"100 n"200 n"400 n"800

    Time (s) 43 58 213 3850

    The estimate compares favorably to the&best possible'estimate K

    that arises

    from the sample variance of the true errors themselves, which are knownbecause this is a simulated example.

    K"varY((e, e, e, e))"

    0.842 0.702 0.449 0.698

    0.821 0.869 0.673 0.732

    0.522 0.770 0.879 0.603

    0.823 0.850 0.697 0.854.To demonstrate that the NSUR estimator is practical to implement, we reportthe time required to"t this four equation system for a variety of sample sizes in

    Table 2. The computer used was a low-end modern workstation while the code

    was written e$ciently in FORTRAN. Although these timings are implementa-

    tion dependent, they do indicate that this computationally intensive Markov

    chain Monte Carlo procedure runs in a reasonable time.

    4.2. Unrelated regressions

    The previous simulation considers a related set of regressions and demon-

    strates the improvements in the regression function estimates that can occur

    when correlation between the regressions is modeled and estimated, rather than

    ignored. However, is there a risk of degrading the function estimates by

    modeling correlation that does not exist?

    To investigate this case, we repeat the simulation undertaken above, except

    that the regressions are unrelated, with"0.5I

    . Fig. 6 provides the equivalent

    output for this case as was produced in Fig. 5. It can be seen from the boxplots in

    Figs. 6(a)}(d) that, in general, there is a slight deterioration in the log(MSD) for

    the NSUR estimator compared to the NR estimation procedure. This is ex-

    pected as the regressions are actually not related and the NSUR procedure also

    estimates. However, the loss in the e$ciency of the function estimates is very

    small and the function estimates from the NSUR estimator (Figs. 6(e)}(h)) are

    almost identical to those from the NR procedure (Figs. 6(i)}(l)) for the median

    dataset.

    M. Smith, R. Kohn/Journal of Econometrics 98 (2000) 257}281 275

  • 8/13/2019 [Smith, Kohn, 2000] Nonparametric Seemingly Unrelated Regression

    20/25

    Fig. 6. As in Fig. 5, but for the uncorrelated model in Section 4.2.

    276 M. Smith, R. Kohn/Journal of Econometrics 98 (2000) 257}281

  • 8/13/2019 [Smith, Kohn, 2000] Nonparametric Seemingly Unrelated Regression

    21/25

    Nevertheless, in examples whenm is large, reliably estimating them(m#1)/2

    free parameters ofin addition to the unknown functions may prove di$cult.

    In such cases, a more parsimonious representaton for can be used and

    estimated in combination with the unknown functions using MCMC. Forexample, Shephard and Pitt (1998) discuss the MCMC estimation of factor

    decompositions for , while Smith and Kohn (1999) examine parsimonious

    representations for based on the Cholesky decomposition of.

    Acknowledgements

    The authors would like to thank Denzil Fiebig, Paul Kofman, Steve Marron,Gael Martin and Tom Smith for useful comments. They would also like to

    thank two anonymous referees for improving the clarity of the presentation and

    for pointing out an error in a previous implementation of the sampling scheme.

    Both Michael Smith and Robert Kohn are grateful for the support of Australian

    Research Council grants.

    Appendix A

    A.1. Generating from , , y

    This conditional distribution can be calculated exactly, as

    p( , , y)Jp(y , , )p( , )

    Jexp!1

    2n#1

    n (!( )XAX (!( ),so that

    , , y&N( , n

    n#1(XAX ).

    Here,( and A are de"ned in Section 2.2.

    A.2. Generating from , , y

    This conditional distribution is di$cult to recognize as is embedded

    in the conditional prior for . Therefore, to obtain an iterate we use aMetropolis}Hastings step; see Chib and Greenberg (1995b) for an introduction

    M. Smith, R. Kohn/Journal of Econometrics 98 (2000) 257}281 277

  • 8/13/2019 [Smith, Kohn, 2000] Nonparametric Seemingly Unrelated Regression

    22/25

    to this method. The proposal density from which we generate a candidate iterate

    is given by

    q(

    )J

    p(y ,

    , )p(

    )

    Jexp!1

    2tr()

    which is a Wishart(, n, m) density. Here, is an (mm) matrix with ijthelement

    "(y!X)(y!X). A newly generated iterate is ac-

    cepted over the old value

    with probability

    "minp(

    , , y)q( )

    p(

    , , y)q(), 1"min

    p( , )p( , )

    , 1.High acceptance rates of 60}90% are obtained because the proposal densityq( ) )

    is equal to the correct conditional density except for the factor p( , ).

    A.3. Calculating,

    , y

    This density requires calculation to enable generation of

    in the sampling

    scheme.

    p(,

    , y)Jp(y, , )p( , ) dp()

    J(n#

    1)

    exp

    !

    1

    2S(,

    )

    p(

    ) (A.1)

    where S(, )"yAy!yAX(XAX )XAyand in Eq. (A.1) the regres-sion coe$cient is integrated out using

    &N( , n

    n#1(XAX).

    The conditional posterior can then be calculated by evaluating (A.1) for "1and

    "0 and normalizing.

    A.4. Calculatingp(

    )

    The conditional prior can be calculated by integrating out the hyperprior,with p(

    )J

    ()(1!) d"B(q#1, p!q#1), so that

    278 M. Smith, R. Kohn/Journal of Econometrics 98 (2000) 257}281

  • 8/13/2019 [Smith, Kohn, 2000] Nonparametric Seemingly Unrelated Regression

    23/25

    p("1

    )"1/(1#h), where h"(p!a)/(a#1) and a"

    is the

    number of elements of

    that are one.

    Appendix B

    This appendix provides a proof that the invariant distribution of the proposed

    sampling step in Section 2.3 is()"p(

    ,

    , y).

    Proof is as follows. As is a binary variable, the proof is undertaken bycalculating the Metropolis}Hastings ratio for the two transitions

    ("0P"1) and ("1P"0) and showing that it is applied

    correctly in step (b) of the sampling step. Let Q(P) be the proposaldensity, then

    Q(1P0)"s(0)min1,(0)/s(0) and Q(0P1)"s(1)min1,(1)/s(1)

    (B.1)

    The Metropolis}Hastings ratios are

    "min1,

    (1)Q(1P0)(0)Q(0P1), "min1,

    (0)Q(0P1)(1)Q(1P0).

    Because (1)"1!(0) and s(1)"1!s(0), (1)/s(1)'1 i! (0)/s(0)(1 and(1)/s(1)(1 i!(0)/s(0)'1. Using these inequalities and substituting the pro-posal densities at (B.1) into the ratios, it can be seen that

    "min(1, s(0)/(0))

    and"min(1, s(1)/(1)). These correspond to the ratios applied in part (b) of

    the sampling step. Hence the sampling step is reversible and its invariant

    distribution is().

    Lemma.Let(0)"p(y"0,

    ,)and(1)"p(y

    "1,

    ,)be the likeli-

    hoods for"0 and

    "1. Then,

    "s(0)#s(1)min(1,(1)/ (0)) and

    "s(1)#s(0)min(1,(0)/ (1))

    (B.2)

    ProofNote that

    (1)"s(1)(1)

    s(1)(1)#s(0)(0) and (0)"

    s(0)(0)

    s(1)(1)#s(0)(0).

    Substituting these into the expressions for

    and

    found in part (b) results in

    the expressions at (B.2).

    M. Smith, R. Kohn/Journal of Econometrics 98 (2000) 257}281 279

  • 8/13/2019 [Smith, Kohn, 2000] Nonparametric Seemingly Unrelated Regression

    24/25

    The importance of this lemma is it reveals that the Metropolis}Hastings

    ratios will usually be one, or close to one. Because s(1)#s(0)"1, "1 if

    (1)/ (0)*1, while"1 if(0)/ (1)*1. Also, even if(1)/ (0)(1, because

    in the nonparametric regression problem s(0) is usually very close to 1,'s(0) is close to 1.

    References

    Bartels, R., Fiebig, D., Plumb, M., 1996. Gas or Electricity, which is cheaper?: an econometric

    approach with an application to Australian expendicture data. Energy Journal 17, 33}58.

    Box, G., Jenkins, G., Reinsel, G., 1994. Time Series Analysis, 3rd Edition. Prentice-Hall, New Jersey.Chib, S., Greenberg, E., 1995a. Hierarchical analysis of SUR models with extensions to correlated

    serial errors and time varying parameter models. Journal of Econometrics 68, 339}360.

    Chib, S., Greenberg, E., 1995b. Understanding the Metropolis}Hastings algorithm. The American

    Statisitician 49, 327}335.

    Denison, D., Mallick, B., Smith, A., 1998. Automatic bayesian curve "tting. Journal of the Royal

    Statistical Society, Series B 60, 333}350.

    Donoho, D., Johnstone, I., 1994. Ideal spatial adaptation by wavelet shrinkage. Biometrika 81,

    425}455.

    Friedman, J., 1991. Multivariate adaptive regression splines. The Annals of Statistics (with dis-

    cussion) 19, 1}141.Hanssens, D., Weitz, B., 1980. The e!ectiveness of industrial print advertisements across product

    categories. Journal of Marketing Research 17, 294}306.

    Harvey, A., Koopman, S., 1993. Forecasting hourly electricity demand using time-varying splines.

    Journal of the American Statistical Association 88 (424), 1228}1253.

    O'Hagan, A., 1995. Fractional Bayes factors for model comparison (with discussion). Journal of

    Royal Statistical Society Series B 57, 99}138.

    Holmes, C., Mallick, B., 1998. Bayesian radial basis functions of variable dimension. Neural

    Computation 10, 1217}1233.

    Luo, Z., Wahba, G., 1997. Hybrid adaptive splines. Journal of the American Statistical Association

    92, 107}116.

    Mandy, D., Martins-Filho, C., 1993. Seemingly unrelated regressions under additive heteroskedas-

    ticity: theory and share equation applications. Journal of Econometrics 58, 315}346.

    Min, C., Zellner, A., 1993. Bayesian and non-Bayesian methods for combining models and forecasts

    with applications to forecasting international growth rates. Journal of Econometrics 56, 89}118.

    Powell, M., 1987. Radial basis functions for multivariate interpolation: a review. In: Mason, J., Cox,

    M. (Eds.), Algorithms for Approximation.

    Shephard, N., Pitt, M., 1998. Analysis of time varying covariances: a factor stochastic volatility

    approach. Preprint.

    Smith, M., 2000. Modeling and short term forecasting of new south Wales electricity system load.Journal of Business and Economic Statistics, to appear.

    Smith, M., Kohn, R., 1996. Nonparametric regression via Bayesian variable selection. Journal of

    Econometrics 75 (2), 317}344.

    Smith, M., Kohn, R., 1999. Bayesian parsimonious covariance matrix estimation. Preprint.

    Srivastava, V., Giles, D., 1987. Seemingly Unrelated Regression Equations Models. Marcel Dekker,

    New York.

    Tierney, L., 1994. Markov chains for exploring posterior distributions. The Annals of Statistics 22,

    1701}1762.

    280 M. Smith, R. Kohn/Journal of Econometrics 98 (2000) 257}281

  • 8/13/2019 [Smith, Kohn, 2000] Nonparametric Seemingly Unrelated Regression

    25/25

    Wahba, G., 1990. Spline Models for Observational Data. SIAM, Philadelphia.

    Wong, F., Hansen, M., Kohn, R., Smith, M., 1997. Focused sampling and its application to

    nonparametric and robust regression. Preprint.

    Yee, T., Wild, C., 1996. Vector generalised additive models. Journal of the Royal Statistical Society,

    Series B 58, 481}493.Zellner, A., 1962. An e$cient method of estimating seemingly unrelated regression equations and

    tests for aggregation bias. Journal of the American Statistical Association 57, 500}509.

    Zellner, A., 1963. Estimators for seemingly unrelated regression equations: some exact "nite sample

    results. Journal of the American Statistical Association 58, 977}992.

    Zellner, A., 1971. An Introduction to Bayesian Inference in Econometrics. Wiley, New York.

    M. Smith, R. Kohn/Journal of Econometrics 98 (2000) 257}281 281