
Beyond Mean Regression

Thomas Kneib

Lehrstuhl für Statistik, Georg-August-Universität Göttingen

8.3.2013 Innsbruck


Introduction

• One of the top ten reasons to become a statistician (according to Friedman, Friedman & Amoo):

Statisticians are mean lovers.

⇒ Focus on means, in particular in regression models, to reduce complexity.

• Obviously, a mean is not sufficient to fully describe a distribution.


• Usual regression models are based on data (y_i, z_i) for a continuous response variable y and covariates z:

y_i = η_i + ε_i,

where η_i is a regression predictor formed in terms of the covariates z_i.

• Assumptions on the error term:

E(ε_i) = 0, Var(ε_i) = σ²,

or ε_i ∼ N(0, σ²).


• The assumptions on the error term imply the following properties of the response distribution:

– The predictor determines the expectation of the response:

E(y_i | z_i) = η_i.

– Homoscedasticity of the response:

Var(y_i | z_i) = σ².

– Parallel quantile curves of the response (if the errors are also normal):

Q_τ(y_i | z_i) = η_i + z_τ σ,

where z_τ denotes the τ-quantile of the standard normal distribution.


• Why could this be problematic?

– The variance of the responses may depend on covariates (heteroscedasticity).

– Other higher-order characteristics (skewness, kurtosis, …) of the responses may depend on covariates.

– Generic interest in extreme observations or the complete conditional distribution of the response.


• Example: Munich rental guide (illustrative application in this talk).

– Explain the net rent for a specific flat in terms of covariates such as living area or year of construction.

– Published to give reference intervals of usual rents for both tenants and landlords.

⇒ We are not interested in average rents but rather in an interval covering typical rents.

[Figure: scatter plots of net rent in Euro against living area (20–160 sqm) and year of construction (1920–2000).]


• Some further examples:

– Analysing childhood BMI patterns in (post-)industrialized countries, where interest is mainly on extreme forms of overweight (obesity).

– Studying covariate effects on extreme forms of malnutrition in developing countries.

– Efficiency estimation in agricultural production, where interest is on evaluating above-average performance of farms.

– Modelling gas flow networks, where the behavior of the network in high- or low-demand situations is to be studied.


• More flexible regression approaches considered in the following:

– Regression models for location, scale and shape.

– Quantile regression.

– Expectile regression.


• Regression models for location, scale and shape:

– Retain the assumption of a specific error distribution but allow covariate effects not only on the mean.

– Simplest example: Regression for mean and variance of a normal distribution where

y_i = η_i1 + exp(η_i2) ε_i, ε_i ∼ N(0, 1),

such that E(y_i | z_i) = η_i1 and Var(y_i | z_i) = exp(η_i2)².

– In general: Specify a distribution for the response, where (potentially) all parameters are related to predictors.
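A minimal sketch of the simplest example above, fitting the location-scale normal model by maximum likelihood; the simulated data and starting values are illustrative only (in practice one would use the gamlss package mentioned at the end of the talk):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# simulate data from y = eta1 + exp(eta2) * eps with linear predictors
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 400)
y = 1 + 2 * x + np.exp(-1 + 1.5 * x) * rng.normal(size=400)

def negloglik(theta):
    b0, b1, g0, g1 = theta
    eta1 = b0 + b1 * x                    # predictor for the mean
    eta2 = g0 + g1 * x                    # predictor for the log standard deviation
    return -norm.logpdf(y, loc=eta1, scale=np.exp(eta2)).sum()

fit = minimize(negloglik, x0=np.zeros(4), method="BFGS")
print(fit.x)                              # close to (1, 2, -1, 1.5)
```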


• Quantile and expectile regression:

– Drop the parametric assumption for the error / response distribution and instead estimate separate models for different asymmetries τ ∈ (0, 1):

yi = ηiτ + εiτ ,

– Instead of assuming E(εiτ) = 0, we can for example assume

P (εiτ ≤ 0) = τ ,

i.e. the τ -quantile of the error term is zero.

– Yields a regression model for the quantiles of the response.

– A dense set of quantiles completely characterizes the conditional distribution of the response.

– Expectiles are a computationally attractive alternative to quantiles.

Beyond Mean Regression 9

Page 11: Beyond Mean Regression - Universität Innsbruck€¦ · Thomas Kneib Introduction Introduction • One of the top ten reasons to become statistician (according to Friedman, Friedman

Thomas Kneib Introduction

• Estimated quantile curves for the Munich rental guide with a linear effect of living area and a quadratic effect of year of construction.

– Homoscedastic linear model:

[Figure: estimated quantile curves of rent in Euro against living area and year of construction under the homoscedastic linear model.]


– Heteroscedastic linear model:

[Figure: estimated quantile curves of rent in Euro against living area and year of construction under the heteroscedastic linear model.]


– Quantile regression:

[Figure: estimated quantile curves of rent in Euro against living area and year of construction under quantile regression.]


• Usually, modern regression data contain more complex structures such that linear predictors are not enough.

• For example, in the Munich rental guide

– the effects of living area and year of construction may be of complex nonlinear form (instead of simply polynomial) and

– a spatial effect based on the subquarter information may be included to capture effects of missing covariates and spatial correlation.

⇒ Consider semiparametric extensions.


Overview for the Rest of the Talk

• Semiparametric Predictor Specifications.

• More on Models:

– Generalized Additive Models for Location, Scale and Shape.

– Quantile Regression.

– Expectile Regression.

• Inferential Procedures & Comparison of the Approaches.


Semiparametric Regression

• Semiparametric regression provides a generic framework for flexible regression models with predictor

η = β₀ + f₁(z) + … + f_r(z),

where f₁, …, f_r are generic functions of the covariate vector z.

• Types of effects:

– Linear effects: f(z) = x′β.

– Nonlinear, smooth effects of continuous covariates: f(z) = f(x).

– Varying coefficients: f(z) = uf(x).

– Interaction surfaces: f(z) = f(x1, x2).

– Spatial effects: f(z) = f_spat(s).

– Random effects: f(z) = b_c with cluster index c.


• Generic model description based on

– a design matrix Z_j, such that the vector of function evaluations f_j = (f_j(z_1), …, f_j(z_n))′ can be written as

f_j = Z_j γ_j.

– a quadratic penalty term

pen(f_j) = pen(γ_j) = γ_j′ K_j γ_j

which operationalises smoothness properties of f_j.

• From a Bayesian perspective, the penalty term corresponds to a multivariate Gaussian prior

p(γ_j) ∝ exp( −(1 / (2δ_j²)) γ_j′ K_j γ_j ).


• Estimation then relies on a penalised fit criterion, e.g.

∑_{i=1}^n (y_i − η_i)² + ∑_{j=1}^r λ_j γ_j′ K_j γ_j

with smoothing parameters λ_j ≥ 0.


• Example 1. Penalised splines for nonlinear effects f(x):

– Approximate f(x) in terms of a linear combination of B-spline basis functions

f(x) = ∑_k γ_k B_k(x).

– Large variability in the estimates corresponds to large differences in adjacent coefficients, yielding the penalty term

pen(γ) = ∑_k (Δ^d γ_k)² = γ′ D_d′ D_d γ

with difference operator Δ^d and difference matrix D_d of order d.

– The corresponding Bayesian prior is a random walk of order d, e.g.

γ_k = γ_{k−1} + u_k (d = 1), γ_k = 2γ_{k−1} − γ_{k−2} + u_k (d = 2),

with u_k i.i.d. N(0, δ²).
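A short numerical sketch of this construction on simulated data; the basis dimension, smoothing parameter and test function below are illustrative choices, not values from the talk:

```python
import numpy as np
from scipy.interpolate import BSpline

rng = np.random.default_rng(1)
n, K, k = 200, 20, 3                      # observations, basis functions, cubic splines
x = np.sort(rng.uniform(0, 1, n))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, n)

# B-spline design matrix: K basis functions require K + k + 1 knots
t = np.r_[[0.0] * k, np.linspace(0, 1, K - k + 1), [1.0] * k]
B = np.column_stack([BSpline(t, np.eye(K)[j], k)(x) for j in range(K)])

# second-order difference penalty D_d' D_d (the random walk prior of order d = 2)
D = np.diff(np.eye(K), n=2, axis=0)

lam = 5.0                                 # smoothing parameter lambda
gamma = np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ y)
fhat = B @ gamma                          # estimated smooth function at the x values
```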


• Example 2. Markov random fields for the estimation of spatial effects based on regional data:

– Estimate a separate regression coefficient γs for each region, i.e. f = Zγ with

Z[i, s] = 1 if observation i belongs to region s, and 0 otherwise.

– Penalty term based on differences of neighboring regions:

pen(γ) = ∑_{s∼r} (γ_s − γ_r)² = γ′Kγ

where the sum extends over all pairs of neighboring regions s ∼ r, N(s) is the set of neighbors of region s, and K is the penalty matrix implied by the adjacency structure.

– An equivalent Bayesian prior structure is obtained based on Gaussian Markov random fields.
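A toy sketch of this penalty matrix for a hypothetical map of four regions (the adjacency pattern is invented for illustration); it checks that the quadratic form γ′Kγ reproduces the sum of squared neighbor differences:

```python
import numpy as np

# A[s, r] = 1 if regions s and r share a border (hypothetical 4-region map)
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])

# penalty matrix: number of neighbors on the diagonal, -1 for each neighbor pair
K = np.diag(A.sum(axis=1)) - A

gamma = np.array([1.0, 2.0, 0.5, 3.0])
pairwise = sum(A[s, r] * (gamma[s] - gamma[r]) ** 2
               for s in range(4) for r in range(s + 1, 4))
print(np.isclose(gamma @ K @ gamma, pairwise))   # True
```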


Inferential Procedures

• For each of the three model classes discussed in the following, we will consider three potential avenues for inference:

– Direct optimization of a fit criterion (e.g. maximum likelihood estimation for GAMLSS).

– Bayesian approaches.

– Functional gradient descent boosting.


• Functional gradient descent boosting:

– Define the estimation problem in terms of a loss function ρ (e.g. the negative log-likelihood).

– Use the negative gradients of the loss function, evaluated at the current fit, as a measure of lack of fit.

– Iteratively fit simple base-learning procedures to the negative gradients to update the model fit.

– Componentwise updates of only the best-fitting model component yield automatic variable selection and model choice.

– For semiparametric regression, penalized least squares estimates provide suitable base-learners.
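A compact, self-contained sketch of componentwise boosting with squared-error loss (so the negative gradients are simply the residuals) and simple linear base-learners; function name and step length are illustrative choices, and real analyses would use software such as the gamboostLSS package mentioned later:

```python
import numpy as np

def componentwise_l2_boost(X, y, n_iter=200, nu=0.1):
    """Componentwise gradient boosting: in each step, fit every candidate
    base-learner (here: one covariate each) to the negative gradients and
    update only the best-fitting component by a small step nu."""
    n, p = X.shape
    beta = np.zeros(p)
    fit = np.full(n, y.mean())                  # offset: start from the mean
    for _ in range(n_iter):
        u = y - fit                             # negative gradients of L2 loss
        coefs = X.T @ u / (X ** 2).sum(axis=0)  # least squares fit per component
        rss = ((u[:, None] - X * coefs) ** 2).sum(axis=0)
        j = rss.argmin()                        # update best-fitting component only
        beta[j] += nu * coefs[j]
        fit += nu * coefs[j] * X[:, j]
    return y.mean(), beta                       # intercept and selected effects

# example: only informative components receive sizeable coefficients
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 5))
y = 2 * X[:, 0] - X[:, 3] + rng.normal(size=300)
print(componentwise_l2_boost(X, y)[1])
```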


Generalized Additive Models for Location, Scale and Shape

• GAMLSS provide a unified framework for semiparametric regression models in the case of complex response distributions depending on up to four parameters (μ_i, σ_i, ν_i, ξ_i), where usually

– µi is the location parameter,

– σi is the scale parameter, and

– νi and ξi are shape parameters determining for example skewness or kurtosis.

• Each parameter is related to a regression predictor via a suitable response function, i.e.

μ_i = h₁(η_{i,μ}), σ_i = h₂(η_{i,σ}), …


• A very broad class of distributions is supported for both discrete and continuous responses.

• Most important examples for continuous responses:

– Two-parameter normal distribution (location and scale).

– Three-parameter power exponential distribution (location, scale and kurtosis).

– Three-parameter t distribution (location, scale and degrees of freedom).

– Three-parameter gamma distribution (location, scale and shape).

– Four-parameter Box-Cox power distribution (location, scale, skewness and kurtosis).


• Direct optimization:

– For GAMLSS, the likelihood is available due to the explicit assumption made for the distribution of the response.

– Maximization can be achieved by penalized iteratively weighted least squares (IWLS) estimation.

– Estimation and choice of the smoothing parameters is challenging, at least for complex models.

• Bayesian inference:

– Inference based on Markov chain Monte Carlo (MCMC) simulations is in principle straightforward but requires careful choice of the proposal densities.

– Promising results obtained based on IWLS proposals.

– Smoothing parameter choice is immediately included.


• Boosting:

– Due to the multiple predictors, the usual boosting framework has to be adapted but basically still works.


• Results for the Munich rental guide obtained with an additive model for location and scale:

[Figure: estimated effects on the mean for area in sqm (20–160) and year of construction (1920–2000).]


[Figure: estimated effects on the standard deviation for area in sqm and year of construction.]


Quantile Regression

• The theoretical τ-quantile q_τ for a continuous random variable is characterized by

P(Y ≤ q_τ) ≥ τ and P(Y ≥ q_τ) ≥ 1 − τ.

• Estimation of quantiles based on i.i.d. samples y_1, …, y_n can be accomplished by

q̂_τ = argmin_q ∑_{i=1}^n w_τ(y_i, q) |y_i − q|

with asymmetric weights

w_τ(y_i, q) = 1 − τ if y_i < q, 0 if y_i = q, τ if y_i > q.
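A quick numerical illustration of this argmin characterization via grid search on simulated data (sample size and grid resolution are arbitrary choices):

```python
import numpy as np

def quantile_loss(q, y, tau):
    # asymmetric weights: 1 - tau below q, tau above q
    w = np.where(y < q, 1 - tau, tau)
    return np.sum(w * np.abs(y - q))

rng = np.random.default_rng(0)
y = rng.normal(size=10_000)
grid = np.linspace(y.min(), y.max(), 2_000)
losses = [quantile_loss(q, y, tau=0.9) for q in grid]
print(grid[np.argmin(losses)], np.quantile(y, 0.9))  # both close to 1.28
```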


[Figure: plot of the weighted losses w_τ(y, q)|y − q| (for q = 0).]


• Quantile regression starts with the regression model

yi = ηiτ + εiτ .

• Instead of assuming E(ε_iτ) = 0 as in mean regression, we assume

F_{ε_iτ}(0) = P(ε_iτ ≤ 0) = τ,

i.e. the τ-quantile of the error is zero.

• This implies that the predictor coincides with the τ-quantile of the conditional distribution of the response, i.e.

F_{y_i}(η_iτ) = P(y_i ≤ η_iτ) = τ.


• Quantile regression therefore

– is distribution-free since it does not make any specific assumptions on the type of errors.

– does not even require i.i.d. errors.

– allows for heteroscedasticity.


• Note that each parametric regression model also induces a quantile regression model.

• Example: The heteroscedastic normal model

y ∼ N(η₁, exp(η₂)²)

yields q_τ = η₁ + exp(η₂) z_τ.
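As a small sketch, once η₁ and η₂ have been estimated, quantile curves of any level come for free from this relation; the predictor values below are invented for illustration and not fitted to the Munich data:

```python
import numpy as np
from scipy.stats import norm

area = np.linspace(20, 160, 5)
eta1 = 100 + 6.0 * area           # hypothetical location predictor
eta2 = 3.0 + 0.01 * area          # hypothetical log-scale predictor

for tau in (0.1, 0.5, 0.9):
    q = eta1 + np.exp(eta2) * norm.ppf(tau)   # q_tau = eta1 + exp(eta2) * z_tau
    print(tau, np.round(q, 1))
```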


• Direct optimisation:

– Classical estimation is achieved by minimizing

∑_{i=1}^n w_τ(y_i, η_iτ) |y_i − η_iτ| + ∑_{j=1}^p λ_j pen(f_j).

– Can be solved with linear programming as long as the penalties are also linear functionals, e.g. for total variation penalization

pen(f_j) = ∫ |f_j″(x)| dx.

– Does not fit well with the class of quadratic penalties we are considering.

– Smoothing parameter selection is still challenging, in particular with multiple smoothing parameters.


• Bayesian inference

– Although quantile regression is distribution-free, there is an auxiliary error distribution that links ML estimation to quantile regression.

– Assume an asymmetric Laplace distribution for the responses, i.e.

y_i ∼ ALD(η_iτ, σ², τ)

with density proportional to

exp( −w_τ(y_i, η_iτ) |y_i − η_iτ| / σ² ).

– Maximizing the resulting likelihood

exp( −∑_{i=1}^n w_τ(y_i, η_iτ) |y_i − η_iτ| / σ² )

is equivalent to minimizing the quantile loss criterion.


– A computationally attractive way of working with the ALD in a Bayesian framework is its scale-mixture representation:

– If z_i | σ² ∼ Exp(1/σ²) and

y_i | z_i, η_iτ, σ² ∼ N(η_iτ + ξ z_i, σ²/w_i)

with

ξ = (1 − 2τ) / (τ(1 − τ)), w_i = 1 / (δ² z_i), δ² = 2 / (τ(1 − τ)),

then y_i is marginally ALD(η_iτ, σ², τ) distributed.

– Allows efficient Gibbs samplers or variational Bayes approximations to be constructed that explore the posterior after imputing the z_i as additional unknowns.
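A quick Monte Carlo check of this scale-mixture representation: drawing (z_i, y_i) as above should reproduce an ALD whose τ-quantile equals the location η (the parameter values below are arbitrary test settings):

```python
import numpy as np

rng = np.random.default_rng(42)
tau, eta, sigma2 = 0.3, 1.5, 0.8
xi = (1 - 2 * tau) / (tau * (1 - tau))
delta2 = 2 / (tau * (1 - tau))

# scale mixture: z ~ Exp with mean sigma2, then y | z ~ N(eta + xi*z, sigma2*delta2*z)
z = rng.exponential(scale=sigma2, size=1_000_000)
y = eta + xi * z + np.sqrt(sigma2 * delta2 * z) * rng.normal(size=z.size)

# marginally y ~ ALD(eta, sigma2, tau), so P(y <= eta) should equal tau
print(np.mean(y <= eta))   # approximately 0.3
```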


• Boosting:

– Boosting can be immediately applied in the quantile regression context since it is formulated in terms of a loss function.

– Negative gradients are defined almost everywhere, i.e. no conceptual problems.


• Results for a geoadditive Bayesian quantile regression model:

[Figure: maps of estimated spatial effects for τ = 0.1, 0.2, 0.5 and 0.9, each on a color scale from −150 to 150.]


[Figure: estimated nonlinear effects f(living area) and f(year of construction) for several quantile levels.]


Expectile Regression


• What is expectile regression?

∑_{i=1}^n |y_i − η_i| → min: median regression

∑_{i=1}^n (y_i − η_i)² → min: mean regression

∑_{i=1}^n w_τ(y_i, η_iτ) |y_i − η_iτ| → min: quantile regression

∑_{i=1}^n w_τ(y_i, η_iτ) (y_i − η_iτ)² → min: expectile regression


• Theoretical expectiles are obtained by solving

τ = ∫_{−∞}^{e_τ} |y − e_τ| f_y(y) dy / ∫_{−∞}^{∞} |y − e_τ| f_y(y) dy = (G_y(e_τ) − e_τ F_y(e_τ)) / (2(G_y(e_τ) − e_τ F_y(e_τ)) + (e_τ − μ))

where

– f_y(·) and F_y(·) denote the density and cumulative distribution function of y,

– G_y(e) = ∫_{−∞}^{e} y f_y(y) dy is the partial moment function of y, and

– G_y(∞) = μ is the expectation of y.
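A numerical sketch of this relation for the standard normal distribution, where F = Φ, the partial moment function is G(e) = −φ(e), and μ = 0 (the bracketing interval for the root finder is an arbitrary choice):

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def expectile_level(e):
    """tau as a function of the expectile e for the standard normal."""
    g = -norm.pdf(e) - e * norm.cdf(e)    # G(e) - e * F(e)
    return g / (2 * g + e)                # (e - mu) = e since mu = 0

# invert numerically: the 90%-expectile of N(0, 1)
e90 = brentq(lambda e: expectile_level(e) - 0.9, -5, 5)
print(e90)   # about 0.86, smaller than the 90%-quantile 1.28
```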


• Direct optimization:

– Since the expectile loss is differentiable, estimates for the basis coefficients can be obtained by iterating

γ_jτ^[t+1] = (Z_j′ W_τ^[t] Z_j + λ_j K_j)⁻¹ Z_j′ W_τ^[t] y,

where W_τ^[t] is the diagonal matrix of the current asymmetric weights.

– A combination with mixed model methodology allows the smoothing parameters to be estimated.
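A minimal sketch of this iteration for a single design matrix (simulated data; the unpenalized case corresponds to λ = 0, and all settings below are illustrative):

```python
import numpy as np

def expectile_irls(Z, y, tau, lam=0.0, K=None, n_iter=50):
    """Expectile fit via iteratively weighted least squares:
    gamma <- (Z'WZ + lam*K)^(-1) Z'Wy, w_i = tau if y_i > eta_i else 1 - tau."""
    K = np.zeros((Z.shape[1],) * 2) if K is None else K
    gamma = np.linalg.solve(Z.T @ Z + lam * K, Z.T @ y)   # least squares start
    for _ in range(n_iter):
        w = np.where(y > Z @ gamma, tau, 1 - tau)
        gamma = np.linalg.solve(Z.T @ (w[:, None] * Z) + lam * K, Z.T @ (w * y))
    return gamma

# example: 80%-expectile line for heteroscedastic data
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 500)
y = 1 + 2 * x + (0.2 + x) * rng.normal(size=500)
Z = np.column_stack([np.ones_like(x), x])
print(expectile_irls(Z, y, tau=0.8))
```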


• Bayesian inference:

– Similarly to quantile regression, an asymmetric normal distribution can be defined as an auxiliary distribution for the responses.

– No scale mixture representation known so far.

– Bayesian formulation probably less important since inference is directly tractable.

• Boosting:

– Boosting can be immediately applied in the expectile regression context.


Comparison

• Advantages of GAMLSS:

– One joint model for the distribution of the responses.

– Interpretability of the estimated effects in terms of parameters of the response distribution.

– Quantiles (or expectiles) derived from GAMLSS will always be coherent, i.e. ordering will be preserved.

– Readily available in both frequentist and Bayesian formulation.

• Disadvantages of GAMLSS:

– Potential for misspecification of the observation model.

– Model checking difficult in complex settings.

– If quantiles are of ultimate interest, GAMLSS do not provide direct estimates for these.


• Advantages of quantile regression:

– Completely distribution-free approach.

– Easy interpretation in terms of conditional quantiles.

– Bayesian formulation enables very flexible, fully data-driven semiparametric specifications of the predictor.

• Disadvantages of quantile regression:

– Bayesian formulation requires an auxiliary error distribution (that will usually be a misspecification).

– Estimated cumulative distribution function is a step function even for continuous data.

– Additional efforts required to avoid crossing of quantile curves.


• Advantages of expectile regression:

– Computationally simple (iteratively weighted least squares).

– Still allows the complete conditional distribution of the response to be characterized.

– Quantiles (or conditional distributions) can be computed based on expectiles.

– Expectiles seem to be more efficient than quantiles in close-to-Gaussian situations.

– Expectile crossing seems to be less of an issue as compared to quantile crossing.

– The estimated expectile curve is smooth.

• Disadvantages of expectile regression:

– Immediate interpretation of expectiles is difficult.


Summary

• There is more than mean regression!

• Semiparametric extensions are also becoming available for models beyond mean regression.

• You can do this at home:

– Quantile regression: R-package quantreg.

– Bayesian quantile regression: BayesX (MCMC) and a forthcoming R package on variational Bayes approximations (VA).

– GAMLSS: R-packages gamlss and gamboostLSS.

– Expectile regression: R-package expectreg.

• Interesting addition to the models considered: Modal regression (yet to be explored).


• Acknowledgements:

– This talk is mostly based on joint work with Nora Fenske, Benjamin Hofner, Torsten Hothorn, Goran Kauermann, Stefan Lang, Andreas Mayr, Matthias Schmid, Linda Schulze Waltrup, Fabian Sobotka, Elisabeth Waldmann and Yu Yue.

– Financial support has been provided by the German Research Foundation (DFG).

• A place called home:

http://www.statoek.wiso.uni-goettingen.de
