Introduction · Additive Regression · Generalized Additive Regression · Rate of Convergence · Model Selection Criteria · Adaptive Methods
Semiparametric and Nonparametric Additive Regression Models
Matúš Maciak
Department of Probability and Mathematical Statistics
March 30, 2007

Matúš Maciak, MFF UK - [email protected]
Contents
1 Introduction: Motivation; Curse of Dimensionality; Additive Decomposition
2 Additive Regression: Spline Estimates; Kernel Estimates
3 Generalized Additive Regression
4 Rate of Convergence
5 Model Selection Criteria
6 Adaptive Methods: Well-known Algorithms; Real-data Example
Motivation · Curse of Dimensionality · Additive Decomposition
The main objectives...
1 The “curse of dimensionality” problem - the main reason for applying additive semiparametric and nonparametric regression approaches.
2 The most frequently used methods for obtaining additive estimates.
3 Generalized additive regression models - in the special case of binary data samples.
4 Expectation - to achieve the same rates of convergence for additive estimates as in the univariate regression problem.
5 Model selection criteria - the optimal choice of the final model from the set of all proposed models.
6 Adaptive strategies (CIM) - RPR, MARS, PPR, etc.
Multivariate Regression
Let X ∈ χ ⊆ R^J be a J-dimensional random vector and consider a random variable Y with mean µ ∈ R and finite second moment EY² < ∞.
Let f : χ ⊆ R^J → R be such that E[Y |X = x] = f(x) - the regression function of Y on X.
The regression function f is assumed to be smooth up to a specific order.
No assumptions other than smoothness are placed on the functional form of f(·).
Multivariate Kernel regression
Multidimensional Smoothing ⇒ Multidimensional Regression

f(x) = E[Y |X = x] = ∫ y g(y |x) dy = ∫ y p₁(y, x) dy / p₂(x)

Estimates of the densities p₁, p₂ ⇒ kernel density estimation:

f_h(x) = Σ_{i=1}^{N} κ_h(X_i − x) Y_i / Σ_{i=1}^{N} κ_h(X_i − x),

where κ_h is a multivariate, multiplicative kernel and h = (h₁, …, h_J) is a vector of appropriate bandwidths.
Problems: the “curse of dimensionality” and a slow asymptotic rate of convergence...
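A minimal sketch of this estimator in Python (the Gaussian product kernel, the bandwidths, and the simulated regression function are illustrative assumptions, not taken from the slides):

```python
import numpy as np

def nw_estimate(x, X, Y, h):
    """Multivariate Nadaraya-Watson estimate f_h(x) with a
    multiplicative (product) Gaussian kernel and bandwidth vector h."""
    U = (X - x) / h                        # (N, J) scaled differences
    K = np.exp(-0.5 * U**2).prod(axis=1)   # product kernel weights, (N,)
    return (K @ Y) / K.sum()

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(500, 2))
Y = np.sin(2 * np.pi * X[:, 0]) + X[:, 1]**2 + 0.1 * rng.standard_normal(500)

# true f(0.5, 0.5) = sin(pi) + 0.25 = 0.25
fhat = nw_estimate(np.array([0.5, 0.5]), X, Y, h=np.array([0.1, 0.1]))
```

Since the weights are nonnegative and sum to one, the estimate is always a convex combination of the observed Y_i.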
Additive approaches...
Let (X, Y) ∈ R^{J+1} be a pair of random variables with X = (X₁, …, X_J), where Y is real valued with mean EY = µ and finite second moment 0 < EY² ≤ K < ∞.
Consider an unknown regression function f : [0, 1]^J → R of Y on X, so that f(x) = E[Y |X = x].
We impose one more condition:

f(x₁, …, x_J) = µ + Σ_{j=1}^{J} f_j(x_j)

The functional components f_j are uniquely determined and satisfy Ef_j(X_j) = 0.
The smoothness assumption remains (smoothness of the functional components).
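A sketch of data generated from such an additive model (the concrete components and uniform covariates are assumptions for illustration); each component is centred so that Ef_j(X_j) = 0:

```python
import numpy as np

rng = np.random.default_rng(1)
N, J = 2000, 3
X = rng.uniform(0, 1, size=(N, J))

mu = 2.0
# centred components: each integrates to 0 over [0, 1]
f1 = lambda x: np.sin(2 * np.pi * x)   # integral over [0, 1] is 0
f2 = lambda x: x - 0.5                 # centred linear term
f3 = lambda x: x**2 - 1/3              # centred quadratic term

Y = mu + f1(X[:, 0]) + f2(X[:, 1]) + f3(X[:, 2]) + 0.1 * rng.standard_normal(N)
```

The centring makes µ identifiable: the sample mean of Y estimates µ directly.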
Additive Estimates...
Let (X₁, Y₁), (X₂, Y₂), …, (X_N, Y_N) denote an independent random sample, where each pair (X_i, Y_i) has the same distribution as (X, Y).
Estimates of the true underlying regression function are obtained by different approaches (spline techniques, B-splines, and kernel estimates).
The semiparametric (nonparametric) estimate based on the random sample of size N can be written in the additive form:

f_N(x₁, …, x_J) = Ȳ_N + Σ_{j=1}^{J} f_{Nj}(x_j)

In line with the assumption on the functional components f_j, one requires Σ_{i=1}^{N} f_{Nj}(X_{ij}) = 0 for all j ∈ {1, …, J}.
Splines vs. Kernels
Spline estimates:
1 Semiparametric approaches
2 High-dimensional data
3 Extra-large sample sizes
4 No asymptotic distribution
5 No uniform convergence over the whole interval
6 No measure of uniform accuracy (except the L₂ norm)
7 The so-called “sledge-hammer” technique

Kernel estimates:
1 Nonparametric techniques
2 Too costly for large dimensions
3 Too costly for large sample sizes N
4 Asymptotic (normal) distribution (confidence intervals)
5 Uniform convergence over the whole interval
6 The so-called “sharp-knife” technique
What is the “Curse of Dimensionality” problem?
1 The sample size N needed to fit a J-dimensional regression surface increases exponentially with the number of dimensions.
2 Limitations imposed by the estimation ability of most multivariate regression approaches (splines, kernels).
3 The asymptotic rate of convergence decreases as the number of dimensions J grows (according to the exponent r = p/(2p + J)).
4 Algorithms dealing with high-dimensional data without a dimensionality-reduction principle (straightforward methods) are too costly.
5 A special case - the components of X are not independent...
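The rate exponent r = p/(2p + J) from point 3 can be tabulated directly (p = 2, i.e. twice-differentiable components, is an assumed illustration):

```python
# optimal L2 rate exponent r = p / (2p + J) for p-smooth functions in J dimensions
def rate_exponent(p, J):
    return p / (2 * p + J)

# with p = 2 the exponent shrinks quickly with the dimension J:
# J = 1 gives 2/5 = 0.4, while J = 10 gives only 2/14
rates = {J: rate_exponent(2, J) for J in (1, 2, 5, 10)}
```

This is exactly why the additive structure, which keeps the univariate exponent p/(2p + 1) for every component, is attractive.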
Curse of Dimensionality - examples:
Consider a random variable Y = (Y₁, …, Y_J) and a random sample {X_i = (X_{i1}, …, X_{iJ}); i = 1, …, N} such that X_i ∼ R([0, 1]^J) and Y ∼ R([0, 1]^J).
Maximum Distance vs. Euclidean Distance
Maximum distance: ‖x‖_max = max_{j=1,…,J} |x_j|
Euclidean distance: ‖x‖²_euc = Σ_{j=1,…,J} x_j²

Maximum distance

            J = 1      J = 2      J = 3      J = 5      J = 10     J = 20
N = 100     0.003838   0.054951   0.094651   0.232593   0.366015   0.571504
N = 1000    0.000506   0.015051   0.053464   0.129761   0.273968   0.440982
N = 10000   0.000044   0.004691   0.021613   0.044339   0.213186   0.402223
N = 100000  0.000006   0.001178   0.009108   0.030709   0.159703   0.353620

Euclidean distance

            J = 1      J = 2      J = 3      J = 5      J = 10     J = 20
N = 100     0.003838   0.060434   0.118966   0.328987   0.660090   1.264582
N = 1000    0.000506   0.017274   0.063800   0.191530   0.498007   1.000749
N = 10000   0.000041   0.005546   0.027003   0.060081   0.376753   0.909363
N = 100000  0.000005   0.001376   0.011672   0.052891   0.289131   0.795231

The empirical average minimum distance between two uniformly distributed random points in the hypercube [0, 1]^J.
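A small Monte Carlo sketch of the quantity tabulated above (fewer replications than the table, so the values are only approximate):

```python
import numpy as np

def avg_min_distance(N, J, reps=50, rng=None):
    """Average (over reps) of the minimum max-norm distance from one
    uniform point Y to a uniform sample of size N in [0, 1]^J."""
    rng = rng or np.random.default_rng(2)
    dists = []
    for _ in range(reps):
        X = rng.uniform(0, 1, size=(N, J))
        y = rng.uniform(0, 1, size=J)
        dists.append(np.abs(X - y).max(axis=1).min())
    return float(np.mean(dists))

d_low = avg_min_distance(100, 2)    # table entry is about 0.055
d_high = avg_min_distance(100, 10)  # table entry is about 0.366
```

Even with the sample size fixed, the nearest neighbour drifts away as the dimension grows - the essence of the curse of dimensionality for local smoothers.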
The Lower Bounds – bandwidth selection...
Lemma (Packing density in a hypercube – maximum distance)
Let Y ∼ R([0, 1]^J) and let X_i, i = 1, …, N, be a random sample with each X_i ∼ R([0, 1]^J). Then B_max(N, J) is a lower bound for the average minimum distance in the maximum distance, where

B_max(N, J) = (1/2) · (1/N^{1/J}) · J/(J + 1)   (1)

Lemma (Packing density in a hypercube – Euclidean distance)
Let Y ∼ R([0, 1]^J) and let X_i, i = 1, …, N, be a random sample with each X_i ∼ R([0, 1]^J). Then B_euc(N, J) is a lower bound for the average minimum distance in the Euclidean distance, where

B_euc(N, J) = (1/2) · (√J / N^{1/J}) · J/(J + 1)   (2)
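The two bounds (1) and (2) are straightforward to evaluate; note that they differ only by the factor √J:

```python
def b_max(N, J):
    """Lower bound (1): maximum-norm packing bound."""
    return 0.5 * N**(-1 / J) * J / (J + 1)

def b_euc(N, J):
    """Lower bound (2): Euclidean-norm packing bound, sqrt(J) times (1)."""
    return 0.5 * J**0.5 * N**(-1 / J) * J / (J + 1)
```

Since the bound shrinks only like N^{-1/J}, a bandwidth smaller than this order leaves most kernel windows empty - the connection to bandwidth selection made in the slide title.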
The additive form of regression function
Is the true underlying regression function genuinely additive?
1 YES → straightforward estimation of the functional components (only an occasional case)
2 NO → one has to find an additive approximation, which is subsequently estimated
How does one define a measure of accuracy between the underlying regression function and its approximation?
Additive Decomposition - approximation
Consider a regression function f which is not genuinely additive. In such a case f can still be decomposed into main effects (additive decomposition).

Condition 1
Let the distribution of X ∈ [0, 1]^J be absolutely continuous with a density g bounded away from zero and infinity:
∃ b > 0 ∃ B > b such that b ≤ g(x) ≤ B for all x ∈ C = [0, 1]^J.

The additive approximation to f can be obtained as a sum of J univariate functions f*_j(x_j), where

f*_j(x_j) = E[f(X) | X_j = x_j] − E[f(X)],   x = (x₁, …, x_J) ∈ [0, 1]^J.

If interactions between some variables are required, they can be obtained in a similar way...
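A Monte Carlo sketch of the main effect f*_j for a non-additive test function (the independent-uniform design and the concrete f are assumptions for the illustration):

```python
import numpy as np

rng = np.random.default_rng(3)

def f(x):
    # a non-additive test function: interaction between x1 and x2
    return np.sin(2 * np.pi * x[..., 0]) + x[..., 1] * x[..., 0]

def f_star(j, xj, M=20000):
    """Monte Carlo version of f*_j(x_j) = E[f(X) | X_j = x_j] - E f(X),
    assuming independent U(0, 1) coordinates."""
    X = rng.uniform(0, 1, size=(M, 2))
    Xc = X.copy()
    Xc[:, j] = xj           # condition on X_j = x_j
    return f(Xc).mean() - f(X).mean()
```

For this f one can check by hand that f*_1(x₁) = sin(2πx₁) + x₁/2 − 1/4, e.g. f*_1(0.5) = 0 and f*_1(0.25) = 0.875; the Monte Carlo values land close to these.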
Additive Decomposition – definiteness...
Lemma 1
Let the random variable Σ_j h_j(X_j) have a finite second moment, where the h_j are functions on [0, 1]. Set δ = √(1 − b/B) and let SD(·) denote the standard deviation. Then each h_j(X_j) has a finite second moment and

SD( Σ_j h_j(X_j) ) ≥ ((1 − δ)/2)^{(J−1)/2} · ( SD(h₁(X₁)) + · · · + SD(h_J(X_J)) ).

Under Condition 1 it follows from the lemma that the functional components are uniquely determined up to a set of measure zero.
Spline Estimates · Kernel Estimates
Recap...
1 Regression function f(x) = E[Y |X = x]
2 Estimates based on a random sample {(X_i, Y_i), i = 1, …, N}
3 The random variable Y has mean µ ∈ R and finite second moment EY² < ∞
4 Without loss of generality we assume that f has an additive form - otherwise we use the additive decomposition
5 Functional components f_j with zero mean Ef_j(X_j) = 0 (to avoid constant functional components)
Regression splines
The first method - polynomial estimates (over-fitting, etc.)
Polynomial regression with penalties (no longer used)
To avoid the problems related to polynomial regression ⇒ implementation of spline approaches (piecewise polynomials)

Definition 1 - Spline function
A spline is a piecewise polynomial function of degree n whose polynomial pieces join at the knot points, obeying continuity conditions for the function itself and its first n − 1 derivatives.

Problem: how to choose the number and the positions of the knots?
Regression splines - power basis
Spline estimation approaches are based on a set of basis functions:
1 The spline power basis takes the form {1, x, x², …, xⁿ, (x − ξ₁)^n_+, …, (x − ξ_K)^n_+} (n - spline order).
2 The estimate of each functional component f_j is defined as
f_j(x_j) = Σ_{l=0}^{n} β_{0l} x_j^l + Σ_{k=1}^{K} β_{kn} (x_j − ξ_k)^n_+
3 The estimate of the underlying additive regression function is defined by the minimization problem
Σ_{i=1}^{N} ( Y_i − Ȳ_N − Σ_{j=1}^{J} f_j(x_{ij}) )²
with respect to the basis coefficients β₀₁, …, β₀ₙ, β₁ₙ, …, β_{Kn}.
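A univariate sketch of the power-basis fit (cubic order, nine equispaced knots, and the simulated curve are assumptions; the additive case simply stacks one such design block per covariate):

```python
import numpy as np

def power_basis(x, knots, n=3):
    """Truncated power basis {1, x, ..., x^n, (x - xi_k)_+^n}."""
    cols = [x**l for l in range(n + 1)]
    cols += [np.clip(x - xi, 0, None)**n for xi in knots]
    return np.column_stack(cols)

rng = np.random.default_rng(4)
x = np.sort(rng.uniform(0, 1, 300))
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(300)

B = power_basis(x, knots=np.linspace(0.1, 0.9, 9))
beta, *_ = np.linalg.lstsq(B, y, rcond=None)   # least-squares spline fit
fit = B @ beta
```

The design has n + 1 + K = 13 columns; each knot contributes exactly one truncated-power column, which is the one-to-one knot/basis relation mentioned later in the basis comparison.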
Regression splines with penalties
If we redefine the former minimization problem so that the minimization is also carried out with respect to the knot positions ⇒ there is a risk of over-fitting (interpolation).

Regression splines ⇒ regression splines with penalties:
to ensure better flexibility of the final estimate
to gain the ability to control the amount of smoothness

The estimate of the true underlying regression function f = (f₁, …, f_J) is given by the minimization problem:

Minimize Σ_{i=1}^{N} ( Y_i − Ȳ_N − Σ_{j=1}^{J} f_j(x_{ij}) )² + λ ∫₀¹ (f′′(x))² dx,

where λ is the so-called smoothing parameter.
B-spline basis
Consider a B-spline basis of order n. Then:
1 Each B-spline basis function consists of n + 1 polynomial pieces
2 The pieces join at n inner knots
3 At the knot points, continuity holds up to order n − 1
4 Each B-spline basis function is positive over a domain spanned by n + 2 knots - everywhere else it is zero by definition
5 Each B-spline function overlaps with 2n neighbouring basis functions
6 At any point x ∈ [0, 1] there are n + 1 nonzero basis functions
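The listed properties can be verified numerically with a small Cox-de Boor implementation (the uniform knot grid is an assumption of the sketch):

```python
import numpy as np

def bspline_basis(x, t, n):
    """All B-spline basis functions of degree n on the knot vector t,
    evaluated at the points x via the Cox-de Boor recursion."""
    x = np.asarray(x)[:, None]
    B = ((t[:-1] <= x) & (x < t[1:])).astype(float)   # degree 0: interval indicators
    for d in range(1, n + 1):
        left = (x - t[:-d-1]) / (t[d:-1] - t[:-d-1])
        right = (t[d+1:] - x) / (t[d+1:] - t[1:-d])
        B = left * B[:, :-1] + right * B[:, 1:]
    return B

t = np.arange(-3.0, 14.0)          # uniform knots, no repeats
x = np.linspace(0.01, 9.99, 50)    # interior points between t[3] and t[13]
B = bspline_basis(x, t, 3)         # cubic: n = 3
```

On the interior, the basis forms a partition of unity and exactly n + 1 = 4 functions are nonzero at every evaluation point - properties 4 and 6 above.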
B-splines - estimation
The estimate of each functional component is written as a linear combination of spline basis functions (piecewise polynomials of degree n ∈ N):

f_{j∆}(x_j) = Σ_{k=1}^{K+n+1} ϑ_{jk} · B_{kn}(x_j)

The estimate of the whole unknown regression function f is defined by the minimization problem

min_{ϑ_{jk} ∈ R} Σ_{i=1}^{N} [ Y_i − Ȳ_N − Σ_{j=1}^{J} Σ_{k=1}^{K+n+1} ϑ_{jk} · B_{kn}(x_{ij}) ]²
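A sketch of this additive least-squares fit, stacking one B-spline design block per covariate (it relies on scipy's `BSpline.design_matrix`, available in SciPy 1.8+; the components and knot grid are illustrative assumptions):

```python
import numpy as np
from scipy.interpolate import BSpline

rng = np.random.default_rng(5)
N, J, deg = 400, 2, 3
X = rng.uniform(0, 1, size=(N, J))
f1 = lambda x: np.sin(2 * np.pi * x)      # centred on U(0, 1)
f2 = lambda x: (x - 0.5)**2 - 1/12        # centred on U(0, 1)
Y = 1.0 + f1(X[:, 0]) + f2(X[:, 1]) + 0.1 * rng.standard_normal(N)

# clamped cubic B-spline basis on [0, 1], one block per covariate
t = np.r_[[0.0]*deg, np.linspace(0, 1, 8), [1.0]*deg]
D = np.hstack([BSpline.design_matrix(X[:, j], t, deg).toarray()
               for j in range(J)])

Yc = Y - Y.mean()                          # Ybar_N absorbs the constant mu
theta, *_ = np.linalg.lstsq(D, Yc, rcond=None)
fit = Y.mean() + D @ theta
rmse = float(np.sqrt(np.mean((fit - Y)**2)))
```

Each covariate contributes len(t) − deg − 1 = 10 basis columns, so the joint design has 20 columns; `lstsq` returns the minimum-norm solution, which handles the one-constant-per-block redundancy.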
B-splines with penalties (P-splines)
To ensure better control over smoothness and better flexibility of the final estimate, B-splines with penalties were proposed.

The estimate is given by the minimization problem

Minimize Σ_{i=1}^{N} ( f_N(X_i) − Y_i )² + Σ_{j=1}^{J} λ_j · ∫_{ξ₀}^{ξ_{K+1}} (f′′_{j∆}(x_j))² dx_j

with respect to the basis coefficients ϑ_{jk} and the parameters λ_j, with the same B-spline basis as in the case of simple B-spline estimates.

The optimal choice of the smoothing parameter λ ⇒ model selection criteria.
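A univariate P-spline sketch; following Eilers and Marx, the derivative penalty is replaced here by a second-order difference penalty on the coefficients (an assumed, standard stand-in for ∫(f′′)², not the slides' exact formulation; needs SciPy 1.8+):

```python
import numpy as np
from scipy.interpolate import BSpline

rng = np.random.default_rng(6)
x = np.sort(rng.uniform(0, 1, 200))
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(200)

deg = 3
t = np.r_[[0.0]*deg, np.linspace(0, 1, 21), [1.0]*deg]  # deliberately rich knot grid
B = BSpline.design_matrix(x, t, deg).toarray()
K = B.shape[1]

D2 = np.diff(np.eye(K), n=2, axis=0)   # second-order difference matrix
def pspline_fit(lam):
    # minimise ||y - B theta||^2 + lam * ||D2 theta||^2
    theta = np.linalg.solve(B.T @ B + lam * D2.T @ D2, B.T @ y)
    return B @ theta

wiggly = pspline_fit(1e-6)   # nearly unpenalised
smooth = pspline_fit(1e2)    # heavy penalty, close to a straight line
```

With the knot grid intentionally oversized, λ alone controls the effective smoothness - the point of the penalized formulation.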
Power basis vs. B-spline basis
Power basis:
1 Direct relation between a knot and the corresponding basis function
2 Greater correlation between basis functions

B-spline basis:
1 Numerically much more stable set of basis functions
2 Smaller correlation between basis functions
Additive Kernel Estimates - progress
A multidimensional estimate of the unknown regression function f ⇒ subsequently we estimate the single components f₁, …, f_J.
1 Motivated by additive linear regression
2 First iterative procedures (the backfitting algorithm)
3 Other iterative procedures (RPR, PPR, MARS)
4 The so-called Direct Integration Method, proposed in 1994
↪ the statistical properties of such an estimate are straightforward to derive (bias, variance, asymptotic properties, confidence intervals, etc.)
↪ asymptotic normality of the DIM estimates
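A sketch of the backfitting algorithm from point 2, using a plain univariate Nadaraya-Watson smoother as the inner smoother (the smoother choice, bandwidth, and simulated model are assumptions of the sketch):

```python
import numpy as np

def kernel_smooth(x, y, h):
    """Univariate Nadaraya-Watson smoother evaluated at the sample points."""
    W = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h)**2)
    return (W @ y) / W.sum(axis=1)

def backfit(X, Y, h=0.1, iters=10):
    """Backfitting for Y = mu + sum_j f_j(X_j): cycle over the components,
    smooth the partial residuals, and re-centre each component."""
    N, J = X.shape
    mu = Y.mean()
    F = np.zeros((N, J))                    # current component fits
    for _ in range(iters):
        for j in range(J):
            partial = Y - mu - F.sum(axis=1) + F[:, j]
            F[:, j] = kernel_smooth(X[:, j], partial, h)
            F[:, j] -= F[:, j].mean()       # enforce sum_i f_j(X_ij) = 0
    return mu, F

rng = np.random.default_rng(7)
X = rng.uniform(0, 1, size=(300, 2))
Y = 1.0 + np.sin(2 * np.pi * X[:, 0]) + (X[:, 1] - 0.5) + 0.1 * rng.standard_normal(300)
mu, F = backfit(X, Y)
```

The recentring step inside the loop is what keeps the decomposition identifiable, matching the zero-mean constraint on the components stated earlier.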
Direct Integration Method

Consider a multivariate unknown regression function f(x) of additive form. Let X = (X1, X̄) ∈ ℝ × ℝ^{J−1}, where X̄ collects the remaining covariates, and define the functional ϕ1(x1) as follows:

ϕ1(x1) = ∫₀¹ f(x1, x̄) p2(x̄) dx̄

Under the assumption of the additive form of f with components (f1, . . . , fJ), it holds that ϕ1 = f1 up to the additive constant µ.

- Multivariate Nadaraya-Watson kernel estimate
- Kernel estimates of the functions f(·) and p2
Direct Integration Method

- The estimate of f1(x1) is given as the sample version of the functional ϕ1(x1):

  f̂1(x1) = (1/N) ∑_{i=1}^N f̂(x1, X̄i)

- The estimate f̂1(x1) can be written in the form

  f̂1(x1) = ∑_{i=1}^N wi(x1) Yi,

  where wi(x1) = N^{−1} ∑_{l=1}^N wi(x1, X̄l). The weights wi(x1, X̄l) are given by the equation f̂(x1, X̄) = ∑_{i=1}^N wi(x1, X̄i) Yi.
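The direct-integration estimate can be sketched in a few lines for J = 2: fit a bivariate Nadaraya-Watson surface, then average it over the empirical distribution of the nuisance covariate. The test functions, sample size, and bandwidths h = g = 0.1 below are illustrative assumptions of mine, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 400
X = rng.uniform(0, 1, size=(N, 2))
f1 = lambda x: np.sin(2 * np.pi * x)        # illustrative additive components
f2 = lambda x: (x - 0.5) ** 2
Y = f1(X[:, 0]) + f2(X[:, 1]) + rng.normal(0, 0.1, N)

def nw(x1, x2, h=0.1, g=0.1):
    """Bivariate Nadaraya-Watson estimate f^(x1, x2) with Gaussian kernels."""
    w = np.exp(-0.5 * ((X[:, 0] - x1) / h) ** 2) \
      * np.exp(-0.5 * ((X[:, 1] - x2) / g) ** 2)
    return np.sum(w * Y) / np.sum(w)

def phi1(x1):
    """phi_1(x1): average the bivariate fit over the empirical law of X2."""
    return np.mean([nw(x1, x2) for x2 in X[:, 1]])

grid = np.linspace(0.05, 0.95, 19)
est = np.array([phi1(u) for u in grid])
truth = f1(grid)
# phi_1 equals f_1 only up to an additive constant, so compare centred versions
err = np.max(np.abs((est - est.mean()) - (truth - truth.mean())))
```

Because the integration step is an explicit average rather than a fixed point of an iteration, bias and variance of this estimator can be worked out directly, which is the point made above about the tractability of DIM.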
Asymptotical normality of Kernel Additive Estimate
the functional components f2, . . . , fJ can be by obtained by thesimilar process, considering the functional ϕk (xk ) and a partition(Xk , X) ∈ R× RJ−1, and X = (X1, . . . , Xk−1, Xk+1, . . . XJ).
Theorem (Asymptotical normality)
Under some assumptions on N ∈ N and smoothness bandwidth hand g for Kernel estimates, it holds
N25[ϕj(xj)− ϕj(xj)
]→ N(bj(xj), vj(xj))
Generalization into the GAM

In the case of binary data (survival time data) it is more convenient to use Generalized Additive Models (GAM).

1 Full model specification - the conditional distribution of Y given X belongs to an exponential family, with a known link function G:

  G[f(x)] = µ + ∑_{j=1}^J fj(xj)

2 Partial model specification - no restriction to an exponential family; the variance function stays unrestricted

If one takes the link function G to be the identity ⇒ classical Additive Regression model (other choices: logit, probit, logarithm).
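The link functions named above can be written down directly; the stdlib `statistics.NormalDist` supplies Φ and Φ⁻¹ for the probit link. The quick sketch below (my own illustration) checks that each link and its inverse compose to the identity, which is what lets one recover the mean response from the additive predictor, f(x) = G⁻¹[µ + ∑j fj(xj)].

```python
import math
from statistics import NormalDist

logit      = lambda mu: math.log(mu / (1.0 - mu))   # G for the logistic GAM
inv_logit  = lambda eta: 1.0 / (1.0 + math.exp(-eta))
probit     = NormalDist().inv_cdf                   # G = Phi^{-1}
inv_probit = NormalDist().cdf                       # G^{-1} = Phi
log_link   = math.log                               # G = log (Poisson-type models)
inv_log    = math.exp

mu = 0.3                  # a mean response in (0, 1), e.g. for binary data
eta = logit(mu)           # value on the additive-predictor scale
back = inv_logit(eta)     # mapped back to the mean scale
```

With the identity link both scales coincide and the model reduces to classical additive regression, exactly as stated above.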
GAM - estimation

The estimation procedure is similar to that for Additive Kernel estimates. Let X = (X1, X̄) with X̄ = (X2, . . . , XJ), and define ϕ1(x1):

ϕ1(x1) = ∫ G[f(x1, x̄)] · p2(x̄) dx̄

Multidimensional Nadaraya-Watson kernel estimator ⇒ nonparametric multivariate kernel estimates of p2 and f.

- The estimate of f1 is identified with the estimate of ϕ1.
- The estimate of ϕ1 is given by

  ϕ̂1(x1) = (1/N) ∑_{i=1}^N G[f̂(x1, X̄i)], where X̄i = (Xi2, . . . , XiJ).
ADVANTAGES or DISADVANTAGES?
What is the main advantage of an additive approach?
The Optimal Global Rate of Convergence

The sequence {bN} is the optimal rate of convergence if:

lim_{c→0} lim inf_{N→∞} sup_{f∈κ} P[ ‖TN − f‖q > c · bN ] = 1

lim_{c→∞} lim sup_{N→∞} sup_{f∈κ} P[ ‖TN − f‖q > c · bN ] = 0

The optimal global rate of convergence given by Stone:

Theorem (Rate of Convergence for Nonparametric Estimates)
Let β ∈ (0, 1] and set p = k + β. Let 0 < q ≤ ∞ and set r = (p − m)/(2p + J). Then the optimal global rate of convergence is

{N^{−r}} for q ∈ (0, ∞), and {(N^{−1} · ln N)^r} for q = ∞.
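A quick numeric reading of the exponent r = (p − m)/(2p + J) makes the dimensionality effect concrete; the values below are my own illustration (p = 2, m = 0), not figures from the slides.

```python
def rate(p, J, m=0):
    """Stone's optimal exponent r: the estimation error decays like N**(-r)."""
    return (p - m) / (2 * p + J)

p = 2                       # smoothness p = k + beta
full_5d  = rate(p, J=5)     # five covariates: r = 2/9
additive = rate(p, J=1)     # additive reduction: r = 2/5, as for J = 1

# sample size needed to push the error down to ~0.01 in each case:
N_full = 0.01 ** (-1 / full_5d)     # of the order 10**9
N_add  = 0.01 ** (-1 / additive)    # of the order 10**5
```

The additive model thus attains the univariate rate regardless of J, which is precisely the motivation for the additive reduction discussed next.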
Additive Reduction Principle

- The effect of the additive reduction principle on the simplicity and interpretability of the model.
- Prevention of the "curse of dimensionality".
- The improvement of the optimal global rate of convergence: r = (p − m)/(2p + J) −→ r = (p − m)/(2p + 1).
[Figure: a fitted additive regression surface of Response Y on Predictors X and Z, with the estimated smooth components s(income, 3.12) and s(education, 3.18) plotted against income and education.]
Additive Expansion in L2 Norm

Consider an additive estimate f̂N of the regression function f. Set γ = 1/(2p + 1) and r = (p − m)/(2p + 1).

Theorem (Rate of Convergence for Additive Estimates)
Suppose that all necessary conditions hold and let NN ∼ N^γ. Then:

‖f̂_{Nj}^{(m)} − (f_j*)^{(m)}‖²_j = O_pr(N^{−2r}),   ‖f̂_{Nj} − f_j*‖²_j = O_pr(N^{−2r}),
‖f̂_N − f*‖² = O_pr(N^{−2r}),   (Ȳ_N − µ)² = O_pr(N^{−2r}).

- The only reasonable derivatives are partial derivatives with respect to the same variable (∂²f̂_N/∂x_{j1}∂x_{j2} = 0 for j1 ≠ j2).
- The Theorem holds under a redefinition of the mth derivative of the additive function (a linear combination of partial derivatives).
Additive Expansion in L∞ Norm

- The effect of the additive decomposition on the rate of convergence in the supremum norm: r = p/(2p + J) −→ r = p/(2p + 1).
- We decompose not only the unknown regression function but the whole regression problem (⇒ J univariate regression problems).

Theorem
Let all necessary conditions hold and let N^γ ∼ NN. Suppose that EY = µ = 0 and let r = p/(2p + 1). Then:

‖f̂_N − f*‖∞ = sup_{x∈[0,1]^J} |f̂_N(x) − f*(x)| = O_pr(N^{−r} · log^r N)   (3)

‖f̂_{Nj} − f_j*‖∞,j = sup_{xj∈[0,1]} |f̂_{Nj}(xj) − f_j*(xj)| = O_pr(N^{−r} · log^r N)   (4)
The Effectiveness of the Additive Expansion
[Plot: the rate of convergence against the number of observations N, with curves for J = 1 and J = 2 under both the supremum norm and the Euclidean norm.]

Figure: The optimal global rate of convergence for the additive models in the case of a two-dimensional regression surface, for the supremum norm and the Euclidean norm.
Optimal model selection

1 Spline estimates: introducing the smoothing parameter λ yields a whole set of "good" admissible models ⇒ one needs to select a single one.
2 Penalized splines: the set of admissible models grows even further once the minimization is carried out over the smoothing parameter λ and the knot positions ∆ as well.
3 Kernel regression: the problem of a proper selection of the smoothing parameter h, a measure of localness (or a multivariate bandwidth parameter h).
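One concrete way to resolve the selection problem is Generalized Cross-Validation. The sketch below (a ridge-penalized polynomial smoother on simulated data, my own illustrative setup rather than the estimator from the slides) evaluates GCV(λ) = N·RSS(λ)/(N − tr S_λ)² over a grid of smoothing parameters and picks the minimizer.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100
x = np.sort(rng.uniform(0, 1, N))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, N)

# polynomial design with a ridge-type penalty on the non-intercept coefficients
deg = 9
Xd = np.vander(x, deg + 1, increasing=True)
D = np.eye(deg + 1)
D[0, 0] = 0.0                       # do not penalize the intercept

def gcv(lam):
    """GCV(lambda) = N * RSS / (N - tr S_lambda)**2 for the linear smoother S."""
    S = Xd @ np.linalg.solve(Xd.T @ Xd + lam * D, Xd.T)
    resid = y - S @ y
    return N * (resid @ resid) / (N - np.trace(S)) ** 2

lams = 10.0 ** np.arange(-8, 3)     # candidate smoothing parameters
best = lams[np.argmin([gcv(l) for l in lams])]
```

GCV approximates leave-one-out cross-validation at the cost of a single trace computation, which is why it is a standard default for choosing λ in penalized spline fitting.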
Model Selection Criteria
[Plot, repeated for the smoothing parameters λ = 0.000008, 0.001357, 0.037821, 0.199624 and 1.053625: y values against x values with the corresponding spline fit, comparing the criteria Cross-Validation, Generalized Cross-Validation, Akaike Information Criterion, and Bayesian Information Criterion.]
Iterative Methods - Backfitting Algorithm

The first proposals → iterative methods (based on the additive decomposition method):

fj(xj) = E[ Y − µ − ∑_{t=1, t≠j}^J ft(xt) | Xj ]

1 Initialization: µ̂0 = (1/N) ∑_{i=1}^N Yi; f̂j = f̂j⁰, j = 1, . . . , J
2 For each j: f̂j = Sj[ Y − µ̂0 − ∑_{k≠j} f̂k(Xk) | Xj ];
  µ̂0 = µ̂0 + (1/N) ∑_{i=1}^N f̂j(Xij);
  f̂j = f̂j − (1/N) ∑_{i=1}^N f̂j(Xij)
3 Repeat step 2 until sufficient convergence.
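The loop above can be sketched as follows for J = 2, with a Nadaraya-Watson smoother playing the role of Sj; the data-generating functions, bandwidth, and iteration count are illustrative assumptions of mine.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 300
X = rng.uniform(0, 1, size=(N, 2))
f1 = lambda x: np.sin(2 * np.pi * x)       # illustrative additive components
f2 = lambda x: 4 * (x - 0.5) ** 2
Y = 1.0 + f1(X[:, 0]) + f2(X[:, 1]) + rng.normal(0, 0.2, N)

def smooth(x, r, h=0.08):
    """S_j: Nadaraya-Watson smooth of the partial residuals r against x."""
    K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
    return K @ r / K.sum(axis=1)

mu = Y.mean()                               # step 1: initialization
fhat = [np.zeros(N), np.zeros(N)]           # f_j evaluated at the data points
for _ in range(20):                         # step 2, repeated until convergence
    for j in range(2):
        partial = Y - mu - fhat[1 - j]      # partial residuals for component j
        fhat[j] = smooth(X[:, j], partial)
        fhat[j] -= fhat[j].mean()           # centre to keep mu identified

fit = mu + fhat[0] + fhat[1]
resid_var = np.var(Y - fit)
```

Each pass smooths the partial residuals of one component while holding the others fixed; the centring step is what pins down the additive constant µ.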
Iterative Techniques - Computer-Intensive Methods

Recursive Partitioning Regression (RPR)
- spline estimate of degree zero
- locally constant estimate with great interpretability

MARS Algorithm
- multivariate adaptive regression spline estimates
- a modification of the RPR algorithm (continuity condition)

Projection Pursuit Regression (PPR)
- projection into lower dimensions
- additivity in a different sense
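The RPR idea, greedy axis-aligned splits with a locally constant fit on each cell, can be sketched in a few lines. This toy version (quantile candidate splits, fixed depth, squared-error criterion) is my own minimal illustration of the principle, not the full published algorithm.

```python
import numpy as np

def fit_tree(X, y, depth=3, min_leaf=10):
    """Greedy recursive partitioning: nested dict of splits with leaf means."""
    if depth == 0 or len(y) < 2 * min_leaf:
        return {'value': float(y.mean())}
    best = None
    for j in range(X.shape[1]):                       # candidate split variables
        for s in np.quantile(X[:, j], np.linspace(0.1, 0.9, 9)):
            left = X[:, j] <= s
            if left.sum() < min_leaf or (~left).sum() < min_leaf:
                continue
            sse = ((y[left] - y[left].mean()) ** 2).sum() + \
                  ((y[~left] - y[~left].mean()) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, s, left)
    if best is None:
        return {'value': float(y.mean())}
    _, j, s, left = best
    return {'j': j, 's': s,
            'L': fit_tree(X[left], y[left], depth - 1, min_leaf),
            'R': fit_tree(X[~left], y[~left], depth - 1, min_leaf)}

def predict(tree, x):
    """Follow the splits down to a leaf; the fit is piecewise constant."""
    while 'value' not in tree:
        tree = tree['L'] if x[tree['j']] <= tree['s'] else tree['R']
    return tree['value']

# illustrative data: a step in the first covariate plus noise
rng = np.random.default_rng(3)
Xd = rng.uniform(0, 1, size=(300, 2))
yd = (Xd[:, 0] > 0.5).astype(float) + rng.normal(0, 0.1, 300)
tree = fit_tree(Xd, yd)
pred = np.array([predict(tree, row) for row in Xd])
mse = float(np.mean((pred - yd) ** 2))
```

The resulting fit is a degree-zero spline on a data-driven partition, which is exactly the "locally constant estimate with great interpretability" described above; MARS replaces the constants with continuous piecewise-linear basis functions.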
Example: Polynomial Regression

Polynomial regression estimate - spline estimate of the 3rd degree.
Model: Life expectancy ~ S12[Log(People/TV), Log(People/physician)]

[Plot: fitted surface of Average Life Expectancy against Log(people per TV) and Log(people per physician).]

Residual Sum of Squares: 10.56021
Example: Additive Regression Model

Additive regression estimate (a special case of PPR) - additive spline estimate of the 3rd degree.
Model: Life expectancy ~ S1[Log(People/TV)] + S2[Log(People/physician)]

[Plot: fitted additive surface of Average Life Expectancy against Log(people per TV) and Log(people per physician).]

Residual Sum of Squares: 11.89261
Example: Recursive Partitioning Regression

Recursive partitioning regression estimate - locally constant estimate (spline of degree zero).
Model: Life expectancy ~ ∑_{v=1}^V πv 1{x ∈ Bv}

[Plot: piecewise-constant fit of Average Life Expectancy against Log(people per TV) and Log(people per physician).]

Residual Sum of Squares: NaN
Example: MARS Algorithm

Multivariate Adaptive Regression Splines (MARS) - a modification of the RPR algorithm.
Model: Life expectancy ~ s0 + ∑_{v=1}^V sv Bv(x)

[Plot: MARS fit of Average Life Expectancy against Log(people per TV) and Log(people per physician).]

Residual Sum of Squares: 11.32777
Example: Projection Pursuit Regression

Projection Pursuit Regression (PPR) - projection into lower dimensions.
Model: Life expectancy ~ ∑_{v=1}^V gv(bvᵀ x)

[Plot: PPR fit of Average Life Expectancy against Log(people per TV) and Log(people per physician).]

Residual Sum of Squares: 6.129001
Additive Regression Models with Regression Splines
Thank you for your attention...
Matúš Maciak: [email protected]