

SYLLABUS REVIEW Science Series

RESEARCH ARTICLE - STATISTICS

On correcting the effects of model selection on inference in linear regression

Georges Nguefack-Tsague*,1, Walter Zucchini** and Siméon Fotso***

*Department of Public Health, Faculty of Medicine and Biomedical Sciences, University of Yaounde I, P.O. Box 1364, Yaoundé, Cameroon.

**Institute for Statistics and Econometrics, Georg-August-Universität, Platz der Göttinger Sieben 5, 37073 Göttingen, Germany.

***Department of Mathematics, Higher Teachers' Training College, University of Yaounde I, P.O. Box 47, Yaoundé, Cameroon.

1 Corresponding author, [email protected], Phone: +237 776-736-65

Received: 15 February 2011 / Revised: 18 August 2011 / Accepted: 15 September 2011

Ecole Normale Supérieure, Université de Yaoundé I, Cameroun

Abstract

This paper deals with the use of the same data both to select a model and to carry out inference, in particular point estimation and point prediction. The resulting estimator is called the post-model-selection estimator, and its properties are hard to derive. Using selection criteria such as hypothesis testing, AIC, BIC and Cp, we illustrate that, in terms of the risk function, no post-model-selection estimator uniformly dominates the others, even for consistent criteria. We stress that, in this framework, classical model averaging and model selection have different philosophies. Since post-model-selection estimators can be regarded as model averaging with random 0-1 weights, we propose a connection between the two theories in the frequentist approach. We illustrate the point by simulating a simple linear regression model.

Keywords: model averaging, model selection, inference, post-model-selection estimator

Résumé (English translation)

This article concerns the use of the same data both to select a model and to carry out inference, in particular point estimation and point prediction. The resulting estimator is called the post-model-selection estimator (abbreviated EASM in the French text, for "estimateur après sélection du modèle"), and its properties are difficult to establish. Using selection criteria such as hypothesis tests, AIC, BIC and Cp, we illustrate that, in terms of the risk function, no post-model-selection estimator uniformly dominates the others, even for consistent criteria. We particularly emphasize that, in this framework, model averaging and model selection have different philosophies. Since post-model-selection estimators can be regarded as model averaging with a random 0-1 weight, we propose a connection between the two theories in the frequentist approach. We illustrate our technique by simulating a simple linear regression model.

Keywords: model averaging, model selection, inference, post-model-selection estimator

Syllabus Review 2 (3), 2011: 122-140
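
The abstract's remark that a post-model-selection estimator is a model average with random 0-1 weights can be made concrete with a small simulation. The following is only a minimal sketch under stated assumptions, not the authors' simulation design: the two candidate models (intercept only versus intercept plus slope), the Gaussian form of AIC, the true intercept of 1.0, the design points on [-1, 1], and all function names are hypothetical choices made for illustration.

```python
import math
import random

def fit_candidates(x, y):
    """Fit the two assumed candidate models; return (rss, n_params, predict) per model."""
    n = len(y)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # M0: intercept only, so the fitted mean is y_bar everywhere.
    rss0 = sum((yi - y_bar) ** 2 for yi in y)
    # M1: intercept and slope, closed-form least squares.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss1 = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return [
        (rss0, 1, lambda x0: y_bar),
        (rss1, 2, lambda x0: intercept + slope * x0),
    ]

def aic(rss, k, n):
    """Gaussian AIC up to an additive constant (assumed form of the criterion)."""
    return n * math.log(rss / n) + 2 * k

def post_selection_estimate(x, y, x0):
    """Estimate the mean response at x0 as a 0-1 weighted average of the candidates."""
    n = len(y)
    fits = fit_candidates(x, y)
    scores = [aic(rss, k, n) for rss, k, _ in fits]
    best = scores.index(min(scores))
    # The weights are random because they depend on the data through the criterion.
    weights = [1.0 if j == best else 0.0 for j in range(len(fits))]
    estimates = [predict(x0) for _, _, predict in fits]
    return sum(w * e for w, e in zip(weights, estimates)), weights

def simulated_risk(beta1, n=30, x0=1.0, sigma=1.0, reps=2000, seed=0):
    """Monte Carlo estimate of the squared-error risk at x0 for a given true slope."""
    rng = random.Random(seed)
    x = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
    true_mean = 1.0 + beta1 * x0  # assumed true intercept of 1.0
    total = 0.0
    for _ in range(reps):
        y = [1.0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
        estimate, _ = post_selection_estimate(x, y, x0)
        total += (estimate - true_mean) ** 2
    return total / reps
```

Calling `simulated_risk(b)` over a grid of slopes `b` gives a rough risk curve for this one criterion. Replacing `aic` with another criterion, such as one using a `math.log(n) * k` penalty in the style of BIC, would allow the kind of risk comparison the abstract describes.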
